Discover advancements in iOS camera capture: Depth, focus, and multitasking

Discover how you can take advantage of advanced camera capture features in your app. We'll show you how to use the LiDAR scanner to create photo and video effects and perform accurate depth measurement. Learn how your app can use the camera for picture-in-picture or multitasking, control face-driven autofocus and autoexposure during camera capture, and more. We'll also share strategies for using multiple video outputs so that you can optimize live preview while capturing high-quality video output.

For an overview on camera capture capabilities, watch "What's new in camera capture" from WWDC21.

Related Videos

WWDC23
- Discover Continuity Camera for tvOS
- Explore 3D body pose and person segmentation in Vision

WWDC21
- What’s new in camera capture

♪ instrumental hip hop music ♪
Hello, and welcome to “Discover advancements in iOS camera capture”. I'm Nikolas Gelo from the Camera Software team, and I'll be presenting some exciting new camera features in iOS and iPadOS. I'll begin with how to stream depth from LiDAR Scanners using AVFoundation. Next, a look at how your app will receive improved face rendering with face-driven auto focus and auto exposure. Then, I'll take you through advanced AVCaptureSession streaming configurations. And lastly, I'll show you how your app will be able to use the camera while multitasking.
I'll begin with how to stream depth from LiDAR Scanners using AVFoundation. The iPhone 12 Pro, iPhone 13 Pro, and iPad Pro are equipped with LiDAR Scanners capable of outputting dense depth maps. The LiDAR Scanner works by shooting light onto the surroundings, and then collecting the light reflected off the surfaces in the scene. The depth is estimated by measuring the time it took for the light to go from the LiDAR to the environment and reflect back to the scanner. This entire process runs millions of times every second.
I'll show you the LiDAR Scanner in action using AVFoundation. Here on an iPhone 13 Pro Max, I'm running an app that uses the new LiDAR Depth Camera AVCaptureDevice. The app renders streaming depth data on top of the live camera feed. Blue is shown for objects that are close and red for objects that are further away. And using the slider, I can adjust the opacity of the depth.
This app also takes photos with high resolution depth maps. When I take a photo, the same depth overlay is applied but with an even greater resolution for the still.
This app has one more trick up its sleeve. When I press the torch button, the app uses the high resolution depth map with the color image to render a spotlight on the scene using RealityKit. I can tap around and point the spotlight at different objects in the scene. Look how the spotlight highlights the guitar.
Or if I tap on the right spot in the corner of the wall, the spotlight forms the shape of a heart. Let's go back to that guitar. It looks so cool.
API for the LiDAR Scanner was first introduced in ARKit in iPadOS 13.4. If you haven't seen the WWDC 2020 presentation “Explore ARKit 4”, I encourage you to watch it. New in iOS 15.4, your app can access the LiDAR Scanner with AVFoundation. We have introduced a new AVCaptureDevice type, the built-in LiDAR Depth Camera, which delivers video and depth. It produces high-quality, high-accuracy depth information.
This new AVCaptureDevice uses the rear-facing wide-angle camera to deliver video with the LiDAR Scanner to capture depth. Both the video and depth are captured in the wide-angle camera's field of view. And like the TrueDepth AVCaptureDevice, all of its formats support depth data delivery.
This new AVCaptureDevice produces high-quality depth data by fusing sparse output from the LiDAR Scanner with the color image from the rear-facing wide-angle camera. The LiDAR and color inputs are processed using a machine learning model that outputs a dense depth map.
Because the LiDAR Depth Camera uses the rear-facing wide-angle camera, the Telephoto and Ultra Wide cameras can be used in addition, with an AVCaptureMultiCamSession. This is useful for apps that wish to use multiple cameras at the same time.
The LiDAR Depth Camera exposes many formats, from video resolutions of 640 by 480 to a full 12-megapixel image at 4032 by 3024. While streaming, it can output depth maps up to 320 by 240. And for photo capture, you can receive depth maps of 768 by 576. Note, the depth resolutions are slightly different for 16 by 9 and 4 by 3 formats. This is to match the video's aspect ratio.
The LiDAR Depth Camera AVCaptureDevice is available on iPhone 12 Pro, iPhone 13 Pro, and iPad Pro 5th generation.
iPhone 13 Pro can deliver depth data using a combination of the rear-facing cameras. The AVFoundation Capture API refers to these as “virtual devices” that consist of physical devices.
On the back of the iPhone 13 Pro, there are four virtual AVCaptureDevices available to use: The new LiDAR Depth Camera uses the LiDAR Scanner with the wide-angle camera. The Dual Camera uses the Wide and Telephoto cameras. The Dual Wide Camera uses the Wide and Ultra Wide cameras. And the Triple Camera uses the Wide, Ultra Wide, and Telephoto cameras.
There are differences in the type of depth these devices produce. The LiDAR Depth Camera produces “absolute depth.” The time of flight technique used allows for real-world scale to be calculated. For example, this is great for computer vision tasks like measuring. The TrueDepth, Dual, Dual Wide, and Triple Cameras produce relative, disparity-based depth. This uses less power and is great for apps that render photo effects.
AVFoundation represents depth using the AVDepthData class. This class has a pixel buffer containing the depth with other properties describing it, including the depth data type, the accuracy, and whether it is filtered. It is delivered by a depth-capable AVCaptureDevice, like the new LiDAR Depth Camera. You can stream depth from an AVCaptureDepthDataOutput or receive depth attached to photos from an AVCapturePhotoOutput.
Depth data is filtered by default. Filtering reduces noise and fills in missing values, or holes, in the depth map. This is great for video and photography apps, so artifacts don't appear when using the depth map to apply effects on a color image. However, computer vision apps should prefer non-filtered depth data to preserve the original values in the depth map. When filtering is disabled, the LiDAR Depth Camera excludes low confidence points.
To disable depth data filtering, set the isFilteringEnabled property on your AVCaptureDepthDataOutput to false, and when you receive an AVDepthData object from your delegate callback, it will not be filtered.
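The setup described above might look roughly like the following. This is a minimal sketch, assuming a device with a LiDAR Scanner running iOS 15.4 or later; the function name `makeLiDARDepthSession` and the `LiDARSetupError` type are illustrative and do not come from the session.

```swift
import AVFoundation

/// Hypothetical error type used only in this sketch.
enum LiDARSetupError: Error {
    case lidarCameraUnavailable
    case cannotAddInput
    case cannotAddOutput
}

/// Builds a session that streams unfiltered depth from the LiDAR Depth Camera.
/// `makeLiDARDepthSession` is an illustrative name, not an AVFoundation API.
@available(iOS 15.4, *)
func makeLiDARDepthSession(
    delegate: AVCaptureDepthDataOutputDelegate,
    callbackQueue: DispatchQueue
) throws -> AVCaptureSession {
    // The LiDAR Depth Camera pairs the LiDAR Scanner with the rear wide-angle camera.
    guard let device = AVCaptureDevice.default(.builtInLiDARDepthCamera,
                                               for: .video,
                                               position: .back) else {
        throw LiDARSetupError.lidarCameraUnavailable
    }

    let session = AVCaptureSession()
    session.beginConfiguration()
    defer { session.commitConfiguration() }

    let input = try AVCaptureDeviceInput(device: device)
    guard session.canAddInput(input) else { throw LiDARSetupError.cannotAddInput }
    session.addInput(input)

    let depthOutput = AVCaptureDepthDataOutput()
    guard session.canAddOutput(depthOutput) else { throw LiDARSetupError.cannotAddOutput }
    session.addOutput(depthOutput)

    // Computer vision use case: keep the original depth values instead of
    // the smoothed, hole-filled map that is delivered by default.
    depthOutput.isFilteringEnabled = false
    depthOutput.setDelegate(delegate, callbackQueue: callbackQueue)

    return session
}
```

The delegate then receives AVDepthData objects in `depthDataOutput(_:didOutput:timestamp:connection:)`. A production app would probably also choose the device's `activeFormat` and `activeDepthDataFormat` explicitly rather than rely on the defaults.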
Since ARKit already provided access to the LiDAR Scanner, you might ask, “How does AVFoundation compare?” AVFoundation is designed for video and photography apps. With AVFoundation, you can embed depth data captured with the LiDAR Scanner into high-resolution photos. ARKit is best suited for augmented reality apps, as the name suggests. With the LiDAR Scanner, ARKit is capable of delivering features like scene geometry and object placement.
AVFoundation can deliver high-resolution video that is great for recording movies and taking photos. AVFoundation's LiDAR Depth Camera can output depth up to 768 by 576. This is more than twice as big as ARKit's depth resolution of 256 by 192. ARKit uses lower resolution depth maps, so it can apply augmented reality algorithms for its features.
For more “in-depth” information on how to use AVFoundation to capture depth data, watch our previous session “Capturing Depth in iPhone Photography” from WWDC 2017. We're excited to see the interesting ways you can use the LiDAR Depth Camera in your apps.
Next up, I'll discuss how improvements to the auto focus and auto exposure systems help to improve the visibility of faces in the scene for your app. The auto focus and auto exposure systems analyze the scene to capture the best image. The auto focus system adjusts the lens to keep the subject in focus, and the auto exposure system balances the brightest and darkest regions of a scene to keep the subject visible. However, sometimes the automatic adjustments made do not keep your subject's face in focus. And other times, the subject's face can be difficult to see with bright backlit scenes.
A common feature of DSLRs and other pro cameras is to track faces in the scene to dynamically adjust the focus and exposure to keep them visible. New in iOS 15.4, the focus and exposure systems will prioritize faces. We liked the benefits of this so much that we have enabled it by default for all apps linked on iOS 15.4 or later.
I'll show you some examples. Without face-driven auto focus, the camera stays focused on the background without refocusing on the face. Watch it again. Look at how his face remains out of focus as he turns around and that the trees in the background stay sharp. With face-driven auto focus enabled, you can clearly see his face. And when he turns away, the camera changes its focus to the background.
When we compare the videos side by side, the difference is clear. On the right with face-driven auto focus enabled, you can see the finer details in his beard. With bright backlit scenes, it can be challenging to keep faces well exposed. But with the auto exposure system prioritizing faces, we can easily see him.
Comparing side by side, we can see the difference here again. Notice that by keeping his face well-exposed in the picture on the right, the trees in the background appear brighter. And the sky does too. The exposure of the whole scene is adjusted when prioritizing faces.
In iOS 15.4, there are new properties on AVCaptureDevice to control when face-driven auto focus and auto exposure are enabled. You can control whether the device will “automatically adjust” these settings and decide when it should be enabled. Before toggling the “isEnabled” properties, you must first disable the automatic adjustment.
The automatic enablement of this behavior is great for photography apps. It's used by Apple's Camera app. It's also great for video conferencing apps to keep faces visible during calls. FaceTime takes advantage of this. But sometimes it's not best suited for an app to have the auto focus and auto exposure systems be driven by faces. For example, if you want your app to give the user manual control over the captured image, you might consider turning this off.
If you decide face-driven auto focus or auto exposure is not right for your app, you can opt out of this behavior. First, lock the AVCaptureDevice for configuration. Then, turn off the automatic adjustment of face-driven auto focus or auto exposure. Next, disable face-driven auto focus or auto exposure. And lastly, unlock the device for configuration.
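The opt-out steps above can be sketched as follows. This is a minimal example assuming iOS 15.4 or later; the helper name `disableFaceDrivenFocusAndExposure` is illustrative, while the four properties are the AVCaptureDevice properties the session refers to.

```swift
import AVFoundation

/// Opts a capture device out of face-driven auto focus and auto exposure.
/// `disableFaceDrivenFocusAndExposure` is a hypothetical helper name.
@available(iOS 15.4, *)
func disableFaceDrivenFocusAndExposure(on device: AVCaptureDevice) throws {
    // 1. Lock the device for configuration (this call can throw).
    try device.lockForConfiguration()
    // 4. Unlock when the function exits, whatever happens in between.
    defer { device.unlockForConfiguration() }

    // 2. Turn off the automatic adjustment first; the isEnabled properties
    //    may only be toggled once the automatic behavior is disabled.
    device.automaticallyAdjustsFaceDrivenAutoFocusEnabled = false
    device.automaticallyAdjustsFaceDrivenAutoExposureEnabled = false

    // 3. Disable the face-driven behavior itself.
    device.isFaceDrivenAutoFocusEnabled = false
    device.isFaceDrivenAutoExposureEnabled = false
}
```

An app that only wants to opt out of one of the two behaviors can leave the other pair of properties untouched.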
I'll talk about how you can use advanced streaming configurations to receive audio and video data that is tailored for your app's needs. The AVFoundation Capture API allows developers to build immersive apps using the camera. The AVCaptureSession manages data flow from inputs like cameras and microphones that are connected to AVCaptureOutputs, that can deliver video, audio, photos, and more.
Let's take a common camera app use case for example: Applying custom effects like filters or overlays to recorded video. An app like this would have an AVCaptureSession with two inputs, a camera and a mic, that are connected to two outputs, one for video data and one for audio data. The video data then has the effects applied, and the processed video is sent to two places, the video preview and an AVAssetWriter for recording. The audio data is also sent to the AVAssetWriter.
New in iOS 16 and iPadOS 16, apps can use multiple AVCaptureVideoDataOutputs at the same time. For each video data output, you can customize the resolution, stabilization, orientation, and pixel format.
Let's go back to the example camera app. There are competing capture requirements this app is balancing. The app wants to show a live video preview of the content being captured and record high quality video for later playback. For preview, the resolution needs to be just big enough for the device's screen. And the processing needs to be fast enough for low-latency preview. But when recording, it's best to capture in high resolution with quality effects applied.
With the ability to add a second AVCaptureVideoDataOutput, the capture graph can be extended. Now the video data outputs can be optimized. One output can deliver smaller buffers for preview, and the other can provide full-sized 4K buffers for recording. Also, the app could render a simpler, more performant version of the effect on smaller preview buffers and reserve high quality effects for full-size buffers when recording.
Now the app no longer has to compromise its preview or recorded videos.
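A capture graph with two video data outputs could be assembled along these lines. This is a sketch under assumptions: the class name `DualOutputCapture`, its property names, the queue labels, and the `CaptureSetupError` type are all illustrative, and the effect rendering is left as comments.

```swift
import AVFoundation

/// Hypothetical error type for this sketch.
enum CaptureSetupError: Error {
    case cannotAddInput
    case cannotAddOutput
}

/// Owns a session with one camera input and two video data outputs.
/// Class and property names are illustrative, not from the session.
final class DualOutputCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    let previewOutput = AVCaptureVideoDataOutput()
    let recordingOutput = AVCaptureVideoDataOutput()

    private let previewQueue = DispatchQueue(label: "example.preview-output")
    private let recordingQueue = DispatchQueue(label: "example.recording-output")

    func configure(with camera: AVCaptureDevice) throws {
        session.beginConfiguration()
        defer { session.commitConfiguration() }

        let input = try AVCaptureDeviceInput(device: camera)
        guard session.canAddInput(input) else { throw CaptureSetupError.cannotAddInput }
        session.addInput(input)

        // Each output gets its own serial queue so that slower recording
        // work does not delay preview frames. Per the session, a second
        // video data output is supported starting in iOS 16 and iPadOS 16.
        let outputs = [(previewOutput, previewQueue), (recordingOutput, recordingQueue)]
        for (output, queue) in outputs {
            guard session.canAddOutput(output) else { throw CaptureSetupError.cannotAddOutput }
            session.addOutput(output)
            output.setSampleBufferDelegate(self, queue: queue)
        }
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        if output === previewOutput {
            // Render a cheaper version of the effect and display it.
        } else if output === recordingOutput {
            // Render the full-quality effect and append to an AVAssetWriter.
        }
    }
}
```

Sharing one delegate and branching on the output keeps the sketch short; separate delegate objects per output would work just as well.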
Another reason to use separate video data outputs for preview and recording is to apply different stabilization modes. Video stabilization introduces additional latency to the video capture pipeline. For preview, latency is not desirable, as the noticeable delay makes it hard to capture content. For recording, stabilization can be applied for a better experience when watching the video later. So you can have no stabilization applied on one video data output for low-latency preview and apply stabilization to the other for later playback.
There are many ways to configure the resolution of your video data output. For full-size output, first, disable automatic configuration of output buffer dimensions. Then disable delivery of preview-sized output buffers. In most cases, however, the video data output is already configured for full-size output. For preview-sized output, again, disable the automatic configuration, but instead, enable delivery of preview-sized output buffers. This is enabled by default when using the photo AVCaptureSessionPreset. To request a custom resolution, specify the width and height in the output's video settings dictionary. The aspect ratio of the width and height must match the aspect ratio of the source device's activeFormat.
There are more ways to configure your video data output. To apply stabilization, set the preferred stabilization to a mode like cinematic extended, which produces videos that are great to watch. You can change the orientation to receive buffers that are portrait. And you can specify the pixel format, to receive 10-bit lossless YUV buffers.
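The configuration options above could be combined as in the following sketch. It assumes both outputs have already been added to a session; the function names and the `aspectRatiosMatch` helper are illustrative, and the specific lossless pixel format constant is one plausible choice for "10-bit lossless YUV" rather than something the session names.

```swift
import AVFoundation

/// Returns true when width:height has the same aspect ratio as the source.
/// Cross-multiplication avoids floating-point comparison. Hypothetical helper.
func aspectRatiosMatch(width: Int, height: Int,
                       sourceWidth: Int, sourceHeight: Int) -> Bool {
    return width * sourceHeight == height * sourceWidth
}

/// Configures one output for low-latency preview and one for stabilized,
/// full-size recording. Function name is illustrative.
func configure(previewOutput: AVCaptureVideoDataOutput,
               recordingOutput: AVCaptureVideoDataOutput) {
    // Preview: turn off automatic sizing, then ask for preview-sized buffers.
    previewOutput.automaticallyConfiguresOutputBufferDimensions = false
    previewOutput.deliversPreviewSizedOutputBuffers = true

    // Recording: turn off automatic sizing and keep full-size buffers.
    recordingOutput.automaticallyConfiguresOutputBufferDimensions = false
    recordingOutput.deliversPreviewSizedOutputBuffers = false

    if let connection = recordingOutput.connection(with: .video) {
        // Stabilization adds latency, so it is applied to recording only.
        if connection.isVideoStabilizationSupported {
            connection.preferredVideoStabilizationMode = .cinematicExtended
        }
        // videoOrientation matches the iOS 16 era of the session; newer SDKs
        // deprecate it in favor of videoRotationAngle.
        if connection.isVideoOrientationSupported {
            connection.videoOrientation = .portrait
        }
    }

    // Assumption: this constant is the 10-bit lossless YUV format intended.
    let lossless10Bit = kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarVideoRange
    if recordingOutput.availableVideoPixelFormatTypes.contains(lossless10Bit) {
        recordingOutput.videoSettings = [
            kCVPixelBufferPixelFormatTypeKey as String: lossless10Bit
        ]
    }
}

/// Requests a custom resolution, but only if it matches the aspect ratio
/// of the device's activeFormat. Returns false if the request was skipped.
func requestCustomResolution(width: Int, height: Int,
                             on output: AVCaptureVideoDataOutput,
                             for device: AVCaptureDevice) -> Bool {
    let source = CMVideoFormatDescriptionGetDimensions(device.activeFormat.formatDescription)
    guard aspectRatiosMatch(width: width, height: height,
                            sourceWidth: Int(source.width),
                            sourceHeight: Int(source.height)) else { return false }
    output.videoSettings = [
        kCVPixelBufferWidthKey as String: width,
        kCVPixelBufferHeightKey as String: height
    ]
    return true
}
```

For example, 1280 by 720 passes the aspect-ratio check against a 3840 by 2160 source, while 640 by 480 does not.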
For more information on selecting pixel formats for an AVCaptureVideoDataOutput, see Technote 3121.
In addition to using multiple video data outputs, starting in iOS 16 and iPadOS 16, apps can record with AVCaptureMovieFileOutput while receiving data from AVCaptureVideoDataOutput and AVCaptureAudioDataOutput. To determine what can be added to a session, you can check whether an output can be added to it and query the session's hardwareCost property to determine whether the system can support your configuration. By receiving video data with a movie file output, you can inspect the video while recording and analyze the scene. And by receiving audio data with a movie file output, you can sample audio while recording and listen to what is being recorded. With a capture graph like this, you can offload the mechanics of recording to AVCaptureMovieFileOutput while still receiving uncompressed video and audio samples.
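The two checks mentioned above might be combined like this. It is a sketch, assuming iOS 16 or later and assuming that a hardwareCost above 1.0 means the configuration cannot run, which is how the property is documented for multi-camera sessions; the helper name `addIfSupported` is illustrative.

```swift
import AVFoundation

/// Adds `output` to `session` only if the session accepts it and the
/// resulting configuration stays within the hardware budget.
/// `addIfSupported` is a hypothetical helper name.
@available(iOS 16.0, *)
@discardableResult
func addIfSupported(_ output: AVCaptureOutput, to session: AVCaptureSession) -> Bool {
    // First check: can this output be added to the session at all?
    guard session.canAddOutput(output) else { return false }

    session.beginConfiguration()
    session.addOutput(output)
    session.commitConfiguration()

    // Second check: does the system have the budget for the new graph?
    // Assumption: values above 1.0 indicate an unsupported configuration.
    if session.hardwareCost > 1.0 {
        session.beginConfiguration()
        session.removeOutput(output)
        session.commitConfiguration()
        return false
    }
    return true
}
```

A caller could use it to add an AVCaptureMovieFileOutput alongside existing video and audio data outputs and fall back to a simpler graph when it returns false.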
Implementing these advanced streaming configurations requires use of no new API. We've enabled this by allowing you to do more with existing API.
And lastly, I'll discuss how your app will be able to use the camera while the user is multitasking. On iPad, users can multitask in many ways. For example, you can record Voice Memos while reading Notes in Split View or, with Slide Over, write in Notes in a floating window above Safari in full screen. With Picture in Picture, you can continue video playback while adding reminders to watch more WWDC videos. And with Stage Manager, new to iPadOS 16, users can open multiple apps in resizable floating windows.
Starting in iOS 16, AVCaptureSessions will be able to use the camera while multitasking. We prevented camera access while multitasking before because of concerns about the quality of service the camera system can deliver while multitasking. Resource-intensive apps like games running alongside an app using the camera can induce frame drops and other latency, resulting in a poor camera feed. A user watching a video months or years later that has poor quality may not remember that they recorded it while multitasking. Providing a good camera experience is our priority.
When the system detects video from the camera was recorded while multitasking, a dialog will be displayed informing the user about the potential for lower quality videos. This dialog will be presented after recording has finished with AVCaptureMovieFileOutput or AVAssetWriter. It will be shown only once by the system for all apps and will have an OK button to dismiss.
There are two new properties added to AVCaptureSession to indicate when multitasking camera access is supported and enabled. Capture sessions that have this enabled will no longer be interrupted with the reason “video device not available with multiple foreground apps.” Some apps may wish to require a full screen experience to use the camera. This may be useful if you wish for your app to not compete with other foreground apps for system resources. For example, ARKit does not support using the camera while multitasking.
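The two AVCaptureSession properties mentioned above could be used as in this minimal sketch, assuming iOS 16 or later. The helper name `enableMultitaskingCameraAccessIfSupported` is illustrative.

```swift
import AVFoundation

/// Enables camera access while multitasking when the session supports it.
/// Returns whether the feature was enabled. Hypothetical helper name.
@available(iOS 16.0, *)
@discardableResult
func enableMultitaskingCameraAccessIfSupported(on session: AVCaptureSession) -> Bool {
    // Not every device or session configuration supports this.
    guard session.isMultitaskingCameraAccessSupported else { return false }

    session.beginConfiguration()
    session.isMultitaskingCameraAccessEnabled = true
    session.commitConfiguration()
    return true
}
```

An app that prefers a full screen camera experience can simply leave `isMultitaskingCameraAccessEnabled` at its default value.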
You should ensure your app performs well when running alongside other apps. Make your app resilient to increasing system pressure by monitoring for its notifications, and take action to reduce the impact, like lowering the frame rate. You can reduce your app's footprint on the system by requesting lower-resolution, binned, or non-HDR formats. For more information on best practices of maintaining performance, read the article “Accessing the Camera While Multitasking”.
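One way to react to system pressure is to observe the device's `systemPressureState` and lower the frame rate as pressure rises. The sketch below assumes this approach; the function names and the frame-rate values (30, 20, and 15 fps) are illustrative choices, not recommendations from the session.

```swift
import AVFoundation

/// Maps a pressure level to a target frame rate. The numbers are
/// illustrative assumptions for this sketch.
func targetFrameRate(for level: AVCaptureDevice.SystemPressureState.Level) -> Int32 {
    switch level {
    case .nominal, .fair:
        return 30
    case .serious:
        return 20
    default:
        // .critical, .shutdown, and any level added in the future.
        return 15
    }
}

/// Observes system pressure and throttles the frame rate accordingly.
/// Keep the returned observation alive for as long as monitoring is needed.
func observeSystemPressure(on device: AVCaptureDevice) -> NSKeyValueObservation {
    return device.observe(\.systemPressureState, options: [.initial, .new]) { device, _ in
        let fps = targetFrameRate(for: device.systemPressureState.level)

        // Only apply rates that the active format actually supports.
        let isSupported = device.activeFormat.videoSupportedFrameRateRanges.contains {
            $0.minFrameRate <= Double(fps) && Double(fps) <= $0.maxFrameRate
        }
        guard isSupported else { return }

        do {
            try device.lockForConfiguration()
            defer { device.unlockForConfiguration() }
            let frameDuration = CMTime(value: 1, timescale: fps)
            device.activeVideoMinFrameDuration = frameDuration
            device.activeVideoMaxFrameDuration = frameDuration
        } catch {
            print("Could not lock device for configuration: \(error)")
        }
    }
}
```

Keeping the level-to-rate mapping in its own function makes the policy easy to adjust without touching the observation code.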
Also, video calling and video conferencing apps can display remote participants in a system-provided Picture in Picture window. Now your app's users can seamlessly continue a video call while multitasking on iPad. AVKit introduced API in iOS 15 for apps to designate a view controller for displaying remote call participants. The video call view controller allows you to customize the content of the window. To learn more about adoption, please see the “Adopting Picture in Picture for Video Calls” article.
And this concludes advancements in iOS camera capture. I showed how you can stream depth from LiDAR Scanners using AVFoundation, how your app will receive improved face rendering, advanced AVCaptureSession streaming configurations tailored for your app, and lastly, how your app can use the camera while multitasking. I hope your WWDC rocks.
♪ ♪
