Meet Object Capture for iOS

Discover how you can offer an end-to-end Object Capture experience directly in your iOS apps to help people turn their objects into ready-to-use 3D models. Learn how you can create a fully automated Object Capture scan flow with our sample app and how you can assist people in automatically capturing the best content for their model. We'll also discuss LiDAR data and provide best practices for scanning objects.

Related Videos

WWDC23
- Meet Reality Composer Pro
♪ Mellow instrumental hip-hop ♪
Lei Zhou: Hi, I'm Lei from the Object Capture team. In this session, my colleague Mona and I are going to give you an introduction to Object Capture for iOS. Before we get started, let's review what Object Capture is and how it works today. Object Capture employs cutting-edge computer vision technologies to create a lifelike 3D model from a series of images taken at various angles. These images are transferred to a Mac, where the Object Capture API is used to reconstruct a 3D model in a matter of minutes. Since the release of our API for Mac, we have seen many apps leveraging Object Capture to produce high-quality 3D models. We have received a lot of feedback from you. We are now taking a big step to bring the full Object Capture experience to iOS! This means that we can now do both capturing with a user-friendly interface and on-device model reconstruction, all in the palm of your hand. We also provide a sample app to demonstrate this workflow on iOS.
Let's take a look at the sample app in action. We will use the sample app to create a 3D model of a beautiful vase in a breeze. First, open the sample app and point it at the object. You will see an automatic bounding box generated before you start capturing. Then circle around the object while Object Capture automatically captures the right images for you. We provide visual guidance on regions where we need more images, along with additional feedback messages to help you capture the best quality shots. After we finish one orbit, we can flip the object to capture the bottom. Once scanning of all three segments is completed, we'll proceed to the reconstruction stage, which now runs locally on your iOS device. In just a few minutes, a USDZ model will be ready for use. As part of the developer documentation, we provide the source code for this app so that you can download and try it yourself right away. You can also use it as a starting point for creating your own applications.
Now that we have seen a demo of Object Capture on our new sample app, let's move on to all the new exciting features this year. We will start by introducing the improvements to Object Capture, which now uses LiDAR to support scanning more objects. Next, we will demonstrate the Guided Capture feature which simplifies data capture for your objects. Then we'll explain how to create an Object Capture flow on iOS with our Object Capture API. Finally, we will highlight several new enhancements to our model reconstruction capabilities.
Let's first look at the support for more objects with LiDAR. To create a high-quality 3D model, it's important to select an object with good characteristics. Our Object Capture system performs best for objects that have sufficient texture details, but we have made further improvements this year. We now support the reconstruction of low-texture objects by leveraging the LiDAR Scanner. Let's take a look at this chair as an example. It lacks texture details, making it challenging for Object Capture to create a good model of it. However, using LiDAR, we are able to reconstruct it in better quality. During capturing, we take RGB photos of this chair. However, because the seat and the back are textureless, we cannot recover a complete model of it. In addition to RGB images, our API also collects point cloud data with LiDAR, which helps to produce a comprehensive representation of the object's 3D shape with enhanced coverage and density. Finally, a complete 3D model is generated from the fused point cloud data. Here are the models of additional low-texture objects whose quality is improved by our LiDAR-enabled system.
Although we now support low-texture objects, some objects still present challenges. It's best to avoid objects that are reflective, transparent, or contain very thin structures. Now that we have seen what objects are supported, let's take a closer look at how we can use Guided Capture to easily scan your objects. We provide Guided Capture, which automatically captures images and LiDAR data. It also provides helpful feedback during the capturing process. Additionally, it provides guidance on whether you should flip an object or not. We automate the data-capturing experience instead of requiring you to manually choose good view angles and press a button. When you circle around an object, our system will automatically select image shots with good sharpness, clarity, and exposure, and collect LiDAR points from various view angles. To capture all the details of your object, we highly recommend taking photos from as many angles as possible. We provide a capture dial that indicates which areas of the object have adequate images. We suggest filling the capture dial completely by scanning your object from all angles. In addition to auto capture, we provide real-time feedback to assist you during the capturing process. Firstly, ensure that the environment has good lighting to obtain accurate color representation. If it's too dark, you'll receive a reminder to adjust the lighting conditions. We recommend using diffuse lights to minimize reflection or highlights on the object surface. Secondly, move around the object slowly and smoothly while holding the camera steadily to avoid blurry images. If you move too fast, automatic capture will stop and remind you to slow down. Thirdly, maintain a suitable distance between the camera and an object so that the object can fit in the camera frame properly. If the distance is too far or too close, a text reminder will appear. Finally, always keep the object in the frame. 
If the object goes outside the field of view, automatic capture will pause, and an arrow symbol will appear to remind you to adjust the viewing direction.
To get a complete 3D model of your object, capturing all its sides is crucial. Flipping the object can help to achieve this. But whether we should flip an object depends on its attributes. If your object is rigid, flipping is a good idea. But for deformable objects, it is better not to move them because their shapes can easily change. For texture-rich objects, flipping is recommended, but it's important to avoid flipping objects that have symmetric or repetitive textures because this can be misleading to our system. Additionally, flipping textureless objects can be challenging for Object Capture since we need sufficient textures to stitch different segments together. To help with this, we provide an API to suggest whether an object is textured enough for flipping. When you decide to flip an object, scanning it in three orientations is recommended so that we can get images of all sides. And it is best to use diffuse lights to minimize shadows and reflections on the object's surface. Image overlap between different scan passes is also important. Part of the object in a scan pass should be captured in previous passes. We provide visual guidance on how to flip an object properly. If an object cannot be flipped, we suggest capturing it from three different heights to get images from various view angles. For mostly textureless objects, it is recommended to put them against a textured background where they can stand out. Object Capture is available on iPhone 12 Pro, iPad Pro 2021, and later models. To check if your device supports Object Capture, we offer an API for easy verification. Next, here is Mona to tell you about the Object Capture API in more detail.
Mona Yousofshahi: Thanks, Lei! Now that we have seen the sample app in action, let's see how we used the Object Capture API to create this app. As you saw in the demo, Object Capture has two steps, image capture and model reconstruction. Let's look at the image capture first.
The Image Capture API has two parts, a session and a SwiftUI view. The session allows you to observe and control the flow of a state machine during the image capture. The SwiftUI view displays the camera feed, and automatically adapts the UI elements it presents based on the state of the session. Our SwiftUI view comes without any 2D text or buttons. This allows you to customize the look of your app and to more easily incorporate Object Capture into your existing app. Let's look in detail at the Object Capture state machine. A session starts in the initializing state when it is created. You will then make function calls to advance the state. The session will transition through ready, detecting, capturing, and finishing. Once in the finishing state, the session automatically moves to the completed state. At this point, you can safely tear it down and continue to the on-device reconstruction.
Let's see how to use this in practice. We begin by importing RealityKit and SwiftUI. We then create an instance of ObjectCaptureSession. Since it is a reference type, we recommend storing the session inside a ground truth data model as persistent state until it is completed. With this, the session starts in the initializing state. We continue by calling the start() function with a directory telling the session where to store the captured images. We can also provide a checkpoint directory in the configuration, which can be used later to speed up the reconstruction process. After this call, the session will move to the ready state. Let's see how to use the session with the ObjectCaptureView. We use an ObjectCaptureView like any other SwiftUI view. We place it inside the body of another view, passing it the ground truth session we just created. ObjectCaptureView always displays a UI corresponding to the current state in the session. As you see here in the ready state, the view shows the camera feed with a reticle to guide you to select the object to capture.
To advance the state, your app needs to provide a UI that tells the session to start detecting the object. Here, we stack a Continue button on top of the ObjectCaptureView. When pressed, it calls the startDetecting() function to move to the bounding box detection state. In the detecting state, the view automatically changes to show a bounding box around the currently detected object. You can manually adjust the size and orientation of the bounding box if needed. Our sample app also provides a reset button to allow restarting the object selection process in case you want to select a different object. This will take the session back to the ready state. Similar to the transition from ready to detecting, you need to provide a button to tell the session to start capturing once you are happy with the object selection. Here we use a Start Capture button that calls the startCapturing() function. After this call, the session moves to the capturing state. In the capturing state, the session automatically takes images while you slowly move around the object. The view displays a point cloud and a capture dial that shows where we have collected enough images of the object and where you still need to capture more. The scan pass is completed when the capture dial is entirely filled. When the capture dial completes, the session sets the userCompletedScanPass property to true. At this point, our sample app gives you the option to either finish the session or to continue to capture more images. We assign a button for each option. For the best model reconstruction, we recommend completing three scan passes. Let's see how to move to the next scan pass. You can start a new scan pass in two ways, depending on whether you decide to flip the object or not. Flipping allows you to capture those sides of the object that are not visible in the current pass, for example the bottom of the object. For this, we call beginNewScanPassAfterFlip(), which moves the state back to ready. 
You will then have to perform box selection in the new orientation. If you decide not to flip the object, you can instead capture more images at a different height. For this, we call beginNewScanPass(). This resets the capture dial, but the session remains in the capturing state since the bounding box has not changed. When all passes are completed, the sample app provides a Finish button. This button calls the finish() function, which tells the session that we are done capturing images, and it can start the finishing process. While in the finishing state, the session waits for all data to be saved. Once finished, the session will automatically move to the completed state. We can safely tear it down and start the on-device reconstruction. In the event of an unrecoverable error, such as if the image directory suddenly becomes unavailable, the session will transition into a failed state. If this happens, a new session will need to be created. And one last thing, you can also display the point cloud to preview which part of the object has been scanned since it was initially placed or last flipped. You do this by swapping in ObjectCapturePointCloudView for the ObjectCaptureView. This will pause the capture session and let you interact with the point cloud to preview it from all angles. Here, we show this view in combination with some text and buttons, but you can also display the point cloud full screen. Now that we have captured images of our object, let's see how to create a 3D model of it. Starting this year, you can run the reconstruction API on iOS. This allows you to run the image capture and reconstruction on the same device. Here is a recap of how to use the asynchronous reconstruction API. It works the same on iOS as on macOS. We start by attaching a task modifier to a view that we created. In the task modifier, we create a photogrammetry session and point it to the images. 
Optionally, we can provide the same checkpoint directory that we used during the image capture to speed up the reconstruction process. We then call the process() function to request a modelFile. Finally, we await the message stream in a loop and handle output messages as they arrive. Make sure to check out our previous talk for more details. To optimize for generating and viewing models on mobile devices, we support only the reduced detail level on iOS. The reconstructed model includes the diffuse, ambient occlusion, and normal texture maps, all designed for mobile display. If you would like to generate models at other detail levels, you can transfer your images to a Mac for reconstruction. This year, Mac reconstruction also utilizes the LiDAR data we save in our images. The new Object Capture session supports this flow as well. Let's see how. By default, the Object Capture session will stop capturing images when the reconstruction limit for the iOS device is reached. For macOS reconstruction, you can allow the session to take more images than on-device reconstruction can use. To do so, you set isOverCaptureEnabled to true in the session's configuration. These additional shots will not be used for on-device reconstruction, but they are stored in the Images folder. To reconstruct images on a Mac, you don't even need to write any code. Object Capture is integrated into our new macOS app called Reality Composer Pro. You simply import your images into the app, choose a detail level, and get your model. To learn more about this app, make sure you watch the session for Reality Composer Pro.
Now that we have seen how to create 3D models with the new iOS API, let's briefly go over a few reconstruction enhancements which were highly requested. We have improved the model quality and reconstruction speed on Mac. In addition to progress percent, we now provide an estimated reconstruction time.
Let's look at two other additions in more detail, pose output and custom detail level. You can now request a high-quality pose for each image. Each pose includes the estimated position and orientation of the camera for that image based on our computer vision algorithms. To get the poses, add a poses request to the process() function call. Then handle the poses output when it arrives in the output message stream. The poses will be returned early in the reconstruction process before the model is generated. This year, we have also added a new custom detail level for macOS, which gives you full control over the reconstructed model. Previously, you could choose from reduced, medium, full, and raw detail levels. With the custom detail level, you can control the amount of mesh decimation, the texture map resolution, format, and which texture maps should be included.
And that's a wrap on Object Capture for iOS. Object Capture now allows you to scan more objects with LiDAR support. We showed you how you can fully capture, reconstruct, and view your model entirely on your iOS device. Object Capture for iOS enables new workflows for a variety of applications, such as e-commerce, design, education, and games. We are excited to see how you will incorporate Object Capture into your apps. Thanks for watching! ♪

10:03 - Instantiating ObjectCaptureSession

import RealityKit
import SwiftUI 

var session = ObjectCaptureSession()
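
The session mentions an API for checking whether a device supports Object Capture. A minimal sketch of gating session creation on that check might look like the following, assuming an iOS 17 deployment target; `makeSessionIfSupported` is a hypothetical helper name used only for illustration.

```swift
import RealityKit

// Sketch: only create a capture session on hardware that supports Object Capture.
// `makeSessionIfSupported` is a hypothetical helper, not part of the RealityKit API.
@MainActor
func makeSessionIfSupported() -> ObjectCaptureSession? {
    guard ObjectCaptureSession.isSupported else {
        // Unsupported device: the app could show an explanatory message instead.
        return nil
    }
    return ObjectCaptureSession()
}
```

An app could call this once at launch and fall back to a different flow, such as importing existing images, when it returns nil.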

10:25 - Starting the session

var configuration = ObjectCaptureSession.Configuration()
configuration.checkpointDirectory = getDocumentsDir().appendingPathComponent("Snapshots/")

session.start(imagesDirectory: getDocumentsDir().appendingPathComponent("Images/"),
              configuration: configuration)
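
The state machine described in the session (initializing, ready, detecting, capturing, finishing, completed, plus a failed state) can be summarized as a small transition table. The sketch below is an illustrative model only: `CaptureStep`, `CaptureAction`, and `nextStep` are hypothetical names, not the RealityKit types, and the allowed transitions are limited to the ones the session mentions.

```swift
// Hypothetical model of the capture flow; not the RealityKit session types.
enum CaptureStep {
    case initializing, ready, detecting, capturing, finishing, completed, failed
}

// Actions loosely named after the calls and events mentioned in the session.
enum CaptureAction {
    case start, startDetecting, resetDetection, startCapturing
    case beginNewScanPass, beginNewScanPassAfterFlip, finish
    case dataSaved, unrecoverableError
}

// Returns the next step, or nil when the action is not valid in the current step.
func nextStep(from step: CaptureStep, on action: CaptureAction) -> CaptureStep? {
    switch (step, action) {
    case (.completed, _), (.failed, _):
        return nil                      // terminal: a new session is needed
    case (_, .unrecoverableError):
        return .failed
    case (.initializing, .start):
        return .ready
    case (.ready, .startDetecting):
        return .detecting
    case (.detecting, .resetDetection):
        return .ready
    case (.detecting, .startCapturing):
        return .capturing
    case (.capturing, .beginNewScanPass):
        return .capturing               // dial resets, bounding box is kept
    case (.capturing, .beginNewScanPassAfterFlip):
        return .ready                   // box selection is repeated after a flip
    case (.capturing, .finish):
        return .finishing
    case (.finishing, .dataSaved):
        return .completed
    default:
        return nil
    }
}
```

Modeling the flow this way makes it easier to decide which buttons an app should show in each state before wiring them to the real session.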

10:50 - Creating ObjectCaptureView

import RealityKit
import SwiftUI

struct CapturePrimaryView: View {
    var body: some View {
        ZStack {
            ObjectCaptureView(session: session)
        }
    }
}

11:20 - Transition to detecting state

var body: some View {
    ZStack {
        ObjectCaptureView(session: session)
        if case .ready = session.state {
            CreateButton(label: "Continue") { 
                session.startDetecting() 
            }
        }
    }
}
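
The session also describes a reset button that returns from detecting to the ready state so a different object can be selected. A possible sketch of the controls for the detecting state follows; `DetectingControls` is a hypothetical view name, plain SwiftUI `Button`s stand in for the sample's `CreateButton` helper, and the use of `resetDetection()` is my assumption about how the reset is performed.

```swift
import RealityKit
import SwiftUI

// Hypothetical overlay shown on top of ObjectCaptureView while detecting.
struct DetectingControls: View {
    let session: ObjectCaptureSession

    var body: some View {
        if case .detecting = session.state {
            HStack {
                Button("Reset") {
                    // Assumed call: goes back to ready so another object can be selected.
                    session.resetDetection()
                }
                Button("Start Capture") {
                    // Accepts the current bounding box and moves to capturing.
                    session.startCapturing()
                }
            }
        }
    }
}
```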

11:36 - Showing ObjectCaptureView

var body: some View {
    ZStack {
        ObjectCaptureView(session: session)
    }
}

12:04 - Transition to capturing state

var body: some View {
    ZStack {
        ObjectCaptureView(session: session)
        if case .ready = session.state {
            CreateButton(label: "Continue") { 
                session.startDetecting()
            }
        } else if case .detecting = session.state {
            CreateButton(label: "Start Capture") { 
                session.startCapturing()
            }
        }
    }
}

12:27 - Showing ObjectCaptureView

var body: some View {
    ZStack {
        ObjectCaptureView(session: session)
    }
}
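
Because the capture view ships without 2D text, an app has to turn the feedback conditions described in the session (lighting, speed, distance, framing) into its own messages. The following is a self-contained sketch of that mapping; `CaptureIssue`, the message wording, and the priority order are all hypothetical and do not mirror the RealityKit feedback types.

```swift
// Hypothetical feedback conditions named after the ones described in the session.
enum CaptureIssue: Hashable {
    case tooDark, movingTooFast, tooFar, tooClose, outOfFrame
}

// Text an app might overlay on the capture view (wording is illustrative).
func message(for issue: CaptureIssue) -> String {
    switch issue {
    case .tooDark: return "Add more light"
    case .movingTooFast: return "Slow down"
    case .tooFar: return "Move closer"
    case .tooClose: return "Move farther away"
    case .outOfFrame: return "Aim at your object"
    }
}

// Picks a single message when several conditions are active at the same time.
func primaryMessage(for issues: Set<CaptureIssue>) -> String? {
    let priority: [CaptureIssue] = [.outOfFrame, .tooDark, .movingTooFast, .tooFar, .tooClose]
    guard let top = priority.first(where: { issues.contains($0) }) else {
        return nil
    }
    return message(for: top)
}
```

Showing one message at a time keeps the overlay readable; the priority order here is a design choice, not something the session prescribes.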

12:50 - Completed scan pass

var body: some View {
    if session.userCompletedScanPass {
        VStack {
        }
    } else {
        ZStack {
            ObjectCaptureView(session: session)
        }
    }
}
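
The empty `VStack` above is where the per-option buttons would go. A sketch of the three choices described in the session (flip, new pass at a different height, or finish) might look like this; `ScanPassOptionsView` and the button labels are hypothetical, and plain SwiftUI `Button`s stand in for the sample's `CreateButton` helper.

```swift
import RealityKit
import SwiftUI

// Hypothetical view shown once userCompletedScanPass becomes true.
struct ScanPassOptionsView: View {
    let session: ObjectCaptureSession

    var body: some View {
        VStack {
            Button("Flip object and continue") {
                // Moves the session back to ready; box selection is repeated.
                session.beginNewScanPassAfterFlip()
            }
            Button("Capture from a different height") {
                // Resets the capture dial; the session stays in capturing.
                session.beginNewScanPass()
            }
            Button("Finish") {
                // Ends image capture and starts the finishing process.
                session.finish()
            }
        }
    }
}
```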

14:03 - Transition to finishing state

var body: some View {
    if session.userCompletedScanPass {
        VStack {
            CreateButton(label: "Finish") {
                session.finish() 
            }
        }
    } else {
        ZStack {
            ObjectCaptureView(session: session)
        }
    }
}
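
After `finish()` the session moves to completed on its own, or to failed on an unrecoverable error, so the app needs a way to notice either outcome. One possible sketch, assuming the session's `stateUpdates` async sequence is used for observation; `observeTerminalStates`, `onCompleted`, and `onFailed` are hypothetical names supplied by the app.

```swift
import RealityKit

// Sketch: run a callback when the session reaches completed or failed.
// The function name and both callbacks are hypothetical.
@MainActor
func observeTerminalStates(of session: ObjectCaptureSession,
                           onCompleted: @escaping () -> Void,
                           onFailed: @escaping (Error) -> Void) -> Task<Void, Never> {
    Task { @MainActor in
        for await state in session.stateUpdates {
            switch state {
            case .completed:
                onCompleted()        // safe to tear down and start reconstruction
                return
            case .failed(let error):
                onFailed(error)      // a new session has to be created
                return
            default:
                continue             // ignore intermediate states
            }
        }
    }
}
```

Keeping the returned task lets the owning data model cancel observation when the session is torn down.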

15:00 - Point cloud view

var body: some View {
    if session.userCompletedScanPass {
        VStack {
            ObjectCapturePointCloudView(session: session)
            CreateButton(label: "Finish") {
                session.finish() 
            }
        }
    } else {    
        ZStack {
            ObjectCaptureView(session: session)
        }
    }
}

15:50 - Reconstruction API

var body: some View {
    ReconstructionProgressView()
        .task {
            do {
                var configuration = PhotogrammetrySession.Configuration()
                // Reuse the checkpoint directory from image capture to speed up reconstruction.
                configuration.checkpointDirectory = getDocumentsDir()
                    .appendingPathComponent("Snapshots/")
                let session = try PhotogrammetrySession(
                    input: getDocumentsDir().appendingPathComponent("Images/"),
                    configuration: configuration)
                try session.process(requests: [
                    .modelFile(url: getDocumentsDir().appendingPathComponent("model.usdz"))
                ])
                for try await output in session.outputs {
                    switch output {
                    case .processingComplete:
                        handleComplete()
                    default:
                        // Handle other Output messages here.
                        break
                    }
                }
            } catch {
                // The task closure cannot throw, so errors are handled here.
                print("Reconstruction failed: \(error)")
            }
        }
}
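
The session notes that reconstruction now reports an estimated time in addition to the progress percentage. A rough sketch of turning those output messages into display text follows; `progressText` is a hypothetical helper, and the exact output cases and the `estimatedRemainingTime` property should be checked against the RealityKit reference.

```swift
import RealityKit

// Sketch: summarize progress-related output messages as display strings.
// Returns nil for messages this helper does not handle.
func progressText(for output: PhotogrammetrySession.Output) -> String? {
    switch output {
    case .requestProgress(_, let fractionComplete):
        return "Progress: \(Int(fractionComplete * 100))%"
    case .requestProgressInfo(_, let info):
        // The estimate may be unavailable early in the reconstruction.
        guard let remaining = info.estimatedRemainingTime else { return nil }
        return "About \(Int(remaining)) s remaining"
    case .requestError(_, let error):
        return "Request failed: \(error)"
    default:
        return nil
    }
}
```

Such a helper could be called from the `default` branch of the output loop and its result shown in the progress view.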

17:02 - Capturing for Mac

// Capturing for Mac

var configuration = ObjectCaptureSession.Configuration()
configuration.isOverCaptureEnabled = true

session.start(imagesDirectory: getDocumentsDir().appendingPathComponent("Images/"),
              configuration: configuration)

18:40 - Pose output

// Pose output

try session.process(requests: [
    .poses,
    .modelFile(url: modelURL)
])
for try await output in session.outputs {
    switch output {
    case .poses(let poses):
        handlePoses(poses)
    case .processingComplete:
        handleComplete()
    default:
        // Other output messages are ignored in this snippet.
        break
    }
}
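
The custom detail level for macOS is described as controlling mesh decimation, texture map resolution, format, and which texture maps are included. The sketch below shows how such a request might be configured. The specification property names and values are my assumptions about the macOS API and should be verified against the RealityKit reference; `requestCustomDetailModel`, `imagesURL`, and `modelURL` are hypothetical.

```swift
import Foundation
import RealityKit

// Sketch for macOS: request a model at the custom detail level.
// Property names on the specification are assumptions; verify before use.
func requestCustomDetailModel(imagesURL: URL, modelURL: URL) throws -> PhotogrammetrySession {
    var specification = PhotogrammetrySession.Configuration.CustomDetailSpecification()
    specification.maximumPolygonCount = 50_000                  // amount of mesh decimation
    specification.maximumTextureDimension = .twoK               // texture map resolution
    specification.textureFormat = .png                          // texture format
    specification.outputTextureMaps = [.diffuseColor, .normal]  // maps to include

    var configuration = PhotogrammetrySession.Configuration()
    configuration.customDetailSpecification = specification

    let session = try PhotogrammetrySession(input: imagesURL, configuration: configuration)
    try session.process(requests: [.modelFile(url: modelURL, detail: .custom)])
    return session
}
```

The caller would then iterate over the returned session's `outputs` in the same way as in the reconstruction snippet.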
