Explore the Action & Vision app

It's now easy to create an app for fitness or sports coaching that takes advantage of machine learning — and to prove it, we built our own. Learn how we designed the Action & Vision app using Object Detection and Action Classification in Create ML along with the new Body Pose Estimation, Trajectory Detection, and Contour Detection features in the Vision framework. Explore how you can create an immersive application for gameplay or training from setup to analysis and feedback. And follow along in Xcode with a full sample project.

To get the most out of this session, you should have familiarity with the Vision framework and Create ML's Action Classifier tools. To learn more, we recommend watching “Build an Action Classifier with Create ML,” “Explore Computer Vision APIs,” and “Detect Body and Hand Pose with Vision.” We also recommend exploring the Action & Vision sample project to learn more about adopting these technologies.

Whether you are building a fitness coaching app, or exploring new ways of interacting, consider the incredible features that you can build by combining machine learning with the rich set of computer vision features. By bringing Create ML, Core ML, and Vision API together, there's almost no end to the magic you can bring to your app.

WWDC20
Hello and welcome to WWDC.
Hi. My name is Frank Doepke, and together with my colleague, Brent Dimick, we're going to explore the Action and Vision Application.
The theme that we would like to set for today is that we use the phone as an observer and give feedback to our users. What do I mean by that? We already see that at plenty of sporting events, we have people using their phones to record and film it. But our thought is, can we use our iPhone or iPad and step in as a coach? We already have it with us, even when we go to the gym, but we want to use the camera and the sensors so that, instead of us looking at what's on the device, the device observes what we are doing and gives us some real-time feedback. You already have the perfect tool in your pocket. We have a high-quality camera, a fast CPU, GPU and neural engine. Now, we have a set of comprehensive and cooperating APIs that make it easy for you to take advantage of all the hardware.
Last, but not least, all of this can happen on device. Now, that is important for two reasons.
First, we want to make sure that we preserve the privacy of our users by keeping all the data on the device. And second, you don't incur the latency of analyzing something in the cloud.
So today, in sports and fitness, we can actually see that analysis of the sport can really help everyone to improve. So we have billions of sports enthusiasts up to the professional athlete who can benefit from sports analysis. But instead of just looking at the device as we do today for many things and looking at videos of how to do it, we want to take it to the next level. We want to improve in sports and fitness. So we do some common analysis, and we think this can be done. So, first, what did our body do? How did we move? Then we might want to look at which objects are in motion. Think of the ball in a soccer game or in a tennis game. We need to understand the field of play: the court in tennis, or the goal in soccer. And then we need to give feedback to the user of what we actually saw and what happened.
So, for our session, we picked an example of a sport that is simple and easy to understand. And then we wrote a fun little application around it that you can actually download the source code for to actually follow along. So let me introduce the Action and Vision Application.
We picked the game of bean bag toss. It is a fun game for everyone to play. It's very simple. We have two boards set up 25 feet apart. Each of the boards has a regulation size of two feet by four feet, and it has a six-inch-diameter hole right in the center. Players now take the bean bags and throw them on the boards. They take turns and score points for landing on the board, with an extra score for landing in the hole. Now, you might think of bean bag as just a pastime, but still, everybody's competitive and wants to win.
So you might ask yourself the question of, like, "Why did I miss my shot?" To answer that, we want to see, how was the bag flying? What was my body pose when I released the bag? And how fast was my throw? Of course, I need to keep score, and perhaps I want to show off in front of my friends by doing some different shots. So, let's head outside and actually play a game.
All right. Here we have our phone already set up. We now go into the live-action mode to actually record the session.
The first thing we need to do is find our board. So we panned our camera over. Now it's stable. And we are waiting for the player. And there we go. We have our player. Let's see how I did.
You can see from the orange line how I was throwing, and I could see the trajectory of my bean bag flying.
Wow, I got lucky on that shot.
The other part I would like you to pay attention to is that we have a skeleton on top of the player, so we see all the key points of our movements.
With that, we can understand the release angle when we're throwing, and we also can see what kind of throw we did-- overhand, underhand.
And even there, you see, a trick shot, where we're throwing under the leg.
You also see the speed at which the bean bag was traveling, and you have the score on the bottom left and the kind of throws that we did on the bottom right. Now, once we are done with all of the eight throws, we actually get a summary view. I can see all my different trajectories and can see what worked best for me. I see my average speed at which I was throwing, as well as the release angle. And, of course, the final score. So that is the Action and Vision App. We hope that you will enjoy it.
Now let's look at how we actually created this application. We have some key algorithms that have to play together to make all of this happen. We start with the prerequisite phase. As you saw, the camera was on a tripod, or somehow we need to have it stabilized. Then we have a game setup. This is where we understand the playing field. And then, of course, comes the game play.
Now, the first prerequisite is that we find the board. That was the very first panning move that you saw. Once we have the board, we need to ensure that we have scene stability. That means we know that the camera is, for instance, on a tripod or somehow otherwise stabilized. Now we're getting ready to play in the game setup part. We measure out the boards, then we find our player, and we are ready to roll. Now, when the game play starts, we can actually find all the throws. Then we analyze the kind of throw type. Is it an overhand, underhand or under leg? And last but not least, we measure the speed.
Now, of course, we're interested in which algorithms we use. So, to find the boards, we trained a custom model, and we're using the VNCoreMLRequest to actually run the inference on that model. And it tells us where the board is. Once we have the board, we can now use VNTranslationalImageRegistrationRequest to analyze for scene stability. We measure the boards by running the VNDetectContoursRequest, which gives us the outline of the board itself.
And then we use the VNDetectHumanBodyPoseRequest, which is new this year, to find the human. Now, when we are ready to play, we use the VNDetectTrajectoriesRequest, which finds the throw of the bag. And then we have a new model that we trained in Create ML and run through Core ML to actually classify what kind of throw we have done. Last but not least, we can use the measurements of the board, together with the analysis of the trajectory, to measure the speed of our throw.
Now, to guide you through all of this, we actually have an icon that will help you. So you see that we have our prerequisite stage, the game setup stage and the game play. Let's dive into the details.
The first part is that we need to detect the boards and recognize them. So we created a Custom Object Detection Model. We used Create ML with its object detection template. We have our own training data that we brought along, where we found images with the boards in them and negatives where there's actually no board in it. And we trained the model. You will hear later on in the session a little bit more about how we did the training. Now, once we have the model, we can run the inference through Vision.
We saw that we need to fixate the camera. Why do we have to do this? Some of our algorithms actually require a stable scene. But it also gives us some other advantages because we only need to analyze the playing field once. We know after that, it doesn't change. Another neat part about this is that it shows that the user has clear intent of actually capturing it. So we don't need a start button. You don't need to touch the screen to do anything.
We're doing the scene stability through registration. For that, we're using the VNTranslationalImageRegistrationRequest. That is a mouthful. What it does is analyze the movement from one frame to the next. And what you saw when the camera was panning is that we had a movement of ten pixels between each of the frames.
Once the camera came to rest, that movement went down to zero. So we are now below a certain threshold, and we know that our scene is stable. The camera is not moving anymore.
Next, we do the contour detection. For that, we use the VNDetectContoursRequest. To do that, we use the bounding box that we got from our object detection as the region of interest. Then we simplify the contours for the analysis. By using these two techniques together, we only look at the contours that we have from the board, and not of the whole scene. If you want to learn more about the contour detection, you can look at our "Explore Computer Vision APIs" session.
What we need next is our player. For that, we use the VNDetectHumanBodyPoseRequest. It gives us the points of the body joints, like your elbows, the shoulders and the wrist and the legs. We can use these points to analyze the angle between the joints, so we can know, for instance, whether our arm was bent. More details on how to use the body pose request can be found in the "Detect Body and Hand Pose with Vision" session. Also, keep this in mind because we're gonna use this for the Action Classification as well.
Now, after so many slides, you want to see some code. So let me hand it over to my colleague Brent, who'll walk you through that. Brent?
Thanks, Frank. Hi. I'm Brent from the Core ML team. I'm going to be walking through some of the code of the app that Frank was talking about. And as Frank mentioned, not only did we build this app for our session today, but we're also making it available for download. So if you'd like, you can pause here, download the app and follow along with me. You can find it linked to this session in the resources section.
All right, let's dive in.
The first thing I'd like to show you is how the app progresses through various states of the game. The app uses a GameManager to manage its state and communicate that state with the view controllers.
We'll see these states as we progress through the app.
And the GameManager will notify its listening view controllers of state changes. Also note that the GameManager is a singleton, which we'll find used throughout the app.
Next, let's take a look at the main storyboard.
When a user launches the app, it begins with the start and setup instructions screens.
Next, the source picker is brought up so that either the live camera or an uploaded video is used as input.
The SourcePickerViewController handles the selection of these input options.
After that, the app segues to the RootViewController.
The RootViewController is responsible for a couple of things in our app, so let's take a closer look at it.
The first thing the RootViewController is responsible for is hosting the CameraViewController.
When the RootViewController loads, it creates an instance of the CameraViewController to manage the buffers of frames coming from either the camera or the video.
The CameraViewController has an OutputDelegate which is used to pass those buffers to the appropriate delegate ViewControllers.
Once the RootViewController sets up the CameraViewController, then it calls startObservingStateChanges, which will register it to be notified by the GameManager of game-state changes.
This corresponds to the second responsibility of the RootViewController, which is to present and dismiss overlaying view controllers based on the game state.
The RootViewController has an extension where it conforms to the GameStateChangeObserver protocol. As the GameManager notifies its observers of game-state changes, the RootViewController will listen to these state changes to determine which other ViewController to present. This could be the SetupViewController, the GameViewController or the SummaryViewController. The SetupViewController and GameViewController classes have extensions to conform to both the GameStateChangeObserver and CameraViewControllerOutputDelegate protocols. This means that when one of those ViewControllers is presented by the RootViewController, it is also added as a GameStateChangeObserver, and it becomes the CameraViewControllerOutputDelegate.
We just talked about how the app progresses through game states and how it passes buffers to the ViewControllers. Next, let's take a closer look at some of the key functionality that Frank was talking about.
We're going to jump into the SetupViewController because this is the first ViewController that the RootViewController presents. When the SetupViewController appears, it creates a VNCoreMLRequest using the Object Detection model that we created to detect our boards using Create ML. We used the new Object Detection transfer learning algorithm in the Object Detection template. Then as the SetupViewController starts receiving buffers from the CameraViewController in its CameraViewControllerOutputDelegate extension, it starts performing these Vision requests on each buffer in the detectBoard function.
Here, the app is taking the results from the requests, which are the detected objects-- in our case, our game boards-- and filtering out low-confidence results. If it finds a result with a high enough confidence, it draws a bounding box on the screen around the detected object, and then progresses from detecting the board to detecting the board placement. The app then instructs the user to align the bounding box with the boardLocationGuide, which is already present on the screen. Once the object bounding box is placed within the boardLocationGuide, the app progresses to determining scene stability. One thing to note is that the app will only guide the user to move the board into the boardLocationGuide when the user is using live camera mode. It will not guide the user to move the board during video playback. During video playback, the app assumes that the board is placed on the right side of the video.
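The filtering and placement check described here can be sketched in a few lines. This is a minimal illustration and not the sample project's code: the function names, the tuple layout `(confidence, (x, y, width, height))`, and the 0.9 threshold are all assumptions made for the example.

```python
def best_detection(detections, min_confidence=0.9):
    """Return the most confident (confidence, box) pair at or above the
    threshold, or None. The 0.9 default is an assumed value."""
    confident = [d for d in detections if d[0] >= min_confidence]
    return max(confident, key=lambda d: d[0], default=None)


def guide_contains(guide, box):
    """True when box lies entirely inside guide.
    Both rectangles are (x, y, width, height) in the same coordinate space."""
    gx, gy, gw, gh = guide
    bx, by, bw, bh = box
    return (bx >= gx and by >= gy
            and bx + bw <= gx + gw
            and by + bh <= gy + gh)


# Hypothetical detections: one weak, two strong.
detections = [
    (0.40, (0.10, 0.10, 0.20, 0.20)),
    (0.95, (0.60, 0.30, 0.30, 0.20)),
    (0.92, (0.05, 0.05, 0.30, 0.20)),
]
board = best_detection(detections)
board_in_guide = board is not None and guide_contains((0.5, 0.2, 0.45, 0.4), board[1])
```

Only when a confident detection exists and sits inside the guide would an app built this way move on to the stability check.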
We just saw how the app detects our game boards and guides our users to place the board in the expected location in the camera frames.
Next, let's take a look at how the app determines that the scene is stable.
Let's go back and look at the CameraViewControllerOutputDelegate extension of the SetupViewController. It uses a VNSequenceRequestHandler because there will be a Vision request performed across a series of frames to determine the scene stability. This app iterates over 15 frames in order to make sure the scene is actually stable. As the SetupViewController receives buffers from the CameraViewController, it performs a VNTranslationalImageRegistrationRequest for each buffer against the previous buffer to determine how aligned those two buffers are. When the ViewController receives the results from the request, it appends the points from the transform to the sceneStabilityHistoryPoints array, and then updates the Setup State again.
When the Setup State is detecting player state, as it is now, the ViewController uses a read-only computed property called sceneStability to calculate whether the scene is stable or not. This property calculates the moving average of the points stored in the sceneStabilityHistoryPoints array. If the distance of the moving average is less than ten pixels, then this app considers the scene to be stable.
Once scene stability is found, the app can progress to detecting the contours of our game board. So now, let's take a look at how the app does that.
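The moving-average check can be sketched as follows. This is an illustration of the idea only, not the sample project's implementation: the class and method names are made up, and the frame-to-frame offsets are assumed to arrive as plain `(dx, dy)` pixel translations. The window of 15 frames and the ten-pixel threshold come from the description above.

```python
from collections import deque
from math import hypot


class SceneStabilityTracker:
    """Keeps the most recent frame-to-frame translations and reports
    whether their moving average is small enough to call the scene stable."""

    def __init__(self, window=15, max_distance=10.0):
        self.window = window
        self.max_distance = max_distance
        self.offsets = deque(maxlen=window)  # oldest entries fall off automatically

    def add_offset(self, dx, dy):
        self.offsets.append((dx, dy))

    def is_stable(self):
        # Not enough evidence yet: require a full window of frames.
        if len(self.offsets) < self.window:
            return False
        mean_dx = sum(dx for dx, _ in self.offsets) / len(self.offsets)
        mean_dy = sum(dy for _, dy in self.offsets) / len(self.offsets)
        return hypot(mean_dx, mean_dy) < self.max_distance


tracker = SceneStabilityTracker()
for _ in range(15):
    tracker.add_offset(12.0, 0.0)   # camera still panning
panning_result = tracker.is_stable()
for _ in range(15):
    tracker.add_offset(0.5, -0.5)   # camera at rest on the tripod
resting_result = tracker.is_stable()
```

Averaging over a window rather than testing a single frame keeps one shaky frame from flipping the state back and forth.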
We'll take a look back at the CameraViewControllerOutputDelegate extension of the SetupViewController. This time, when the ViewController receives a buffer, the setupStage is detectingBoardContours, so the ViewController calls detectBoardContours. This function uses the new VNDetectContoursRequest. Notice that the boardBoundingBox that was found earlier when we were detecting our boards is used to set a regionOfInterest for this request. This will cause the request to only be performed in that region. The app then performs an analysis of those contours to find the edge of the board and the hole of the board.
Once the app finishes detecting contours, the game state moves to DetectedBoardState. Since the SetupViewController is also a GameStateChangeObserver, the following code will be run on GameStateChanges. In this case, the game state is DetectedBoardState, so the app lets the user know that a board has been detected, and the game state is changed to DetectingPlayerState. At this point, the app has found our game board, made sure it's placed correctly, determined that our scene is stable and found the contours on the game board. That completes the responsibilities of the SetupViewController, so we can move on to the next ViewController.
We'll take a quick look back at the RootViewController, where we can see that since the game state is now DetectingPlayerState, the next ViewController that will be presented is the GameViewController. This means that the GameViewController will be added as a GameStateChangeObserver, and it will become the CameraViewControllerOutputDelegate. Since the GameViewController is now the CameraViewControllerOutputDelegate, it will be receiving buffers from the CameraViewController and executing the following code on each buffer. The GameViewController will perform its detectPlayerRequest, which is an instance of the VNDetectHumanBodyPoseRequest.
When the ViewController receives the results from this request, it passes them to the humanBoundingBox function. This function filters out low-confidence observations and returns the bounding box of the person who enters the frame. Once this happens, the app moves its game state to the next phase. Let's remember this humanBoundingBox function because we'll hear about it again a little later.
Next, Frank is going to tell you about detecting trajectories of the bean bags while the game is being played. Frank?
All right. Now we have our game play. So let's look at the trajectory detection. The VNDetectTrajectoriesRequest finds objects that move through a scene, and it also filters out the noise from movements that we might not be interested in. But to actually use it, we need to have a better understanding as to how it works. So let's look at this.
Here, we will see a throw, but I want to peel back the covers a little bit so that you can see under the hood what we are actually using for the analysis. So, this was our throw. But this is not what the algorithm looks at. It looks at what we call a frame differential. That makes it much easier to highlight the objects that are moving, because they change from frame to frame. There's a bit of noise that we filter out.
So, what we're going to do is we actually have a whole sequence, and we see in this action the bean bag flying. So we use our new VNDetectTrajectoriesRequest, but it's a bit of a special request that's new in Vision this year. It's what we call a "stateful request." That means we need to keep this request around, which is not required for other requests in Vision. But here, we need to do this because it builds state over time. So we feed it the first frame, and nothing happens. Continue feeding frames to it. Four frames in. Now we get to the fifth frame, and we actually get a trajectory detected, because we now have enough evidence. We cannot see from a single frame if something is moving. We need a bit of evidence over time, and that's why it's a stateful request.
Now, the throw, of course, gets reported back in the VNTrajectoryObservation. But it started much earlier. I can now actually use the time range of the observation to know when my throw started. So now we continue feeding frames to our request, and our trajectory gets refined over time.
Let's look a little bit more at our trajectory observation that we are getting back. So here, we have composited together all the frame differentials of the whole throw. What we are getting back are the points which are the centroids of these objects. Those are the detected points. On top of it, we also get the projected points. These are the five points that perfectly describe the parabola on which the object has traveled. In addition, we get the equationCoefficients that describe the parabola, which is y = ax² + bx + c. Now, if I use these parts, I can actually go and create a nice, smooth parabola for a nice visualization.
You'll notice there's a second parabola here on the bottom. This was created by a shadow of the bag flying. I actually get multiple trajectories at the same time, and I can differentiate between them by using the UUID. Now, we know in this one, it's kind of at our foot level, so it's unlikely the bag that we're gonna use, so we can actually ignore it and only focus on the top one.
So, how do we use the VNDetectTrajectoriesRequest? When we create it, we give it a frameAnalysisSpacing. Again, let's look at our graphic, what happens. When we set the spacing initially to just a time of zero, we're actually going to analyze all the frames. But you can actually set that spacing to something else. And by doing that, we're only gonna analyze a few frames. That helps reduce the computational cost, which is important particularly on older devices. Next, I can also specify the trajectoryLength that I'm looking for. That allows me, for instance, to filter out small, spurious movements that I'm not interested in. And then we have our completionHandler, which is, as usual, the part in Vision where we actually deal with our results.
In addition to that, we actually have two properties. By looking at the objects that we see in the scene, we see they have different sizes. We see on the left, we have the arm throwing.
Then we have our bean bag flying. And there's some noise on the very right-hand side. And I indicated that by looking at the enclosing circle of our objects. By setting the minimumObjectSize, I can filter out the noise of the very small parts because I know the size of my bean bag that I'm actually expecting to see. And on the other side, I can also use the maximumObjectSize to filter out objects that are much larger that I actually don't care about. So I would never get a trajectory from them and really can focus purely, in this case, on the bean bag.
Now, a few things to keep in mind when we use the trajectory detection. It requires a stable scene. Hence, the prerequisite that we actually have the phone stabilized on a tripod or otherwise fixated. Objects have to travel on some kind of a parabola. Now, a straight line counts as a parabola, too. It is just the special case with no curvature. That allows us to filter out spurious movements that we might not be interested in. You also need to feed in SampleBuffers with time stamps, because we're using the time stamps in our analysis. If your ball, for instance, bounces or leaves the frame, we get a new trajectory anytime this happens. And you have to combine those, and you can do this easily by checking, for instance, whether the last point of a previous trajectory matches up with the first point of a new trajectory.
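Two of the ideas from this part of the talk are easy to sketch: turning the three parabola coefficients into a smooth curve for drawing, and deciding whether a new trajectory continues a previous one. The code below is a hedged illustration with hypothetical function names; the matching tolerance of 0.05 (in normalized image coordinates) is an assumed value, not one from the session.

```python
from math import hypot


def sample_parabola(a, b, c, x_start, x_end, count=20):
    """Evaluate y = a*x^2 + b*x + c at evenly spaced x values,
    giving enough points to draw a smooth curve."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (x_end - x_start) / (count - 1)
    xs = [x_start + i * step for i in range(count)]
    return [(x, a * x * x + b * x + c) for x in xs]


def continues_trajectory(previous_points, next_points, tolerance=0.05):
    """True when the first point of the new trajectory lies close to the
    last point of the previous one, for example after a bounce."""
    last_x, last_y = previous_points[-1]
    first_x, first_y = next_points[0]
    return hypot(first_x - last_x, first_y - last_y) <= tolerance


curve = sample_parabola(-1.0, 1.0, 0.0, 0.0, 1.0, count=3)
before_bounce = [(0.10, 0.40), (0.30, 0.55), (0.50, 0.20)]
after_bounce = [(0.51, 0.21), (0.60, 0.35)]
same_throw = continues_trajectory(before_bounce, after_bounce)
```

Setting `a` to zero in `sample_parabola` produces a straight line, which is the special case mentioned above.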
It helps to use the region of interest. If you know where you expect the movement to happen, you can filter out a lot of the background noise that otherwise happens around it. Last, but not least, use your business logic. Like, for instance, in our example, we knew that the bean bags would only travel from the player throwing at the board. We have not encountered yet a board that would actually throw the bag back at the player.
Now, again I would like to hand it over to Brent to look at the code. Brent?
Thanks, Frank. Let's take a look at how we're detecting trajectories. The app will perform a VNDetectTrajectoriesRequest on each buffer from the CameraViewController. We can see this happening in the GameViewController. For each buffer, the GameViewController performs its detectTrajectoryRequest, which is an instance of the VNDetectTrajectoriesRequest. It's important to note that the detectTrajectoryRequest is happening on its own queue, separate from the CameraOutputQueue. Once the app receives the results from the detectTrajectoryRequest, it processes these results in the processTrajectoryObservations function. It's here in this function where the app tracks information about each trajectory, like the duration and points detected. It also updates the trajectory region of interest and checks if the trajectory is still in flight. If the detected trajectory points are outside the region of interest for more than 20 frames, the app considers the throw to be complete.
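The "more than 20 frames outside the region of interest" rule can be sketched as a small counter. This is an assumption-laden illustration rather than the sample project's logic: the class name is hypothetical, the count is assumed to reset whenever a point falls back inside the region, and a frame with no detected point is treated as outside.

```python
class ThrowCompletionTracker:
    """Counts consecutive frames whose trajectory point is outside the
    region of interest and reports when the throw can be considered done."""

    def __init__(self, roi, max_frames_outside=20):
        self.roi = roi  # (x, y, width, height)
        self.max_frames_outside = max_frames_outside
        self.frames_outside = 0

    def _inside(self, point):
        x, y = point
        rx, ry, rw, rh = self.roi
        return rx <= x <= rx + rw and ry <= y <= ry + rh

    def update(self, point):
        """Feed the latest detected point (or None). Returns True once the
        point has been outside the region for more than the allowed frames."""
        if point is not None and self._inside(point):
            self.frames_outside = 0  # assumed: any in-region point resets the count
        else:
            self.frames_outside += 1
        return self.frames_outside > self.max_frames_outside


tracker = ThrowCompletionTracker(roi=(0.0, 0.0, 0.5, 0.5))
in_flight = tracker.update((0.2, 0.2))
results = [tracker.update((0.9, 0.9)) for _ in range(21)]
```

In this sketch the first 20 outside frames still count as "in flight", and the 21st marks the throw as complete.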
It's important to note that the points property has a didSet observer which calls the updatePathLayer function.
This function updates the trajectory path in the trajectory view and checks if the trajectory points are in the region of interest.
This function also calculates the release speed of the trajectory, but we'll be seeing more on that later. Right now, Frank's going to tell you about detecting the type of throw a player is making.
Thank you, Brent. The next part we need to do is look at how we identify the type of throw. We created a custom Action Classification model using Create ML in which we used our own training data. We collected videos of the throw types we wanted to classify, but also videos of just walking or picking up the bags, so that we can filter those out. Brent will later talk about some details of how we trained the model.
The Action Classification is using body pose through Vision. Just like the trajectory detection, it builds evidence over time. Let's look at how this works. Here's a sequence of the throwing action where we first have the player's body movement. Then we detect the moment of the throw from our trajectory detection. We accumulate the body poses from the VNDetectHumanBodyPoseRequest and take 45 frames around the point where the throw is detected. The window encapsulates the full throw movement. We merge the 45 body poses into one MLMultiArray and feed it into the Core ML model. From that, we get a label, which is the type of throw, and a confidence value. Let me hand it back to Brent to show you how this all looks in the code.
Thanks, Frank. We're gonna look at how we determine the last throw a player made. We'll keep looking at the CameraViewControllerOutputDelegate extension of the GameViewController.
For each buffer received, not only is the GameViewController tracking the trajectory, it's also detecting key points of the player with a VNDetectHumanBodyPoseRequest and storing those points as observations. I mentioned earlier that we'd hear about the humanBoundingBox function again because it's here that the app stores the body pose observations. As Frank said, these body pose observations are the input to the Create ML Action Classification model that was trained to predict the player throw type.
Once a throw has finished, the updatePlayerStats function is called. It's in this function that getLastThrowType is called on the playerStats object. The playerStats object is an instance of the PlayerStats struct, which keeps track of stats about the player during the game.
A closer look at the getLastThrowType function shows us that it prepares the input of the Action Classification model using the body poseObservations that were being stored.
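The windowing idea Frank described, 45 pose frames around the moment of the throw followed by picking a label with a confidence, can be sketched like this. It is a simplified stand-in, not the sample project's helper: the function names are hypothetical, plain Python lists replace the MLMultiArray, and centering the window on the throw index (clamped to the available frames) is an assumption.

```python
def throw_window(pose_frames, throw_index, window=45):
    """Return `window` consecutive pose frames roughly centered on the
    frame where the throw was detected, clamped to the stored frames."""
    if len(pose_frames) < window:
        raise ValueError("not enough pose frames to fill the prediction window")
    half = window // 2
    start = max(0, min(throw_index - half, len(pose_frames) - window))
    return pose_frames[start:start + window]


def best_label(probabilities):
    """Pick the label with the highest probability and return it with its confidence."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Stand-in data: integers play the role of per-frame pose observations.
stored_poses = list(range(100))
window = throw_window(stored_poses, throw_index=50)
label, confidence = best_label({"overhand": 0.2, "underhand": 0.7, "other": 0.1})
```

A fixed-length window matters because an action classifier expects the same number of frames on every prediction.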
The prepareInputWithObservations function is a helper function that gets the body poseObservations into the input format required by the Action Classification model. This helper function also sets the number of frames required to capture a full throw action so those frames can be passed to the model for classification. With the input ready, the app makes the Action Classification prediction, and the throwType with the highest probability is returned.
Now Frank is going to tell you more about the metrics we calculate on each trajectory.
Thank you, Brent. The next part we need to do is measure our playing field.
We know the physical size of the board. It's a regulation-size board. Once we measure it out, by using the contours, we know how many pixels in our image correspond to the four-foot-by-two-foot size of the board. Knowing that, we have now a correspondence from our image to the real world. Now, the trajectory when we throw actually happens in the same plane where the board is. So now we can simply calculate the speed, because we have the trajectory, how long it took and we know our size in the real world.
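As a worked example of that reasoning, the sketch below converts a trajectory length in pixels into feet using the board as a ruler and then divides by the flight time. The function name and the sample numbers are made up for illustration; only the four-foot board length comes from the talk.

```python
BOARD_LENGTH_FEET = 4.0  # regulation board length stated in the session


def release_speed_fps(trajectory_length_px, board_length_px, duration_s):
    """Speed in feet per second, assuming the throw happens in the same
    plane as the board so one pixel covers the same distance for both."""
    if board_length_px <= 0 or duration_s <= 0:
        raise ValueError("board length and duration must be positive")
    feet_per_pixel = BOARD_LENGTH_FEET / board_length_px
    distance_ft = trajectory_length_px * feet_per_pixel
    return distance_ft / duration_s


# Hypothetical measurements: the board spans 200 px, the bag travels 600 px in 0.5 s.
speed_fps = release_speed_fps(600.0, 200.0, 0.5)   # about 24 ft/s
speed_mph = speed_fps * 3600.0 / 5280.0            # about 16.4 mph
```

With these numbers, 600 pixels correspond to 12 feet, and covering 12 feet in half a second gives roughly 24 feet per second.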
The other part that we want to measure is the release angle. So we're looking for the body pose at the beginning of the throw, when I was actually throwing the bag. And we're comparing the angle from the elbow to the wrist, so my lower arm, to the horizon. Again, let me hand it back over to Brent to show you this in the code. Brent?
Thanks, Frank. In addition to getting the last throw type a player makes, the updatePlayerStats function also gets the releaseSpeed of the trajectory and the releaseAngle of the throw. I mentioned earlier that the releaseSpeed of the trajectory is calculated in the updatePathLayer function. When the app has a first observation of a trajectory, it calculates the length of that trajectory in pixels, which can be converted to actual distance using the game board length as a reference. The app then divides that length by the duration of the trajectory observation to get the releaseSpeed of the trajectory.
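The release angle, the lower arm compared to the horizon, comes down to one arctangent. The sketch below is an illustration under stated assumptions, not the app's getReleaseAngle: the function name is hypothetical, points are `(x, y)` pairs with y increasing upward (as in Vision's normalized coordinates), and the horizontal distance is taken as an absolute value so the result does not depend on whether the player throws to the left or to the right.

```python
from math import atan2, degrees


def release_angle_degrees(elbow, wrist):
    """Angle of the elbow-to-wrist segment relative to the horizon, in
    degrees: positive when the wrist is above the elbow, negative below."""
    dx = abs(wrist[0] - elbow[0])
    dy = wrist[1] - elbow[1]
    return degrees(atan2(dy, dx))


# Hypothetical joint positions at the moment of release.
angle_up = release_angle_degrees(elbow=(0.50, 0.50), wrist=(0.60, 0.60))    # about 45
angle_flat = release_angle_degrees(elbow=(0.50, 0.50), wrist=(0.40, 0.50))  # 0
```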
The angle of release for a throw is calculated in the getReleaseAngle function. This function uses the wrist and elbow points from the body poseObservation found in the buffer where the bean bag was released to determine this angle.
We've looked at a number of components of this app and we talked about two specific machine-learning models that we created with Create ML. The first was the Object Detection model used to detect the board, and the second was the Action Classification model used to detect the throw type a player used.
In each case, while we were creating these models, we noted down some important points that we wanted to share with you. We'll start with the Object Detection model. We wanted to train the model with data from conditions it's expected to operate in. Our data was captured with iPhone because that's where the app will be run, and the model will be making predictions on frames from the iPhone camera. We included images of our board in our data because we knew it was the type of board we wanted our app to work with. We also included images of boards outside because we expected to play the game outside. We noticed, however, that a few things initially threw the model off.
All of our original data was collected without people or bean bags in the images. The first model sometimes had difficulty detecting boards when people and bean bags were in the frame.
Adding images that include people and bean bags improved the model. Also, after our initial round of data collection, we added additional images from a range of distances and angles which helped improve the model when the phone wasn't directly perpendicular to the board. Next, we'll look at the Action Classification model. Again, we wanted to train the model with data from the conditions it's expected to operate in.
Our Action Classification data was also captured with iPhone. In addition, we included data captured at varying distances and angles to account for the iPhone being placed differently when we played the game compared to when we captured the data for the model.
One point we wanted to share about our first iteration of the model was that it was initially created with just three classes: underhand, overhand and under-the-leg shots.
However, that meant that all actions, including things like picking up a bean bag, were recognized as one of those three actions. To account for these additional situations, we added another class, a "Negative Class" or "Other Class," with people performing a variety of actions that weren't any of these three shots.
Doing this helped the model perform much better.
One other point we wanted to share was about setting the correct prediction window. The correct prediction window needs to be set so that it includes the entire target action. Some types of actions, like throwing a bean bag, may take more or fewer frames to capture than other types of actions. For the model to perform well, the prediction window should be able to capture the full action. Additionally, when using the model in the app, we need to determine when and how often to perform the prediction.
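One way to picture the prediction window is as a fixed-size rolling buffer of per-frame poses that is only handed to the classifier once it holds a full action. The sketch below is an illustration in Python under stated assumptions, not the sample app's code: the PredictionWindow class, the window_size_for helper, and the two-second, 30-frames-per-second figures are hypothetical.

```python
import math
from collections import deque

def window_size_for(action_seconds, frames_per_second):
    """Number of frames needed to cover one complete action."""
    return math.ceil(action_seconds * frames_per_second)

class PredictionWindow:
    """Keeps the most recent `size` frames; older frames fall off the front."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("window size must be positive")
        self._frames = deque(maxlen=size)

    def add(self, pose):
        self._frames.append(pose)

    @property
    def is_full(self):
        return len(self._frames) == self._frames.maxlen

    def snapshot(self):
        """Return the buffered frames, or None if the window is not yet full."""
        return list(self._frames) if self.is_full else None

# Hypothetical: a throw lasting about two seconds, captured at 30 fps.
size = window_size_for(action_seconds=2.0, frames_per_second=30)
window = PredictionWindow(size)
```

A window that is too short cuts off part of the throw, while one that is too long pads the action with unrelated motion, so the size is best chosen from how long the target action takes in the training footage.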
For this app, we didn't want to continuously perform predictions. Instead, we wanted to pick moments when we could send the portion of the video in which the throw happened to the model, so that it classifies the action in that duration. We do this once a throw has completed. We have a predetermined event, the end of a throw, at which time we send the frames from the beginning of the throw to the model in order to classify the action performed. Now, Frank is going to talk about best practices for live processing.
All right. Thank you, Brent. Now, after seeing all that, there are a few things that we want to keep in mind for this application. It's all about real-time feedback. So we need to follow some best practices for live processing.
When we deal with live streams, there are a few challenges. Our camera only has a finite set of buffers. When you work on a buffer to analyze it, it's not available to the camera anymore. So we can easily starve our camera of buffers. So it is important that we give those buffers back to the camera as soon as possible so that the camera has them available for its work. Now, you might think you know how long the algorithm takes, and it's less than a frame duration, so we should all be good. But that's not always the case, because the load on the system can vary. For instance, a notification might arrive in the background, or some other network traffic has to happen, and your algorithm might actually take a bit longer. If you have difficulties keeping up with the frame rate, it really helps to use the captureOutput(_:didDrop:from:) delegate method to find out why the camera was not able to deliver a buffer to you.
Now, when we deal with live streams, we also want to split up our work. We know that we have multiple things to analyze on each frame. So we're gonna use different queues. We feed our frame into different queues, and we can run the analyses in parallel while the camera can do its work.
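The buffer-starvation problem described above can be illustrated with a small simulation. This is a toy model in Python, not AVFoundation code: the BufferPool class, the simulate function, and the pool size and hold times are hypothetical, and a real capture pipeline manages its buffers internally.

```python
class BufferPool:
    """A fixed number of frame buffers shared by the camera and the analysis."""

    def __init__(self, count):
        self.free = count
        self.dropped = 0

    def capture(self):
        """The camera takes a buffer for a new frame, or drops the frame."""
        if self.free == 0:
            self.dropped += 1
            return False
        self.free -= 1
        return True

    def release(self):
        """The analysis hands a buffer back to the camera."""
        self.free += 1

def simulate(frames, pool_size, hold_frames):
    """Count dropped frames when each buffer is held for `hold_frames` intervals."""
    pool = BufferPool(pool_size)
    in_flight = []  # remaining hold time of each buffer under analysis
    for _ in range(frames):
        in_flight = [remaining - 1 for remaining in in_flight]
        for _ in [r for r in in_flight if r <= 0]:
            pool.release()  # analysis finished, buffer goes back to the pool
        in_flight = [r for r in in_flight if r > 0]
        if pool.capture():
            in_flight.append(hold_frames)
    return pool.dropped

# Releasing within one frame interval keeps up; holding for five does not.
print(simulate(frames=10, pool_size=3, hold_frames=1))
print(simulate(frames=10, pool_size=3, hold_frames=5))
```

With a short hold time every frame finds a free buffer. Once buffers are held for longer than the pool can absorb, the simulated camera starts dropping frames, which is the situation the didDrop callback reports in the real pipeline.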
You don't want to wait until you've rendered the results of your analysis. That is important, because rendering normally has to happen on the main queue. So release the buffer first and render asynchronously on the main queue. Now the camera has all its buffers back, and the camera feed will not stop.
Next, when we deal with live playback, the challenges are actually similar. We need to make sure that the video continues playing while we do our analysis, so it doesn't stutter. So again, we have to make sure that we can process all the frames correctly. If you do a post-analysis, you would actually go frame by frame. But here, in the live playback part, we do not want to go frame by frame. So we're using AVPlayerItemVideoOutput together with CADisplayLink. It's gonna tell us when we have a new pixel buffer available for a given time. We might actually use a time a little bit in the future so that it synchronizes with the actual video frame arriving on the screen. And then we simply copy our output pixel buffer and do the analysis based on that.
All right, let's wrap things up. I hope you've seen that analyzing actions and sports can be really exciting. Now, it works not just for the bean bag game. We can use it for something like tennis. And although the ball is really small and hard to see, our algorithm can actually detect it.
Or think of playing soccer. And we, again, can see the trajectory of how the ball was flying. But perhaps you want to coach the next generation of cricket players. And we can nicely see the trajectory of our cricket ball.
I would like you to think about what else you can build with these technologies, and what insights you can bring to your users. I can't wait to see all the great applications that you come up with and the innovations that you can build on top of our technologies. Thank you all for attending our session, and have a great rest of WWDC.
