Core Video Concepts

Core Video is a new model for handling digital video in OS X. It provides two major features to simplify video processing: a display link, which synchronizes video output with the display’s refresh rate, and a common buffering model, which makes it easy to move frames between different buffer types.

This chapter describes the concepts behind these features.

The Core Video Pipeline

Core Video assumes a pipeline of discrete steps when handling video, from the incoming movie data to the actual video frames displayed onscreen. This pipeline makes it much easier to add custom processing.

Figure 1-1  The Core Video pipeline

The movie’s frame data comes from your video source (QuickTime, for example) and is assigned to a visual context. The visual context simply specifies the drawing destination you want to render your video into. For example, this context can be a Core Graphics context or an OpenGL context. In most cases, a visual context is associated with a view in a window, but it is possible to have offscreen contexts as well.
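
As an example, the following sketch (using the QuickTime 7 C API) creates an OpenGL texture visual context and assigns it to a movie. It assumes you already have a CGL context, a CGL pixel format, and an opened movie; the function and variable names are illustrative only.

    #include <QuickTime/QuickTime.h>
    #include <OpenGL/OpenGL.h>

    // A minimal sketch: attach an OpenGL texture visual context to a movie.
    // Assumes an existing CGL context and pixel format (cglContext,
    // cglPixelFormat) and an opened QuickTime movie (theMovie).
    static OSStatus AttachVisualContext(Movie theMovie,
                                        CGLContextObj cglContext,
                                        CGLPixelFormatObj cglPixelFormat)
    {
        QTVisualContextRef visualContext = NULL;

        // Create a visual context that produces OpenGL textures from movie frames.
        OSStatus err = QTOpenGLTextureContextCreate(kCFAllocatorDefault,
                                                    cglContext,
                                                    cglPixelFormat,
                                                    NULL,            // no attributes
                                                    &visualContext);
        if (err != noErr)
            return err;

        // Direct the movie's output to the visual context.
        return SetMovieVisualContext(theMovie, visualContext);
    }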

After you specify a drawing context, you are free to manipulate the frame as you wish. For example, you can process your frame using Core Image filters or specify warping effects in OpenGL. After doing so, you hand off the frame to OpenGL, which then executes your rendering instructions (if any) and sends the completed frame to the display.

Within the Core Video pipeline, the most important facets for developers are the display link, which handles display synchronization, and the common buffering model, which simplifies memory management when moving frames between various buffer types. Most applications manipulating video need to use only the display link. You need to worry about using Core Video buffers only if you are generating (or compressing) video frames.

The Display Link

To simplify synchronization of video with a display’s refresh rate, Core Video provides a special timer called a display link. The display link runs on a separate high-priority thread, so it is not affected by interactions within your application process.

In the past, synchronizing your video frames with the display’s refresh rate was often a problem, especially if you also had audio. You could make only simple guesses about when to output a frame (by using a timer, for example), which didn’t take into account possible latency from user interactions, CPU load, window compositing, and so on. The Core Video display link can make intelligent estimates of when a frame needs to be output, based on display type and latencies.

Figure 1-2 shows how the display link interacts with your application when processing video frames.

Figure 1-2  Processing video frames with the display link

If for some reason the processing takes longer than expected (that is, the display link’s estimate is off), the graphics card can still drop frames or otherwise compensate for the timing error as necessary.
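
The sketch below shows the typical shape of this arrangement: you create a display link, register an output callback, and start the link, after which Core Video calls your callback on the display link’s thread each time a frame should be prepared. The callback and function names here are illustrative.

    #include <CoreVideo/CoreVideo.h>

    // Called by the display link on its own high-priority thread, once per
    // refresh, with timestamps describing when the next frame will be displayed.
    static CVReturn MyOutputCallback(CVDisplayLinkRef displayLink,
                                     const CVTimeStamp *inNow,
                                     const CVTimeStamp *inOutputTime,
                                     CVOptionFlags flagsIn,
                                     CVOptionFlags *flagsOut,
                                     void *userInfo)
    {
        // Obtain and render the frame that should appear at inOutputTime.
        return kCVReturnSuccess;
    }

    static CVDisplayLinkRef gDisplayLink = NULL;

    static void StartDisplayLink(void)
    {
        // Create a display link that can drive any of the active displays,
        // register the output callback, and start the timer thread.
        CVDisplayLinkCreateWithActiveCGDisplays(&gDisplayLink);
        CVDisplayLinkSetOutputCallback(gDisplayLink, MyOutputCallback, NULL);
        CVDisplayLinkStart(gDisplayLink);
    }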

Buffer Management

If your application actually generates frames for display, or compresses incoming raw video, you will need to store the image data while doing so. Core Video provides different buffer types to simplify this process.

Previously, there was a lot of overhead if you wanted to, for example, manipulate QuickTime frames using OpenGL. Converting between various buffer types and handling the internal memory housekeeping was a chore. Now, with Core Video, buffers are Core Foundation-style objects, which are easy to create and destroy, and easy to convert from one buffer type to another.

Core Video defines an abstract buffer of type CVBuffer. All the other buffer types are derived from the CVBuffer type (and are typed as such). A CVBuffer can hold video, audio, or possibly some other type of data. You can use the CVBuffer APIs on any Core Video buffer.
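
For example, a pixel buffer created with the CVPixelBuffer API can be passed directly to the generic CVBuffer functions, as in the sketch below. The dimensions and pixel format are arbitrary example values.

    #include <CoreVideo/CoreVideo.h>

    // Create a pixel buffer and treat it as a generic CVBuffer.
    static void PixelBufferAsCVBuffer(void)
    {
        CVPixelBufferRef pixelBuffer = NULL;
        CVReturn result = CVPixelBufferCreate(kCFAllocatorDefault,
                                              640, 480,
                                              kCVPixelFormatType_32ARGB,
                                              NULL,          // default attributes
                                              &pixelBuffer);
        if (result != kCVReturnSuccess)
            return;

        // Because a pixel buffer is derived from CVBuffer, the generic
        // CVBuffer functions work on it directly.
        CVBufferRetain(pixelBuffer);    // adds a reference
        CVBufferRelease(pixelBuffer);   // balances the retain
        CVBufferRelease(pixelBuffer);   // releases the original reference
    }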

When using buffers, it is often useful to manage them in buffer pools. A buffer pool allocates a number of buffers that can then be reused as needed. The advantage here is that the system doesn’t have to devote extra time allocating and deallocating memory; when you release a buffer, it goes back into the pool. You can have pixel buffer pools in main memory and OpenGL buffer pools in video memory.

You can think of a buffer pool as a small fleet of cars bought for corporate use. An employee simply takes a car from the fleet when needed and returns it when they’re done with it. Doing so requires much less overhead than buying and selling a car each time. To maximize resources, the number of cars in the fleet can be adjusted based on demand.
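
The following sketch shows how such a pool might be set up for pixel buffers in main memory; the dimensions and pixel format are example values only.

    #include <CoreVideo/CoreVideo.h>

    // Create a pool of 640 x 480 32-bit ARGB pixel buffers and draw one from it.
    static void UsePixelBufferPool(void)
    {
        int width = 640, height = 480;
        OSType pixelFormat = kCVPixelFormatType_32ARGB;

        CFNumberRef widthNumber  = CFNumberCreate(NULL, kCFNumberIntType, &width);
        CFNumberRef heightNumber = CFNumberCreate(NULL, kCFNumberIntType, &height);
        CFNumberRef formatNumber = CFNumberCreate(NULL, kCFNumberSInt32Type, &pixelFormat);

        const void *keys[]   = { kCVPixelBufferWidthKey,
                                 kCVPixelBufferHeightKey,
                                 kCVPixelBufferPixelFormatTypeKey };
        const void *values[] = { widthNumber, heightNumber, formatNumber };

        CFDictionaryRef bufferAttributes =
            CFDictionaryCreate(NULL, keys, values, 3,
                               &kCFTypeDictionaryKeyCallBacks,
                               &kCFTypeDictionaryValueCallBacks);

        // The pool hands out buffers matching these attributes and recycles
        // them when they are released.
        CVPixelBufferPoolRef pool = NULL;
        CVPixelBufferPoolCreate(kCFAllocatorDefault, NULL, bufferAttributes, &pool);

        CVPixelBufferRef pixelBuffer = NULL;
        CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool, &pixelBuffer);

        // ... fill and use the pixel buffer ...

        CVBufferRelease(pixelBuffer);       // the buffer goes back to the pool
        CVPixelBufferPoolRelease(pool);
        CFRelease(bufferAttributes);
        CFRelease(widthNumber);
        CFRelease(heightNumber);
        CFRelease(formatNumber);
    }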

In a similar fashion, you should allocate OpenGL textures using a texture cache, which holds a number of textures that can be reused.
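
The sketch below shows one way to create a texture cache and use it to produce an OpenGL texture from a pixel buffer. It assumes you already have a CGL context and pixel format; the function name and the cache-flushing policy are illustrative.

    #include <CoreVideo/CoreVideo.h>
    #include <OpenGL/OpenGL.h>

    // Turn a pixel buffer into an OpenGL texture by way of a texture cache.
    static CVOpenGLTextureRef TextureFromPixelBuffer(CVOpenGLTextureCacheRef *cache,
                                                     CGLContextObj cglContext,
                                                     CGLPixelFormatObj cglPixelFormat,
                                                     CVPixelBufferRef pixelBuffer)
    {
        // Create the cache once; it recycles texture storage across frames.
        if (*cache == NULL) {
            CVOpenGLTextureCacheCreate(kCFAllocatorDefault, NULL,
                                       cglContext, cglPixelFormat, NULL, cache);
        }

        // Wrap the pixel buffer's image data in an OpenGL texture object.
        CVOpenGLTextureRef texture = NULL;
        CVOpenGLTextureCacheCreateTextureFromImage(kCFAllocatorDefault, *cache,
                                                   pixelBuffer, NULL, &texture);

        // Periodically flush the cache so textures for released buffers
        // can be reclaimed.
        CVOpenGLTextureCacheFlush(*cache, 0);

        return texture;   // release with CVOpenGLTextureRelease when done
    }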

Figure 1-3 shows a possible implementation of the frame processing that occurs under the hood when playing QuickTime movies. A number of buffers and buffer pools store the video data as it progresses from compressed file data to the actual pixel images that appear onscreen.

Figure 1-3  Decompressing and processing a QuickTime frame

The steps in the frame processing are as follows:

What’s in a Frame?

A video frame often has information associated with it that is useful to the system that displays it. In Core Video, this information is associated with a video frame as an attachment. Attachments are Core Foundation objects representing various types of data, including common video properties such as the clean aperture, color space, pixel aspect ratio, gamma level, and timestamp.

You specify attachments as key-value pairs. You can either use predefined keys, as described in the Core Video Reference, or define your own if you have custom frame information. If you indicate that an attachment can be propagated, you can easily transfer these attachments to successive buffers, for example, when creating an OpenGL texture from a pixel buffer.
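
The following sketch attaches a custom, propagatable key-value pair to one buffer and then carries the buffer’s propagatable attachments over to another; the attachment key shown is purely illustrative.

    #include <CoreVideo/CoreVideo.h>

    // Attach custom, propagatable information to a frame and carry it forward
    // to a derived buffer. The key name here is purely illustrative.
    static void TagAndPropagate(CVPixelBufferRef sourceBuffer,
                                CVPixelBufferRef destinationBuffer)
    {
        // Attach a custom key-value pair. kCVAttachmentMode_ShouldPropagate
        // marks the attachment as one that should follow the image data.
        CFStringRef key = CFSTR("com.example.myFrameInfo");
        CFStringRef value = CFSTR("interesting frame");
        CVBufferSetAttachment(sourceBuffer, key, value,
                              kCVAttachmentMode_ShouldPropagate);

        // Copy all propagatable attachments (timing, color information,
        // custom keys, and so on) onto the destination buffer.
        CVBufferPropagateAttachments(sourceBuffer, destinationBuffer);

        // Read the attachment back from the destination buffer.
        CVAttachmentMode mode;
        CFStringRef copied = (CFStringRef)CVBufferGetAttachment(destinationBuffer,
                                                                key, &mode);
        if (copied != NULL) {
            // The attachment traveled with the frame.
        }
    }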