Create immersive media experiences for visionOS - Day 2
Watch this multi-day livestream event to learn how to create engaging, interactive experiences for visionOS and capture immersive video.
On day 1, we show how visionOS 26 helps you tell immersive, interactive stories with real emotional impact. You'll learn how to develop great ideas for formats like Apple Immersive Video and explore real examples from past work. You'll see how to design and tell stories for spatial interaction so the audience becomes part of the experience, and how to use SharePlay and spatial Personas to bring people together around your ideas.
On day 2, we take a deep dive into Apple Immersive Video with Apple spatial audio. You'll learn how to create new media experiences, get started with Apple Immersive Video, explore new production workflows, and take a close look at previous Apple Immersive titles.
Resources
Related Videos
Meet With Apple
Good morning everyone. How are we feeling? Awesome. Welcome back to those of you that came for day one. My name is Elliot Graves. I help creators bring to life their immersive stories here at Apple, and I want to welcome you to the Apple Developer Center here in Cupertino.
For folks who might just be tuning in today, here at the Developer Center our goal is to connect with you about your ideas and to help you create some incredible experiences for Apple platforms.
Today is all about Apple immersive video. Yes.
Yesterday gave us a taste of what was going to be possible with immersive video, but today we're going to explore every detail from the teams who have designed the formats, to the creators who are pushing their boundaries and the tools that make it all possible.
So whether you're here to master the workflows, understand the craft, or simply get inspired, there's going to be something for you.
Now, as always, we ask that you please refrain, if you can, from doing your own recordings and livestreams. You're welcome to take photographs and screenshots, of course, but leave the video to us. You're also welcome to take notes. And if you want to hop online and you're here with us in the room, you can get on the Apple Wi-Fi network if you haven't already.
For those of you in the audience who joined for day one, we hope you had a great time. If you missed it and you want to catch up, you can do so on the Apple Developer app, the website, or, in fact, YouTube. Day one really set the scene, but for today we've got some more great sessions lined up for you, and we're going to take you on a deep dive into Apple Immersive Video. Like I said, you're going to hear from Apple engineers and from production teams, as well as some very special guests from the industry. And for those of you in person, we're also going to have a seminar on Apple Immersive Video post-production a little bit later on. We'll finish off with a community mixer for our in-person guests. Also, we do have a few more spots remaining for one-to-one consultations on Thursday, so if you're interested in meeting one on one with a production team or some of the creative folks behind the Apple Immersive titles, as well as engineers and designers, all you need to do is head outside and speak with one of our concierges. They'll help you out.
All righty, we're ready to check out what's in store for today.
First up, we're going to have Ryan Sheridan and Deep Sen join us on stage to talk about the behind-the-scenes design of the Apple Immersive Video and Apple Spatial Audio formats.
Then we're going to move to workflows for Apple Immersive Video, as we go through what you should consider from capture all the way through to distribution. We'll have members of the production team behind our Apple Immersive titles joining us, as well as guest speakers here to share more about their powerful tools and how they can empower your productions.
We're going to take a quick break for everyone's favorite part of the day, lunch. And then France and the team are going to share their insights from behind the scenes of our Apple Immersive titles. They'll be covering what they've learned as creatives, producers, and post-production leads. We'll also be hearing from a special guest on what they've learned directing the first ever Apple Immersive music video.
Just like yesterday, if you have any questions for any of our presenters throughout the day, please, please, please drop them to us online using this QR code. At the end of the day, we're going to answer as many questions as we can, especially the ones that get upvoted, in a live Q&A panel.
And after our Q&A, we will sadly say goodbye to our friends on the livestream. But if you are in person, after a quick break, stick around for an exclusive seminar. We'll be working with the Apple Immersive post-production team, with Blackmagic, and of course Main Course Films, and they'll be taking us through some detailed post-production sessions with live demos across editorial, VFX, spatial audio, and of course coloring and finishing. So I think it's going to be a really informative one.
Right. I think it's going to be a pretty awesome day. We have an awesome lineup, and we're going to finish off with a mixer in the lobby. But are we ready to dive in deep on the AV and ASAF formats? Nice.
I'm going to hand it over to Ryan to kick us off. Ryan.
Good morning. I'm Ryan, and I lead the production ecosystem team for Apple Immersive Video, and I am so excited that you guys are all here.
So our team partners with the AV Platform architecture team. These folks develop the feature rich technology that enables Vision Pro to deliver magical experiences.
Referencing these two teams helps us distinguish content production focused elements from developer focused playback elements of AV.
And today, we're going to talk about what makes Apple immersive video different from everything else.
We'll start with fidelity of presence. The AV ecosystem strives to capture, render, and deliver the world in perfect fidelity. As a reference point, 20/20 vision is considered the benchmark for what the human eye can see. Remember that; we'll come back to it a little bit later.
Next is peripheral FOV, or field of view. AV delivers a field of view between 180 and 230 degrees. There are a few reasons for this.
The first is viewer comfort. Just like when you sit back, relax, and watch a 2D or 3D movie in Vision Pro, turning around to see something behind you doesn't feel very natural. So to complete the experience, AV uses amazing spatial audio technology; we'll go a little deeper into spatial audio in the next section.
The second is suspension of disbelief. Beyond 180 degrees, AV blends into the viewer's periphery. This helps maintain that profound sense of being there without reminding the viewer they're not actually there.
And third is efficiency. By keeping pixels in the viewer's normal field of view, AV maintains a higher sense of presence. Said slightly differently, each pixel is maximized for efficient streaming without compromising fidelity.
The next pillar is dynamic, bespoke projection. This is a fancy way of saying AV has no default or standard projection, which is an interesting concept. It eliminates the need to convert clips to lat-long before you start editing, saving production time, render time, and above all storage space, because you don't have extra files. Instead, every AV clip carries the unique lens metadata from the live-action camera or CG camera that was used to create it.
And last, and probably the most important pillar is world scale. This is that innate human ability to sense distance to an object, reinforcing that feeling of being there.
Since Vision Pro projects each pixel as it was captured, AV is free from warping and stitching artifacts, meaning straight lines are straight and objects have natural roundness and accurate stereoscopic cues.
To do this, Vision Pro needs highly accurate lens calibration data per shot. That data comes from the ILPD file, or immersive lens processing data file.
The ILPD file is JSON, and it's only about 50 KB, sometimes less. Despite that small size, the ILPD carries everything needed to accurately reproject every pixel in an AV frame.
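The actual ILPD schema isn't spelled out in this session, but since it's just a small JSON payload, a pipeline tool could read it along the lines of this sketch; every field name below is invented purely for illustration.

```swift
import Foundation

// Hypothetical stand-in for a per-lens calibration payload.
// Every field name here is invented for illustration; this is NOT the real ILPD schema.
struct LensCalibration: Codable {
    let serialNumber: String          // the specific lens/camera body this profile belongs to
    let fieldOfViewDegrees: Double    // somewhere in the 180-230 degree range
    let opticalCenter: [Double]       // principal point, in sensor pixels
    let distortionCoefficients: [Double]
}

// Load and decode a small (~50 KB) calibration JSON in a pipeline tool.
func loadCalibration(from url: URL) throws -> LensCalibration {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(LensCalibration.self, from: data)
}
```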
So let's see how that works.
Each lens of an AV enabled camera is individually profiled at the camera manufacturer's facility. You can think of this as the optical fingerprint of that specific serial-numbered camera.
This profile is then loaded into the camera and added to every single clip it records. For virtual cameras in DCC tools, there's even a specific ILPD that parametrically defines an AV projection for CG rendering.
Using this ILPD with Apple Immersive enabled post-production tools, we can eliminate the need for manual lens solving and terabytes of intermediate file generation, and easily composite, say, the feet of your CG character hitting the ground of your live-action background. This makes visual effects workflows that much easier.
So these are the fundamental pillars that make AV different. But there's some other differences worth calling out.
And first, we'll talk about the capture size difference between AV and 2D.
Let's start with frame rate. At minimum AV is captured and played back at 90 frames a second.
At minimum, AV is 7200 by 7200 per eye, versus our normal 4K size of 4096 by, say, 2160.
And last, AV is always stereoscopic.
When you add this all up, you get a 44 times increase in the number of pixels required to achieve the benchmarks we just talked about for being there. That would have sounded like a really big number a few years ago, but thanks to ProRes efficiency, Media Extensions for codec developers, and Apple silicon performance, your AV editing experience should feel like 2D.
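For anyone who wants to check that figure, here is the back-of-the-envelope arithmetic behind the roughly 44x claim, using the minimum AV spec against the 2D baseline quoted above:

```swift
// Minimum AV capture: 7200 x 7200 per eye, two eyes, 90 frames per second.
let avPixelsPerSecond = 7200.0 * 7200.0 * 2.0 * 90.0        // 9,331,200,000

// Typical 2D baseline quoted above: 4096 x 2160 at 24 frames per second.
let flatPixelsPerSecond = 4096.0 * 2160.0 * 24.0            // 212,336,640

let ratio = avPixelsPerSecond / flatPixelsPerSecond          // ≈ 43.9, i.e. roughly 44x
```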
Now, obviously folks don't stream in their production and archive format, so in a moment we'll talk about how to encode and package AV for delivery.
Having covered what makes Apple Immersive Video different, let's go a little bit deeper into how AV delivers fidelity of presence. A hint: it is all about acuity.
Traditionally, we talk about video in terms of its K value. The problem is, the K value only tells us how many pixels are inside an image container. It tells us nothing about what the viewer is actually going to see in Vision Pro at the end of the day, so it's not a very useful benchmark for AV quality.
So how do we define the resolution of an AV file or format or experience? For that we use the term acuity.
This term by itself doesn't really help; it's not that useful until we have a scale and a reference point. For that, we use the eye chart. Whether it's the classic eye chart or the contemporary one that most people in this room have seen at the doctor's office or the optometrist, the principles have not changed since 1862.
The important feature of any eye chart is that the letters, the optotypes, are subdivided into a grid pattern. Used correctly, the eye chart can tell us that someone with 20/20 visual acuity can perceive about 60 pixels per degree. That's an interesting number.
From this, you can start to quantify that feeling of being there by using 20/20 as the benchmark and PPD as the scale.
Fidelity of presence relies on the perception of distance: how far into the world can we see? That perception of distance creates an innate sense of connection to the world around us. If you've ever seen a video of someone receiving corrective lenses or LASIK for the first time, and most of us have, you know the power of that connection. That perception of distance is a core function of acuity, and acuity is measured in PPD, or pixels per degree.
So how does all of this relate to content? We'll keep it simple. If you take a 180 degree lens and put it on a standard 8K camera, you get a 24 PPD image, or about 20/65 vision.
Now take that same camera and put your friend 25 feet in front of it. At 24 PPD, you might be able to tell the difference between your friend's smile and your friend's grin. But at 60 feet it's different: you can't tell the difference between your friend and someone dressed like your friend, because everything gets a little blurry, and that definitely falls short of being there. This is why AV strives for 60 pixels per degree, and why AV enabled cameras have a minimum of 40 PPD, ensuring that content is captured as close to 20/20 vision as possible, both for today and for the future.
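To make the acuity numbers concrete, pixels per degree is just the pixels spanning the field of view divided by that field of view in degrees. A minimal sketch, assuming (my assumption, not stated in the talk) that the 180 degree image circle covers roughly the 4320 pixel height of an 8K sensor:

```swift
// Pixels per degree is simply the pixels spanning the field of view
// divided by that field of view in degrees.
func pixelsPerDegree(pixelsAcrossFOV: Double, fovDegrees: Double) -> Double {
    pixelsAcrossFOV / fovDegrees
}

// Assumption for the "24 PPD" example: a 180° image circle covering roughly
// 4,320 pixels of an 8K sensor (the sensor's shorter dimension).
let capturedPPD = pixelsPerDegree(pixelsAcrossFOV: 4320, fovDegrees: 180)   // 24 PPD

// Working backwards, the 60 PPD (20/20) benchmark across 180° needs about
// 10,800 pixels, which is where the "nearly 11K" figure later in the talk comes from.
let pixelsFor2020 = 60.0 * 180.0                                            // 10,800
```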
If you've ever had the pleasure of working in traditional 3D or VR, you know it's complex. The workflows are complex. Cameras are complex, systems are complex. And so we believe that creating immersive video shouldn't be tedious, complicated, or take years to learn. This is why AV is simple and that simplicity becomes its superpower.
So the AV ecosystem was designed to be simple. I mean really simple: all-in-one cameras that operate like a 2D camera and generate a single file, workflows that feel like editing 2D, and an encode and delivery pipeline that feels normal.
When something feels simple on the surface, it's usually fairly complex under the hood. And that's why we call Apple Immersive Video a compound format.
AV abstracts the complexity, starting with managing all the various types of metadata in an AV production, like the lens calibration we just talked about a second ago, or per-frame motion data (we'll definitely come back to that one a little bit later), or dynamic transitions, video transitions rendered by Vision Pro in real time. There are a few of these, and we do this for encoder efficiency and to enable some really cool features of the format.
Going a little deeper, let's look at the basic structure of the two AV file types, production and delivery. Developers will recognize this as the QuickTime file format; creatives, you'll recognize this as the MOV file. Just remember, they are one and the same.
It's worth noting that developers like Blackmagic chose to store AV data in their own format, like BRAW, and for visual effects workflows AV takes advantage of multi-part EXR. Regardless of the file type we use to store everything, a few fundamental things are always the same.
Individual video tracks and audio tracks, along with all the per-frame metadata, and even USDZ background assets, are stored and transported in a single file.
This means everything is always in sync and always travels together.
No sidecar files, no groups of files to keep track of. And this makes AV feel like working in 2D.
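For developers curious what that single container looks like programmatically, a minimal sketch with AVFoundation (assuming a local .mov URL) simply enumerates the tracks it carries:

```swift
import AVFoundation

// List the individual tracks carried inside a single AV production .mov.
func listTracks(of movieURL: URL) async throws {
    let asset = AVURLAsset(url: movieURL)
    let tracks = try await asset.load(.tracks)
    for track in tracks {
        // mediaType is .video, .audio, .metadata, and so on.
        print("Track \(track.trackID): \(track.mediaType.rawValue)")
    }
}
```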
So here's a quick, very simple example. You're an editor, you may be working on the timing of a shot, and you're looking at a single lens view. But before you lock your edit, you want to very quickly verify that there's no smudge on either lens. So you quickly click a button, do your checks, and go back.
Under the hood, the NLE is decoding two individual frames from two individual tracks. It's not a side-by-side file.
This is just a simple example of how AV keeps everything at your fingertips without unnecessary files cluttering your timeline. Traditionally, you'd have to link together a single file, a side by side file, and maybe something else, like a proxy to make this work effectively.
Now that we've talked about the AV production format, using ProRes, uncompressed, or RAW, let's talk about AV delivery, and specifically the AIVU file. Developers, once again keep in mind this is QuickTime under the hood. I'll come back to media prep and packaging in a moment, but it's important to keep in mind that the AIVU file is intended for sharing and playback on Vision Pro; it's not intended for editing or archive. That being said, because it's QuickTime under the hood, as a developer you could add editing support for it to your apps.
So let's walk through how the AIVU file is built. First, the individual ProRes tracks are transcoded into a lightweight MV-HEVC video track; MV means multiview, or left and right views, and as the name suggests, it is a variant of the well understood and highly efficient HEVC codec family.
Next, uncompressed spatial audio tracks and associated metadata are encoded into lightweight APAC audio tracks. We'll hear more about the spatial audio format, ASAF, and the Apple Positional Audio Codec, APAC, in the next session.
Both of these assets will come from your NLE, if it's enabled with audio mixing, or from the DAW used to do your final mix.
Next is the presentation track. This track stores all of the metadata that signals per-clip dynamic changes: calibration changes, fades, transition effects, the real-time effects in Vision Pro. You can think of it as a real-time EDL or a real-time FCPXML.
Developers, you'll know this as timed metadata in QuickTime. And like audio, presentation track data comes from your Apple Immersive enabled NLE or DCC.
And finally, the AIME file. This is where all of the camera calibrations, edge blends, backgrounds, and all the other AV experience metadata are stored. This too will come from an immersive enabled NLE or DCC.
Now that we've talked about production and delivery formats, let's talk about best practices and the technology used to prep AV media for delivery. Here again, the name of the game is acuity preservation. First, the best practices to preserve acuity. This simply means not doing things that will negatively impact acuity, like converting AV into other projection types such as lat-long, downscaling camera source media before the final delivery encode (let that happen in the encoder), or shooting with excessively high ISOs in your cameras.
Second best practice is image prep. This includes processes like image noise reduction that help with encoder efficiency, or even AI assisted edge detail enhancement that can help preserve acuity when streaming at really low data rates.
Now we'll get into the technologies that are specifically built to help preserve acuity for AV delivery. The first is foveation. This is a special imaging process and technique for preserving the most essential data while reducing AV image sizes for encoding.
Obviously, like we said before, it's impractical to deliver a nearly 11K by 11K image to Vision Pro, so we need to make this fit into a target delivery size of 4320 by 4320 per eye. Again, the name of the game is efficiency, but as you can see, a simple downscale decimates the image to the 24 PPD we talked about earlier, and that's less than ideal.
So to preserve acuity we want to preserve the most important parts of the image. And for that we'll employ Apple Immersive Video Foveation. In this step we'll take advantage of the positive effects of oversampling.
And as Tim Dashwood mentioned yesterday, the goal of AV foveation is not to achieve a specific PPD target. Instead, like the chart shows, it's to help you balance acuity and pixel preservation to meet your creative needs. And because AV foveation is not one size fits all, you can use the AV per-clip calibration feature to apply user or developer tuned foveation patterns to each clip. That's a superpower.
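Apple's actual foveation patterns are tuned and not published here, but the underlying trade can be illustrated with a toy allocation that spends more of the 4320 delivery pixels on the central field of view than on the periphery; the split below is invented purely for illustration:

```swift
// Toy allocation: split a 180° field of view into a central and a peripheral band
// and spend the 4320 delivery pixels unevenly between them.
let deliveryPixels = 4320.0
let centralFOV = 90.0            // degrees (illustrative split, not Apple's pattern)
let peripheralFOV = 90.0
let centralShare = 0.65          // invented weighting

let centralPPD = (deliveryPixels * centralShare) / centralFOV               // ≈ 31 PPD
let peripheralPPD = (deliveryPixels * (1.0 - centralShare)) / peripheralFOV // ≈ 17 PPD

// A uniform downscale, by contrast, lands at 4320 / 180 = 24 PPD everywhere.
let uniformPPD = deliveryPixels / (centralFOV + peripheralFOV)
```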
And that brings us to the most critical AV technology: the tuned encoders. Using an AV tuned MV-HEVC encoder combined with foveation, you can take that 4320 by 4320 ProRes file and encode it at roughly the same data rate as your typical 2D, 4K, 24 frame HEVC file, without sacrificing that feeling of being there.
As I said earlier, I was going to come back to one of my favorite parts of AV. So let's talk about motion data for the content creator.
As Elliot mentioned yesterday, planning for motion in AV is both a responsibility and a tool. AV motion metadata can be built into every single shot, so you can take advantage of it throughout the decision-making process and in the moments in your scenes that really, really matter.
In an Apple Immersive enabled NLE, you and your crew can visualize AV camera motion before you review it in Vision Pro.
This is a quick example of what it looks like in Resolve's timeline.
There's also a positive and creative benefit to visualizing motion data in the timeline.
Previously, creators have had to rely on emotional and psychological beats to complete their narrative and story arcs. By visualizing physical camera motion in your timeline, AV also lets you use motion as a story tool: you can craft better and better emotional beats and arcs for your story before even reviewing it in Vision Pro. And not to be left out, CG animated content for AV follows the exact same principles; motion applies in both mediums.
Needless to say, this is a lot of information. But to sum it up, delivering that feeling of being there requires a number of things: really high standards, simplicity, best practices, and purposeful technology. And that is only 20% of the story; audio is the other 80% of the immersive experience. So to talk about Apple's new spatial audio technologies, I'll hand it over to Deep Sen.
Thanks, Ryan.
All right.
Right. Well, thanks, Ryan. Good morning. I'm Deep, and I'm the lead immersive audio architect at Apple. This year, our team publicly released new end-to-end immersive audio technology involving a new audio format, the Apple Spatial Audio Format, or ASAF, and a new codec called APAC.
Today, I'm excited to give you an overview of this new format and the new codec.
I'll also dive into the workflow a little bit and tooling to create content in this new format.
Let's explore the format a little bit.
The format allows content with HOA (higher order Ambisonics), objects, and metadata. HOA microphone arrays allow accurate capture of 3D audio, and objects allow creatives to mold the sound field with infinite spatial resolution. Furthermore, the format allows very rich combinations of these to provide the high spatial resolutions required to match human acuity. Most of the content on the Vision Pro, for example, is made up of fifth order Ambisonics along with many objects; that's unprecedented in the industry in terms of spatial resolution. Finally, the format uses a metadata driven renderer that generates acoustic cues on the fly during playback. It adapts to changes in object and listener position and orientation, presenting an amazing level of acoustic detail to the listener. None of this is available in any existing spatial audio format.
All of these are designed to meet the incredibly high bar for creating realistic, immersive audio on a device like the Vision Pro. Let's discuss why that is. As with any audiovisual content, there is a need to create an acoustic scene that matches the visuals. However, here any slight deviations and incongruencies are easily discernible. In essence, the requirement is to teleport the audience into that scene and feel naturally present in that new soundscape. This is not an easy task.
This is very different from the audio system in a theater like this one. A movie played on this screen would be at a significant distance from the viewer. Incongruencies in the precise audio positions and distances relative to the corresponding video are difficult to discern, and often at this distance the visual cues override incorrect positions of audio sources anyway.
Also here, the viewer is never present in the scene. The viewer is always aware of their immediate surroundings. And finally, consider that in a theater like this, sounds are always going to be emanating from loudspeakers, which are, of course, outside of the listener. These sounds are therefore externalized by definition.
All existing spatial audio formats were primarily designed to be played back this way, that is, over loudspeakers, where externalization is guaranteed.
But when listening over headphones, externalization is not guaranteed. Without significant accommodations in the sound design, accurate transport through a specially designed codec, and adaptive rendering, the audio experience tends to be mostly internalized, or inside of the head.
The new format, ASAF, on the other hand, has been designed primarily for headphone playback. We've added a level of detail and accuracy that produces the natural and convincingly externalized experience that you've all heard in AV content on the Vision Pro by now.
But before I go too much further, let's quickly formalize what I mean by immersive audio.
I've used the terms natural and externalized, and these are the two principal dimensions that lead to an immersive audio experience.
There's no doubt that the sense of immersion is most pervasive when the spatial experience is completely natural and convincingly externalized.
But let's break that down some more. First, naturalness. What is it really? It's essentially matching the audio experience to our internal and often subconscious expectation of what the audio experience should be. There's almost an internal plausibility metric that dictates that sense of naturalness and externalization.
This matching is achieved when key acoustic cues in the rendered audio are accurate and non-conflicting. This is fairly intuitive and backed up by all recent research.
To summarize, in order to get compelling immersive experiences, acoustic cues need to be presented accurately. Let's take one very important cue for now, early reflections in a common real-world scenario, and consider the challenge of rendering it accurately.
I'll start with a panel of sorts that's present in some form in all spatial audio content creation tools.
This depicts a listener in a room as seen from above. I'll visualize this in 2D for now.
The walls of the room are outlined here in white. There's some furniture, like a couch and a table.
The listener, labeled L, is standing in the center of the room with a directional arrow indicating their orientation, and an audio object, labeled O in blue, is about to make a sound. The directional arrow indicates the orientation of the object.
The direct sound from the object travels to the listener's left and right ears, and if the listener were to rotate their head, the angles of the direct sound change. Now, Apple already provides best-in-class adaptive rendering of this effect by tracking head rotations as well as modeling our individual head shapes. We've been doing that for a number of years, but the effect of direct sounds is just one factor.
In reality, people don't just listen to the direct path. A listener hears the sound as a combination of both the direct path as well as the reflections from the surfaces in the local vicinity, and this is a lot more difficult to both represent accurately in the mix and to render it.
Consider the effect from the closest wall. It's useful to think of the reflection from the wall as a virtual object, reflecting the sound from the original source to the listener's ears. However, notice what happens if the listener rotates their head. The position of the virtual object changes.
And if that change is not accounted for, an incorrect acoustic cue will be presented to the listener. Even though it might look subtle when considered in isolation, this will contribute to an overall inaccuracy that will at best reduce the immersion effect for the listener, that is, reduce naturalness and externalization, essentially reminding the listener that they're really not part of the scene.
And remember, it's not just one virtual object that you have to account for. Unless the virtual soundscape is in free space, reflections have to be accounted for from many different reflective points.
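To see why such a cue can't simply be baked into the mix, here is a small sketch of the classic image-source idea in 2D: the wall reflection behaves like a virtual source mirrored across the wall, and the direction the listener perceives it from changes the moment the head rotates. This is my own illustrative geometry, not Apple's renderer:

```swift
import Foundation

struct Point2D { var x, y: Double }

// Image-source idea: a reflection off a wall at x = wallX behaves like a
// virtual source mirrored across that wall.
func virtualSource(for source: Point2D, wallX: Double) -> Point2D {
    Point2D(x: 2 * wallX - source.x, y: source.y)
}

// Azimuth of a point relative to the listener's current head yaw (degrees).
func relativeAzimuth(from listener: Point2D, yawDegrees: Double, to point: Point2D) -> Double {
    let worldAngle = atan2(point.y - listener.y, point.x - listener.x) * 180.0 / .pi
    return worldAngle - yawDegrees
}

let listener = Point2D(x: 0, y: 0)
let object   = Point2D(x: 1, y: 2)
let mirrored = virtualSource(for: object, wallX: 3)   // reflection off the wall at x = 3

// Same world geometry, two head orientations: the direction of the reflection,
// as the listener perceives it, changes the moment the head turns.
let facingForward = relativeAzimuth(from: listener, yawDegrees: 0,  to: mirrored)  // ≈ 21.8°
let headTurned    = relativeAzimuth(from: listener, yawDegrees: 30, to: mirrored)  // ≈ -8.2°
```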
And what I showed exemplifies that these reflections change as a function of listener orientation. As such, they cannot be baked in as a constant into the content. Baking in these effects, however, is the norm in existing spatial audio formats.
The reflections will also change if the object moves, or if both listener and object move. And remember, this example is just in two dimensions, when the effect is really in three dimensions. Reflections emanate not just from the walls and the furniture; they also come from the ceiling and the floor. If the creative mixer stored the effect in, say, a 7.1 channel bed, the effects of the floor and ceiling would be seriously underrepresented.
It's not just the reflections, however. There are many other such acoustic cues that the listener is subconsciously expecting to match. For each object, there are distance cues. There's the radiation pattern and orientation of each sound source, all of which change when the position of the object or the listener, or both change with time.
It would be almost impossible to precisely account for all of these manually, and for hundreds of objects when creating an immersive audio mix. And if you do manage to bake in specific effects during content creation, you're at best only accounting for one possible listener position or orientation. It's impossible for the acoustics to adapt.
This leads to unnaturalness and reduced externalization of the audio, which breaks immersion for the audience.
Since no existing format supported this level of accuracy and adaptation while also providing the required high spatial resolution, we built ASAF.
To summarize, the new format allows metadata driven, real-time rendering of critical acoustic cues that adapt with the listener's position and orientation. These acoustic cues are not baked in for object-based audio.
Unlike other formats, where the effects are manually generated and baked in during content creation, the effects in ASAF are computationally generated during playback.
And all of this is powered by a new spatial renderer built into all Apple platforms. This means that the renderer used during content creation, say, on macOS, will maintain artistic integrity on the Vision Pro. These powers are a key part of why this format can help you tell such realistic stories for the audience. And when it's done correctly, the listener is immersed into the scene, disassociated from both the awareness of the device and their immediate surroundings.
Apple has already been using this in production for all of the Apple Immersive Video content.
However, the format is only one part of the technology. Most of the content shown here actually delivers fifth order Ambisonics simultaneously with 15 objects and metadata to the end device. That is an industry leading 51 LPCM signals.
To transport this high resolution spatial audio, Apple developed a brand new spatial codec.
It's called the Apple Positional Audio Codec, or APAC, and the primary motivation for developing the codec was to deliver high resolution ASAF content. APAC is able to do this with very high efficiency, that is, while keeping the bitrate low. And this isn't just for visionOS; APAC is available on all Apple platforms except watchOS.
Take that content dimension: fifth order Ambisonics plus 15 objects. At 32 bits per sample, that's an 81 megabit per second payload. With APAC, we can encode this at one megabit per second with excellent quality. That's roughly an 80 to 1 compression ratio, or about 20 kbps per channel.
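A quick check of those figures, using the numbers exactly as quoted in the session:

```swift
// Figures as quoted: 51 LPCM signals, about an 81 Mbps uncompressed payload,
// encoded by APAC at roughly 1 Mbps total.
let uncompressedMbps = 81.0
let encodedMbps = 1.0
let channelCount = 51.0

let compressionRatio = uncompressedMbps / encodedMbps          // 81:1, i.e. roughly 80 to 1
let kbpsPerChannel = (encodedMbps * 1000.0) / channelCount     // ≈ 19.6, i.e. about 20 kbps
```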
And the total bitrate can actually go down to as low as 64 kbps while still providing head-tracked spatial audio experiences.
Consider the fact that audio bitrate for transparent stereo music is 256kbps. That's 128kbps per channel.
So how can you take advantage of the new format and the new codec? Well, let's go over the end to end workflow.
To start, let's go over the tools to actually create ASAF content. Currently there are two. First, there are the just released Apple plug-ins for Pro Tools, called the ASAF Production Suite. You can download the suite from Apple's developer portal for free: go to the downloads page and search for Spatial Audio, or scan this QR code if you like.
And then there is Blackmagic's Fairlight in DaVinci Resolve Studio.
Both of these allow the creation of up to seventh order Ambisonics, in addition to hundreds of ASAF objects and metadata.
Both of these tools provide a 3D panner that facilitates the positioning of audio objects in 3D space and at any distance, and a video player that allows the objects to be overlaid on video; the video player and the panner are synced. They give you the ability to describe the type of acoustic environment the objects are located in, to provide a radiation pattern as well as a look direction for each object, and to give the object a width and a height. You can convert an object to HOA via room simulation if desired; room simulation for HOA gives it the accurate reverb we talked about that is critical for externalization. HOA signals can be manipulated in various ways as well. You can tag objects as music and effects, dialog, or interactive elements, and finally save the entirety to a broadcast WAV file, or optionally also to an APAC encoded MP4 file.
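None of that maps to a published API in this transcript, but purely as an illustration of the kind of per-object metadata those panners manage, here is a sketch with invented names:

```swift
// Invented, illustrative types only; this is not the actual ASAF object model.
enum ElementTag { case musicAndEffects, dialog, interactive }

struct SpatialObjectMetadata {
    var position: SIMD3<Float>        // 3D position, at any distance from the listener
    var lookDirection: SIMD3<Float>   // the direction the source radiates toward
    var radiationPattern: String      // e.g. "omni" or "cardioid", named loosely here
    var width: Float                  // apparent angular width of the source
    var height: Float                 // apparent angular height of the source
    var tag: ElementTag               // music and effects, dialog, or interactive
}
```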
Later today, the Blackmagic team will talk about their professional audio production software that supports ASAF and is completely integrated with Apple Immersive Video. The Apple team will also briefly discuss the Apple Pro Tools plug-ins for creating immersive audio.
Now I'll go through the end-to-end content creation and distribution process. For ASAF, content is created by the creative mixer, who brings all kinds of microphone recordings and stems into the tool.
This produces a set of PCM signals representing a combination of objects, Ambisonics, and channels, as well as a set of time-varying metadata such as position, direction, and room acoustics.
The creative mixer is able to listen to that mix by rendering the PCM signals with the metadata. Of course, a full representation of the format includes the PCM signals, the metadata, and also the renderer, but since the renderer is assumed to be present on the playback device, the audio format is really made up of just the PCM and the metadata. Once the mixer is done, this is saved to that broadcast WAV file, then encoded using APAC and subsequently converted using HLS tools to a format that's suitable for streaming.
On the playback device, that fragmented MP4 is decoded back into the PCM and metadata, which are ingested by the adaptive renderer, which also gets the position and orientation of the listener.
This then allows the rendering of the immersive audio experience.
To summarize, these tools provide creative artists with the ability to create both live and post-produced ASAF content using objects, Ambisonics, and channels, or a combination of all of these.
Ambisonics allows a tremendous capacity for live capturing of soundscapes using microphone arrays.
Artists can also augment an object into Ambisonics. Discrete objects can be described using a comprehensive set of metadata.
Multiple alternative audio experiences can be created and delivered in the same APAC bitstream, allowing the audience to select and personalize their experience, and the rendering produces key acoustic cues accurately and adaptively and directly to headphones, without a need for intermediate virtualization to loudspeakers.
And all of this, especially the ability to efficiently transport spatial audio with industry leading resolutions, is facilitated by the new codec, APAC.
All right. So that brings me to the end of my presentation. Remember, audio is at least 50% of the experience. I hope you're encouraged to dive in and get immersed; well, that is, dive into creating compelling soundscapes with ASAF and APAC. We're not quite done with audio just yet. Later today, Tim Amick will talk in more depth about post-production, and both the Blackmagic team and Apple team members will discuss ASAF tooling. And with that, I'll hand it back to Elliot.
Thank you.
Thank you very much, Deep.
That was awesome. I hope we all feel suitably versed in AV and ASAF now.
They're pretty cool formats, super powerful under the hood, but I think what gets me most excited about them is how they take some of the heavy lifting out of the creative process. All that time we used to spend wrangling files, cameras, audio, and spatial audio can now go into the creative process itself. So hopefully that will make your stories even more compelling.
Now we're going to take a little wee break and see you back here in ten minutes. Hopefully give you a little moment to digest some of what you've heard. Thank you.
All righty folks, welcome back from our quick break.
So far we've seen how the formats work, but now we're going to explore how we can use them to bring our work to life, and that happens in your production workflows. In this next session we're going to look at the workflows that make Apple Immersive Video possible. We're going to start with capture and with audio, and then we're going to hand over and hear from some very exciting industry folks who have come in to share more about their tools. But to kick us off, I'm going to invite up to the stage Austin. He's going to share some of his incredible insights about setting up your capture workflow for success, and I think you're going to love it. Austin.
All right.
Thank you. Elliot.
My name is Austin Novey. I'm an expedition cinematographer within the Apple team, and I've worked on a handful of Apple Immersive titles, specifically Wild Life.
Thank you, thank you. My background comes from the natural history and expedition world, and I specifically work in remote-location and complex filming scenarios: down here in the Bahamas, 40 feet below the surface; or here in the Canadian Rockies, where even the simplest task, like swapping media or defrosting a lens, can be, or at least feel, almost impossible; or even under here, in the beautiful, powerful waves of Tahiti.
Immersive content allows us to bring the viewer with us and share mesmerizing views like this with the entire world.
So how does our team do this, and where do I begin with prep so that the creative teams can depend on me to get the shot? Today, I'll walk you through how I would approach the prep and workflow as if the show has just been greenlit. I want to make sure that I'm ready for whatever the director throws at me. With that, I'm going to start by discussing camera prep and kit-out, then cover some key grip and equipment decisions, and then briefly talk about something that's often overlooked: dailies.
So let's start with packing the camera.
While the camera is beautiful, for it to leave the backpack and be useful there are a couple of things I'm absolutely going to need. The first is power. For expedition shoots, I'm thinking about a small footprint, and many times scheduling is last minute, so I need travel-safe batteries; under 100 Wh is a must. Stackable V-mount batteries are a super nice option because they allow the camera to come in and out of the backpack easily.
Like you heard from Deep, I'm absolutely going to want to make sure I can capture great spatial audio to help the team in post bring the scene alive.
For that, I typically use a Zoom F6 and a VR microphone, because straightforward four-channel Ambisonics is easy and fast to deploy.
Power it from the camera via USB and you're off to the races.
Now, it's always really awkward when you have to explain to a director that you forgot the media, so you have to make sure you bring it; it's not a good conversation. You've got to bring enough media that you can prioritize the moment: if a wolf or an elephant walks by, you don't want to stop rolling.
I'd likely be packing something like the 16 terabyte media module, but an important note is that those larger media modules will lead to long download times and dailies, so plan for that.
Now, if there's one thing I've learned over the past handful of years working with these cameras, it's to take time to prepare, as much as the post teams who are here love hearing the "fix it in post" sentiment.
Taking time to plan your shots and fix your horizon is excellent, and shooting with intent will take your creative to another level. When planning for media and power in the field, I always recommend overpacking what you think you need, because with Mother Nature, when things are good, they're great. And I can confirm that batteries and media storage are the last thing on my mind in a circumstance like this, when a tiger shark is micromanaging.
Okay, so now let's say we're out in the field and you need to be able to see what you're capturing. That's where monitoring comes in. So how do I do that? Well, like here: if I'm out in the field and the helicopter only gives us a few minutes to get out and then get back in, the internal LCD wing is an excellent mobile option when access is limited, your footprint needs to stay small, and I can't go set up a SmallHD.
While the small, nimble setup is nice, you may have some aerial days coming up, or your production team may simply no longer want to look over your shoulder at the small LCD, sharing the frame. So let's discuss a few additional monitoring options out of the camera; you have a couple of SDI choices.
The most common on-set monitoring is going to be a single eye view. This will be fantastic for heli work, gimbal techs, crane techs, underwater ops, and all other on-set displays.
SmallHD gives you the ability to punch in on an image for precision focus and for confirming that the sound mixer, like Alex, who you'll hear from soon, is out of frame.
Now, the dual eye view can be very important when you're checking that the lenses are completely consistent, from flaring to a bug crawling over the right eye but not the left. Though you want to avoid a bug on either eye, I'll tell you that.
So moving beyond the simple setup that I use most of the time. Let's chat about the fun equipment we can add to production.
Like Ryan said earlier, simplicity can be very significant. The tripod is the most used tool for the majority of our content here on Apple TV+.
This allows the viewer to really take in the immersive scene and gives them the ability to creatively explore the entire field of view without being directed one way. The simplicity of the tool lets the audience stay connected.
But who doesn't love some aerials? For heli work, we've worked with pilots and techs all over the world to bring the immersive skies to you. We've worked with numerous companies and partners to build special adaptations for our Apple Immersive experience, like the Immortal Phoenix head pictured here, and we've tested in some of the toughest conditions that Mother Nature could throw at us.
Of course, other aerials, like heavy-lift drones, provide a much more attainable and cost-effective presentation, like here with Lightcraft for Arctic surfing.
We used the Alta X and a Movi Pro as we flew through moving conditions in the Arctic. But it's very important to keep in mind that with cold weather and something like the Alta X carrying a payload, our battery time is about five minutes from start to finish for a safe flight, so keeping that in mind is really important.
Because of the wide field of view for Apple Immersive, it's also important to keep in mind that with the Alta X, if you want to get the blades and the boom out of the final footage, you'll have to plan for that VFX cost. But there are other custom heavy-lift FPV drones, like the one pictured here from Revered, with top-mounted cameras that allow you to avoid that VFX work in post.
Now, here's another aerial shot using a cable cam, where we're able to move both vertically and horizontally without scaring the wildlife, and we can get as intimate as possible.
Now, as beautiful as the topside world is, below the surface in immersive is unlike anything you can imagine. With that being said, we're looking forward to seeing underwater AV tools become available later next year.
From the depths of the sea to the Arctic Circle, we have tested plenty of tools for immersive, and there's a tool just about everywhere, from Scorpio cranes to arm cars to Steadicam and a variety of remote heads. This camera is easily compatible with all the fun tools you need.
Now let's move on to dailies.
Dailies are essential for our format. Like Elliot talked about yesterday with composition, you can't exactly confirm your frame unless you are in a Vision Pro. It is insanely humbling to compare the shot you thought you got with the shot you actually got.
So let's look at two setups of how we do this in the field and in the studio.
For someone like myself, if I'm shooting on a boat, my ideal scenario is a small, slimline Mac Pro and a few RAIDs, enough to confirm the shots we got and to build out our story.
But processing power and live color may be an absolute need for production. In that case, the team on land would have a cart with multiple Mac Studios for faster backups and more processing power, and oftentimes a dual monitor setup for live color and multiple renders at once.
Now just look at that smiling face from JD. He's absolutely blown away with the performance.
All right. With all that in summary, the camera was designed to be a simple, exciting tool that you can use to tell immersive stories from a tripod to a helicopter. And always remember to absolutely check your work with dailies.
And that'll take me to the end of the capture workflow. And now I'll hand it off to Alex Weiss to take us through audio.
Thank you. Austin.
All right. Hello, everybody, and welcome to the other half of film production, the one that usually gets about 10% of the attention and 0% of the glory: audio. I'm Alex, and I am the audio lead on the Apple Immersive Video post-production team.
As you can probably guess, I'm a huge spatial audio nerd, and I hope that by the time I'm done with you, you will be too.
But first, why should you use spatial audio? Why not just do stereo or maybe 5.1 and call it good and move on with your life? Instead of telling you, let me actually show you. You're going to hear a brief soundscape here and you're going to hear it twice. First, here it is just in stereo.
All right. That was pretty cool. But you could probably tell it was all just sort of, like, stuffed right into the front. So let's do this again. But this time you're going to hear it from all the speakers in the room. So not just in front of you, but also to the sides, behind you and even above you.
Cool. I hope you agree that the second version was probably a lot more immersive and really pulled you into the scene. If not, that would be a problem. But yeah.
All right. And the reason is that spatial audio builds immersion. I'm obviously biased, but I think it's one of the best ways to place the audience into your world.
Now imagine being able to do this on your own projects without any of these fancy speakers, just with a pair of headphones. With binaural audio and the power of Apple's spatial audio, you can do that, and it's something that I'm really, really excited about, because it finally makes spatial audio accessible to everybody. Creators like you only need a pair of professional headphones; for obvious reasons, I'm partial to AirPods Max. Consumers can also use headphones or even just the built-in speakers on Apple Vision Pro. All right, so with the sales pitch out of the way, how do we actually do it? Let's take a bird's eye view of our workflow. Our journey begins with acquisition, meaning recording audio on set. From there we travel to post, specifically sound editorial; this is where you assemble and layer all of your sound elements. And finally, there's the spatial mix, where all the sonic elements come together. So let's start with acquisition.
As Austin mentioned, it really pays off to record Ambisonic spatial audio from the camera's perspective, particularly in documentary style filmmaking. It will capture your entire sonic environment with the right spatial position and perspective, matching the camera. That can provide a bed, a sort of spatial glue that provides the foundation on top of which you can then layer additional sonic elements.
If you're familiar with the Apple Immersive Wild Life series, the team actually did a really, really fantastic job recording great spatial audio. If you go back and listen to any of these episodes again, you'll hear these recordings on almost every shot.
Here's production sound mixer Doug shoving a microphone in an elephant's face.
And what may not be readily apparent is that is actually an ambisonic microphone. Later in post, you can still decide to just use one channel if you want to, but that way you have options in post.
Now, when you think of spatial microphones, you may be picturing bulky, weird, strange looking mic arrays that are gigantic and that you're going to have to schlep around your set. The good news is that's not the case at all. There's a great variety of good ambisonic microphones on the market, all with a similar form factor to regular shotgun microphones.
And these microphones are really just a sampling of what's currently on the market. So the good news is you really have options. So choose what's best for your needs and budget.
All right, great. So we set up our spatial mics, we're capturing everything exactly the way we want to, so we're done with acquisition. Time to move on to sound editorial. Except we're not quite done: mono recordings still matter. There's a sort of preconceived notion that with spatial audio, you just set up your mics and everything will sound lovely and crisp and exactly how you want it. Unfortunately, that is not entirely true. So the idea here is not for your spatial recordings to replace tried and true mono production sound techniques, but rather to augment them.
So if you have any dialog in your scene, record it as individual close mono sources in addition to spatial. For immersive, that will usually mean you have to prioritize individual mics like the one I have right here, since it may be hard to hide a boom outside the frame due to the wide field of view.
Between your mono and spatial sources, you now have the best of both worlds. Your mono sources will give you rich definition, clarity and detail, while the spatial recordings will fill in the gaps of mono by providing sonic depth and perspective.
All right, now we're actually done with acquisition, I promise. Let's move on to sound editorial. This is where you assemble and layer all of your dialog, your sound effects, and your music. Conceptually, this step is fairly similar to standard post-production. If you're already doing your picture editorial in Resolve, you can stay within the ecosystem and just switch over to the Fairlight tab. Apple's spatial audio is deeply integrated into Fairlight, so you can do your entire sound edit and mix directly in Fairlight. And one shameless plug here: we will have a Fairlight workshop later today, so please join us if you want to learn more.
If you're more familiar with Pro Tools, that's okay too. As Deep mentioned, you can also download and use the Apple spatial audio plug-in suite to edit and mix directly in Pro Tools.
So what does my own sound editorial workflow look like? The first thing I always do when I start sound editorial on a new project is to make sure that I have all my puzzle pieces ready.
What I mean by that is your production team will have hopefully recorded a lot of great sound sources for you, from multiple spatial mics to individual mono ISOs. So make sure you actually compile and assemble all of these sources, because your picture editor may not have used them. With everything assembled, you can now build your soundscape by editing and layering your dialog, your sound effects, and your music. As you do so, remember that immersion is in the details.
In particular, use Foley and ambiances to build immersion. For those of you not familiar with Foley: Foley is essentially sound effects that are performed and recorded live to picture. It usually focuses on the small, everyday sounds that would otherwise go unnoticed, things such as a character's footsteps, the sound of their clothing as they move, any props they may be interacting with, or even the sound of leaves rustling as an animal emerges out of the forest. So let's take a look at a short clip from another Wild Life episode.
You will see and hear the same clip twice. What you're hearing right now is just production sound.
Okay, so you heard that more or less everything is there, but it felt maybe a little puny, a little thin. Especially remember that right now you're watching this in a movie theater, but ideally you'd be watching it in Vision Pro, fully immersed, with our hero right in front of you, so it feels a little bit thin. Also, if you listened closely, you may have heard Austin actually breathing from right over there. That's something that happens a lot with spatial audio; you get used to that.
So let's listen to it again. But this time with the same production sound, but now also with Foley and Ambiances layered on top of it.
All right. So I hope you agree that this felt a lot fuller. You could actually hear what our main character was doing, and you also felt hopefully again, a lot more in the scene through the use of additional ambiances to sort of pull you into the world.
All right. So now that we've layered in our ambiances and Foley, we need to mix everything, so it's time for the spatial mix. This is where, hopefully, all your hard work in sound editorial will really pay off. Here you combine all of your sonic ingredients and mix them together. Most importantly, this is where you spatialize all of your sound sources. So far, our workflow has been fairly similar to 2D, but here we really depart from traditional workflows.
If you're familiar with regular 2D workflows, you know that a lot of your mix will live in the center channel behind the screen. It's basically the central anchor that ties your sound to the flat picture screen.
In immersive, be careful with that. Remember that the audience is no longer looking at a flat screen in front of them; they are inside your scene. Because of that, they expect spatial cues to work like in real life. There is no notion of a center channel anymore; sounds are supposed to emanate from the location of their corresponding visuals.
So instead, anchor all of your sounds to visual elements using Apple's spatial audio. In practice, that may look like this. I have an excerpt here from the Arctic surfing episode of Boundless. This is probably the kind of frame you're used to seeing; it's just regular lens space. But for this demo, we're actually going to work with a slightly different visual representation. Here's the same frame unwrapped to what we call equirectangular. This allows us to place sounds not just onto the actual picture, but also behind us, by placing them on either side of the frame in those black sections. More on that in our workshop.
So for this demonstration, I pulled out just the sound effects of the ocean and a little bit of wind; I think you can see those three blue crosshairs there. Each of the crosshairs you see on screen is the location of one spatial object. So watch as these objects move in tandem with the picture.
We see even the wind is moving a little bit there on the left. And then for this shot, watch what happens right here as the wave passes us and then even goes behind us; then all the sounds of the waves converge and start moving again.
And one more time: every single tiny little crosshair you see there is one part of the wave moving along the screen. And then as we get closer, we put a little bit more behind us for even more immersion.
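What makes the equirectangular view handy is that azimuth and elevation map linearly to x and y, so "behind you" is simply the edges of the frame. A small sketch of that mapping, using my own axis convention rather than the Fairlight panner's internals:

```swift
// Map a direction (azimuth/elevation in degrees) onto a normalized
// equirectangular frame: x spans -180°...180°, y spans +90°...-90°.
func equirectangularPosition(azimuth: Double, elevation: Double) -> (x: Double, y: Double) {
    let x = (azimuth + 180.0) / 360.0   // 0 or 1 = directly behind, 0.5 = straight ahead
    let y = (90.0 - elevation) / 180.0  // 0 = straight up, 1 = straight down
    return (x, y)
}

// A sound directly behind the listener lands at the edge of the frame,
// in those black regions outside the 180° lens image.
let behind = equirectangularPosition(azimuth: 180, elevation: 0)   // (1.0, 0.5)
let ahead  = equirectangularPosition(azimuth: 0,   elevation: 0)   // (0.5, 0.5)
```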
And finally, remember that this is a binaural medium. Your audience will experience your mix either through headphones or through the two speakers on Apple Vision Pro, so make sure that you actually mix on headphones and not just on speakers. It's the only way to make sure that your mix actually translates, not just with regard to perspective, but also dynamic range. In the same vein, make sure to regularly review your work on Apple Vision Pro; it's a theme you're going to hear a lot today. It's the only way to accurately judge distance, perspective, and the positioning of your sounds.
And with that, we've made it to the end of our workflow and our project is done.
So what have we learned? First, make sure you record both mono and spatial on set.
Second, use Foley and ambiances to build on top of your spatial recordings and heighten immersion. Third, anchor sounds to visuals instead of just plopping them in the center. Fourth, mix on headphones and review on Apple Vision Pro. And finally, have fun and experiment. Thank you very much.
Awesome. It's funny to see that classic kind of camera and sound op banter brought up to the stage. I hope you enjoyed that just as much as I did.
So we've heard a little bit about the workflows for capture and audio, but I think one of the most important and perhaps most powerful aspects of Apple Immersive Video is how it can be built into an ecosystem, one that, like I said earlier, enables you to stay focused on that creative process. And that starts with building Apple Immersive Video into one system that can take you from capture all the way through to your delivery. Here to tell you a little bit more about that, and to start this little chain of industry speakers, I'm going to hand you over to Dave from Blackmagic Design.
Thank you, Elliot.
Hello, everybody. My name is David Hoffman, and I'm the business development manager for the Americas at Blackmagic Design. I'm excited to be here today to discuss some of the new and unique tools of immersive media production, and how we at Blackmagic Design have worked closely with Apple to make this new medium as easy to use as possible. Blackmagic Design is dedicated to building the highest quality creative tools and making them affordable to everyone. Everything we do is from the perspective of supporting creators in their process. With over 200 products across 14-plus categories, Blackmagic Design delivers industry-leading tools for creators.
Blackmagic Design saw the opportunity to support immersive content creation by providing the next generation of tools it would require. We partnered with Apple to develop a complete end-to-end production solution, starting with this revolutionary camera and moving seamlessly through the creative process and post-production pipelines, with enhancements to our award-winning software, DaVinci Resolve Studio.
Our goal is to make it as effortless and easy as possible for creators to get from concept to delivery, all while providing the highest quality content experience on the highest quality immersive viewer, Apple Vision Pro.
The entire process starts with acquisition. The Blackmagic URSA Cine Immersive is the world's first commercial camera system engineered specifically for capturing immersive content for Apple Vision Pro.
Based on the Blackmagic URSA platform, it shares a wide range of sophisticated technology found across all of the Blackmagic Design URSA Cine cameras.
It then builds upon these with a range of tools specific to the immersive creation process. We've heard a lot about that in the previous presentations, but now we're going to talk about a few of those things up front here. The most obvious difference is the 180-degree-plus calibrated lenses.
These lenses are calibrated at the factory to establish the relationship between the lenses and the dual 8K sensors underneath. This begins the creation of the lens space.
This calibration is embedded into the metadata and travels with the file throughout the process, right down to the point of consumption on the Apple Vision Pro. This creates the pixel accurate representation that is immersive media.
While shooting, UI tools are given to the creator. These allow the creator to view the left or right eye independently or together while shooting, as well as giving a reference for the angles they're shooting. To ingest the media into Resolve Studio, you can work directly off the media modules in the camera via Wi-Fi, Bluetooth, or the Ethernet connection on the camera.
The modules can also be inserted into the Blackmagic Design Media Dock. Once docked, the media modules can be directly mounted as an SMB share, and work can be done collaboratively by all of the connected creators, with no need to copy the media to additional drives.
When an Apple Immersive Video clip is selected, the codec parameter in the header of the metadata panel will indicate that this is an immersive video, alongside its native frame rate and resolution.
Additionally, a 3D icon in the lower left corner of the video thumbnail also confirms its status as a 3D clip.
DaVinci Resolve Studio is Blackmagic Design's comprehensive post-production software that seamlessly integrates editing, color correction, visual effects, motion graphics, and audio post-production into one complete suite. With DaVinci Resolve Studio 20, a wide selection of tools has been added to address the unique requirements of immersive media production, such as the immersive viewer for 2D presentation. This allows creators to look at the media on a 2D platform, in context, if they don't have a Vision Pro available to them.
But if you do have an Apple Vision Pro available, you can do a live preview as well, directly from several of the tools within DaVinci. Integrated Apple Spatial Audio Format immersive audio mixing is also included.
Video background track support is provided.
There's also preservation of lens metadata all the way through the process.
And of course, a visionOS export preset.
When working with immersive content, editing tools and actions remain mostly the same as when working with conventional 2D.
One aspect to be aware of, though, is quick or sudden movement, which can be disorienting to the viewer, as Ryan talked about in his earlier presentation. Having this data visually available allows you to graph the x, y, and z acceleration and gyroscope position values, and objectively demonstrates to you, the creator, where camera motion and movement may be too strong.
Editors can use this information to avoid using disorienting sections of footage.
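As a rough illustration of how that motion metadata can be used programmatically — a toy sketch, not DaVinci Resolve's actual implementation, and the threshold is a made-up number you would tune for your own footage:

```swift
import simd

/// Per-frame gyroscope metadata, as recorded alongside the footage.
struct MotionSample {
    var frame: Int
    var rotationRate: SIMD3<Float>   // rad/s around the x, y, and z axes
}

/// Flag frames whose rotation rate crosses a comfort threshold, so an editor
/// can steer around potentially disorienting sections of a clip.
func framesWithStrongMotion(_ samples: [MotionSample],
                            threshold: Float = 1.5) -> [Int] {
    samples
        .filter { simd_length($0.rotationRate) > threshold }
        .map { $0.frame }
}

// Example: a whip-pan around frame 241 gets flagged for review.
let flagged = framesWithStrongMotion([
    MotionSample(frame: 240, rotationRate: [0.05, 0.10, 0.02]),
    MotionSample(frame: 241, rotationRate: [0.10, 2.40, 0.08]),
])
print(flagged)   // [241]
```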
Background refers to the area that the observer will see when they look beyond the perimeter of the immersive content. By default, this area is black, but you can insert custom background imagery in a USDZ format using the background track.
DaVinci Resolve's color page is Hollywood's most advanced color corrector, with the full set of tools available to the immersive creator.
A couple of other tools add to this. One is Edge Blend, and the Edge Blend mode is exclusive to immersive content. It allows you to adjust and soften the perimeter of the immersive video frame. This can be used to hide visible film equipment and to limit the observer's field of view.
Beyond this edge, the observer will see what's on the background track.
You can also now quickly perform rotations and flips of the immersive world pose with a simple click of a control. As we've mentioned several times before, you're really operating with one file, so the ease of use comes from not having to synchronize multiple files. You can do these transforms pretty easily with just one click of the mouse.
The Fusion page is where you can create cinematic visual effects and motion graphics.
It's built into DaVinci Resolve Studio and features a node based workflow with hundreds of 2D and 3D tools.
Here are just a few of those immersive specific tools in fusion. There's the Immersive Patcher tool. This allows you to temporarily undistort a section of an immersive clip.
This allows for easy tracking, painting, and compositing on a flat plane.
Once finished, you can re-distort the immersive signal and integrate that newly composited element back into your scene.
In the PanoMap tool in Fusion, there's now an immersive option. This allows you to change from the spherical mapping layout to or from other formats, such as LatLong.
Spatial audio is a transformative sound technology. As we've just heard, it creates a three-dimensional listening experience. It uses advanced algorithms and object-based mixing to position individual sounds in a virtual space.
Fairlight allows for precise placement of voices, instruments, and effects, enhancing immersion. With head tracking, the soundscape dynamically adjusts to the viewer's movements, anchoring audio to the active environment for a more realistic experience.
DaVinci Resolve Studio's Apple Spatial Audio Format support lets you accurately place sound sources of any channel format in both the horizontal and vertical planes to create immersive multi-channel mixes for Apple Vision Pro's video format.
This integration means you can use Fairlight's 2D and 3D spherical panning, as well as Fairlight Ambisonic effects plugins, metering, and compatible AU and VST plugins.
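For a sense of what object-based 3D positioning looks like in code, here's a minimal sketch using AVAudioEngine on Apple platforms — purely an illustration of the concept, not how Fairlight or the Apple Spatial Audio Format pipeline is implemented:

```swift
import AVFoundation

// Build a tiny 3D audio graph: a mono source spatialized by an environment node.
let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
let player = AVAudioPlayerNode()

engine.attach(environment)
engine.attach(player)

// Mono sources are what the environment node can spatialize.
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 1)
engine.connect(player, to: environment, format: monoFormat)
engine.connect(environment, to: engine.mainMixerNode, format: nil)

// Put the listener at the origin and anchor the sound to a visual element,
// e.g. a wave breaking slightly behind and to the left of the viewer.
environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)
player.position = AVAudio3DPoint(x: -2, y: 0, z: 1)   // meters, listener-relative
player.renderingAlgorithm = .HRTFHQ                   // binaural rendering for headphones

// From here you would schedule a file or buffer on `player`, start the engine, and play.
```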
IntelliTrack uses AI technology to complement DaVinci Resolve Studio's tracking and stabilization tools. IntelliTrack can be applied on the Fairlight page in DaVinci Resolve Studio for audio panning inside your projects. It's also possible to stream to Apple Vision Pro from the Edit, Color, and Fusion pages.
Blackmagic RAW Immersive clips can also be rendered to EXR. As Ryan mentioned earlier, EXR now allows you to import and export these projects with both eyes and the lens metadata.
On the Deliver page, there are two render settings that allow you to render Apple Vision Pro files for review and for final delivery.
There's the Vision Pro review, which creates a small package for review purposes.
And then there's the Vision Pro bundle. This creates a large ProRes file bundle which can be archived or processed in Apple Compressor to prepare it for playback on Apple TV.
Once processed, the final file can be AirDropped directly to your target Apple Vision Pro, or it can be placed in the library of the Apple Immersive Video Utility.
So what's next? Well, a high quality, playable encoding solution that is optimized for immersive video will be added to the workflow soon.
This will ensure you can get even higher quality results for your content, whether it's currently in production or for your future projects. I want to thank you all for your time and for allowing me today to explore some of the exciting new tools that Blackmagic Design is bringing to the world of immersive media creation. For more information, please visit our website, blackmagicdesign.com, and join our worldwide community of users in our forums. Thank you again, and enjoy the rest of your program.
Thanks, Dave. That's awesome. For those of you that know what a 3D-printed action camera rig is, I think you'll understand why I'm pretty excited that we now have a pipeline from camera through to delivery all in one place.
But like we've seen a couple of times earlier today, reviewing our content from our shoots in Vision Pro is really, really important. And one of the primary ways we'll get there is through dailies. Now, dailies perhaps used to be somewhat of a luxury, but with immersive media production they're turning into something pretty essential. So I'm going to hand over now to Brendan from Colorfront, and he's going to tell you a little bit about what they've got cooking up. Brendan.
Well, hello everyone. I'm Brendan, I'm the director of solutions engineering at Colorfront. I'm really excited to be here today. We have a lot to share and even a few announcements, all centered on how Colorfront supports creators working in Apple Immersive Video, from dailies to mastering. Our goal is simple: to help storytellers push the boundaries of what's possible.
First up, I'll walk you through the powerful tools we've built for filmmakers, now expanded and refined specifically for Apple Immersive Video. They're designed to meet the unique challenges and creative demands of this incredible new format.
Next, we'll look at how our technology is shaped by over two decades of real world experience, across everything from traditional film and television to Apple immersive titles, helping top creators bring their stories to life with confidence.
Then we'll look at performance at production scale, how we deliver the speed, the reliability and the flexibility needed to handle rendering across a real world production.
And finally, a new update that will take things just a little bit further.
So let's take a closer look at the software powering these workflows. Because when it comes to Apple Immersive Video, creators don't just need tools, they need the right tools: purpose-built, field-tested, and designed to make the complex feel effortless. All of our software runs natively on macOS, fully optimized for Apple silicon, and built to deliver top performance on systems like the Mac Studio Ultra.
Let's start with our app On-Set Dailies. This is where the story begins in all filmmaking workflows, including Apple Immersive Video. It's essential to view footage as it's meant to be seen, and to make that possible, you need dailies every day. Teams need a fast, trusted way to review what was just captured. Colorfront On-Set Dailies delivers a streamlined, color-accurate workflow, so what you see on set is exactly what your audience will experience in Apple Vision Pro.
Next up is Transkoder, our platform for mastering and final delivery. This is where everything comes together. Whether it's Apple Immersive Video or high-end formats like IMF and ProRes, Transkoder gives creators total confidence that every frame is delivered with absolute precision, from the first color decision to the final export. It's built to meet the highest industry standards of quality and performance.
Both of these tools are built on Colorfront's industry-leading color science and media processing, powered by the Colorfront Engine. It's the same foundation trusted by the world's most demanding productions, from Hollywood blockbusters to immersive originals. Creators rely on our technology because it delivers with consistency and stunning visual fidelity.
And it's not just powerful technology, it's award-winning technology. Colorfront's innovations have been honored with scientific and technical awards from both the Academy of Motion Picture Arts and Sciences and the Television Academy. It's a powerful validation of the work we've done and the work we will continue to do to support the world's top creators.
It's trusted every day by Apple TV studios and other major studios such as Disney and Sony Pictures, along with top post houses like Company 3 and Picture Shop. Our tools power the day-to-day workflows of the industry's most trusted teams. And if you've seen Severance, you've already seen both our tools in action.
Our tools also support Apple's own content creation, powering immersive experiences with the highest quality workflows, and this isn't some future vision. It's happening right now. If you've already seen the Apple Immersive title Metallica, you already know what I'm talking about. And if you haven't really seen it yet, then you really should.
That was the foundation. Now let's show you how it all comes together. Built for the way creators actually want to work.
For over 15 years, we've been on the leading edge of dailies and mastering color workflows, including stereo and immersive, building intuitive tools that help storytellers bring their stories to life. With Colorfront, the technology fades into the background so creators can focus less on the process and more on the storytelling. But what do these tools actually look like, and how do they fit into a real-world workflow? Here's a look at our applications On-Set Dailies and Transkoder. Everything is presented in a simple, intuitive layout, giving creators the tools they need for mastering and dailies right in one place. It's designed to be powerful but not overwhelming, so the focus stays where it belongs: on the creative work.
Here you can see many of the key tools from On-Set Dailies. We have full support for color workflows including ACES, CDLs, custom LUTs, and Colorfront Engine pipelines; real-time color controls for precise adjustments; and waveform and analyzer tools for technical validation.
Volume metering for checking your sound sync and levels.
Real-time visual playback in the GUI and on the SDI monitor out, and every clip in a lab roll available at a glance. It's everything a creator needs in one simple interface.
As you've learned this week, metadata is absolutely essential to Apple Immersive Video, and it's a critical part of our workflow. Colorfront captures, displays, and carries forward rich metadata, preserving critical information from set all the way through post. Every parameter, from camera settings and color science to immersive playback attributes, travels with the media. That means accuracy and consistency are built in.
Here's a bit of a closer look. Among all the metadata, notice the Proem data captured directly from the camera, unique to Apple Immersive Video. It carries spatial and playback information, including what is needed for accurate projection in Apple Vision Pro. By exposing and preserving this metadata, we ensure that every step of the workflow has the context of the shot. With Apple Immersive Video, it's not just about the image on screen, it's about keeping the intelligence behind the image intact.
Now let's talk about what it takes to deliver at scale.
We support two key outputs for Apple Immersive Video: the AIVU for instant playback in Apple Vision Pro — it's fast, immersive, and perfect for immediate review and final delivery — and the ProRes IM bundle for offline editorial workflows, giving creators what they need without slowing down the creative process. And with Colorfront distributed rendering, it's easy to create both, fully optimized on Apple silicon.
But here's what's really powerful about Colorfront: rendering doesn't have to happen on a single machine. With distributed rendering, workflows scale seamlessly across multiple systems, whether that's 2 or 10. It's exactly how many of our customers work, including Apple's own production teams. And if you saw the dailies cart in the hands-on area yesterday, you've already seen the cart in action.
Rendering is often one of the slowest parts of the workflow, tying up a single machine for hours. With Colorfront, it's different. It all starts with a render queue, where every job and every system work in sync. Rendering is distributed across the network so productions can move faster, scale smarter, and stay in the creative flow.
What does that result in? Up to 300% faster performance, turning hours into minutes. A 70-minute lab roll might take over five hours to process on a single machine, but with four systems running in parallel, it's done in one hour and 15 minutes. That kind of speed does more than just save time. It drives the whole creative process forward.
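As a quick sanity check on that arithmetic — treating "up to 300% faster" as roughly four times the throughput:

$$T_{4\,\text{systems}} \approx \frac{T_{1\,\text{system}}}{4} = \frac{5\ \text{h}}{4} = 75\ \text{min} = 1\ \text{h}\ 15\ \text{min}$$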
And one more thing going further, we're not just focused on speed, we're focused on quality.
And today we're excited to announce something new. We're adding the MainConcept immersive encoding library to our platform.
This brings even greater video quality and performance to Apple immersive video, giving creators the best possible output.
Backed by one of the most trusted names in encoding, this technology is trusted by the industry across cinema and broadcast to deliver world class results every time.
It's built right into our software. It's seamless, integrated, and ready to go. There are no extra steps and no added complexity. At the core, efficiency, reliability, and precision are built in from day one.
You can review your ProRes IM bundle and encode it for distribution at the highest quality possible using Transkoder, with the same visual fidelity as the Apple TV app. So whether you're reviewing dailies or delivering the final master, you can move forward with confidence at every step.
So that's everything we have to share with you today, from the tools to the workflow to a brand new way to deliver immersive content at scale. It's all built with one goal in mind: empowering you, the creators.
Oh, not done yet — almost there, I paused there. We're proud to be at the forefront of immersive storytelling, doing the hard work so you can stay focused on the creative. And we're excited to be partnering with you to bring the next generation of stories to life. To learn more or get started, just reach out. We've got Immersive Cloud, or you can email us and we will get back to you as soon as possible. Thank you, everybody. Hope you enjoy the rest of the show.
That's pretty cool, right? Particularly, I'm excited about those higher quality encodes. I think they're going to be awesome for folks. Even if you're working on location, to be able to turn around a high quality file, check if your lighting's good — maybe there are some hair and makeup things that need adjusting. We'll hear a bit about that later, perhaps. But as we get to the end of our workflow, we recognize that not everyone is a developer if you're a creator, right? We still have to take our content and distribute it. So here to tell you a little bit more about what they're doing to take the heavy lifting out of distributing your content as a creator, we're going to invite Zach up to the stage from SpatialGen to tell you more. Zach.
Thanks, Elliot. Good afternoon, everyone. I'm Zachary Handschu, co-founder of SpatialGen. I'm honored to have been invited to speak today. We're going to talk about how to host and distribute Apple Immersive Video. First, we'll have an overview of the SpatialGen platform. We'll talk about different options for uploading, how we distribute videos, and options for playback.
During this event, we've learned about the workflow for Apple Immersive Video. We've learned how to film, how to edit, and how to export. You have run the production marathon and your last mile is now here. But there are some considerations. The release export from Resolve can be three terabytes for a ten-minute film, the format is complex and unique, and if you make it through those two, cloud costs are prohibitive.
This is where SpatialGen comes in.
We're designed from the ground up for spatial computing. We handle multi-terabyte files, stream unique video formats, and reduce those incredible financial costs.
We help our customers push quality in every way, including support for visually lossless formats like ProRes, the codec in Resolve's release exports. One indie creator on SpatialGen who's here today — somewhere in the front row, we all know who I'm talking about — has over 50 TB of footage sitting in their SpatialGen cloud just from bundle releases.
Now let's take a look at how you too can make my job harder: uploading. These jobs are free. Our macOS app allows you to select your ProRes bundle, your AIM file, and a zip of your Final Cut files for upload. This will run as long as it takes to get the video to your SpatialGen cloud. We have support for 2D videos and 3D spatial: top-bottom, side-by-side, stereoscopic. And of course, why we're all here: Apple Immersive Video. We also support browser-based uploads. Simply click the upload button.
Tag your video's characteristics.
Add your audio tracks and your subtitles and we'll handle it from there.
Now, if you're worried about updates to your videos, don't be. You can edit your audio and your subtitles — adding new languages, changing them, changing the tracks later — and we will, in real time, push that update to your stream so that there's never any downtime for your customers. When your video is done uploading through the app or browser, you can set your desired resolutions and bitrates, or use our auto-configured options. We also allow our developers to change resolutions and bitrates in place as well.
Apple Immersive Video streams are fully automated. SpatialGen builds the dynamic metadata for your experience with our own internal parser and muxing libraries.
APMP streams are automated as well. Just tag your videos on the upload page. Now, digital rights management helps prevent piracy by encrypting your streams and preventing screen recordings.
Apple's FairPlay DRM is fully supported on SpatialGen. We built our own internal licensing server to keep your tech stack simple and low cost.
Simply toggle DRM when you're creating a stream and you will get Apple FairPlay security.
After creating a stream, you're going to get an HLS link targeting Apple Immersive Video and the Apple Projected Media Profile. Creators can directly distribute at stellar quality without having to write their own video format or a custom player.
SpatialGen HLS links work on your Vision Pro in Safari, on multiple Apple platforms, and in your own apps that you can launch through the App Store. For SpatialGen developers, our API enables you to integrate the cloud into any app or website.
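To show how simple the playback side can be, here's a minimal SwiftUI sketch that plays an HLS link with AVKit in your own app. The URL is a placeholder, not a real SpatialGen endpoint, and this shows ordinary HLS playback rather than the full Apple Immersive Video presentation:

```swift
import SwiftUI
import AVKit

struct StreamPlayerView: View {
    // Placeholder: substitute the HLS link generated for your published stream.
    private let streamURL = URL(string: "https://example.com/streams/my-title/master.m3u8")!

    var body: some View {
        // AVPlayer understands HLS playlists natively; VideoPlayer wraps it for SwiftUI.
        VideoPlayer(player: AVPlayer(url: streamURL))
    }
}
```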
The publish streams endpoint allows a creator to test their vision, test their stream many times before publishing. And when it's finally ready, you click this button and it's live in your app.
Now, SpatialGen is trusted by some household brands and creatives pushing the boundaries of cinema. These are just some of the experiences on the Vision Pro. With the SpatialGen workflow, Explorer POV is able to release videos at a regular cadence through our automated pipelines. In Rogue Labs' flight site app, SpatialGen is leveraged for enterprise use cases and helps provide pilots first-hand experience in flying as they sit in the cockpit over the Hollywood Hills and downtown Los Angeles.
SpatialGen is also used in defense workflows. Sandwich Vision utilizes SpatialGen to test their Blackmagic camera inside a CH-53 with Captain Don of the United States Marines. We also have a contract with the United States Air Force, and Immersive India has moved to SpatialGen to reduce costs and is debuting a set of gorgeous cultural experiences very soon.
I in particular use this scene with the Cobra to help practice pitching investors.
And the SpatialGen API powers experiences like Polio's Last Mile, where people get to be put in the heart of the fight to eradicate polio.
That was just a tiny sample of the experiences I can show you today. There's a bright future with the SpatialGen cloud, and we can't wait to reveal more soon. These experiences are made possible because our internal motto has been super simple streaming. We want more creators and developers to share their vision with the world, so we search for new ways to maintain the quality of the source image, from rendering to playback, within current formats.
We use a variety of quality metrics like VMAF, which goes from 0 to 100, 100 being perfect.
I'm very proud to say that SpatialGen streams Apple Immersive Video as what we like to call a straight-A provider, so we score over 90 on all major rungs.
We have great updates in the pipeline that we're currently testing, and our last encoder update pushed our VMAF score up to 90 on all those rungs, even at low bitrates. For example, we went from a 68 with SpatialGen v1 to a 91 with SpatialGen v2, and this was at 30 Mbps, a very low bitrate for Apple Immersive Video.
We are also monitoring the latest releases around foveation, and these developments will be embedded into the platform. I'm sure you guys wanted an example.
So let's recap. We covered how SpatialGen handles those multi-terabyte files from Resolve's bundle export, how we automate difficult stream creation processes, and we discussed easy integrations that will help you reduce cost.
You can sign up for SpatialGen on our website, you can send us an email, or you can just come talk to me after the event. Thank you.
Amazing. Thank you so much Zach, and what an amazing session. It's incredible to see all of these tools coming together, supporting folks from start to end. And a huge thank you also to all of our industry guests who joined us for that one. You know, sharing their experiences, their insights on what they're doing. Now, normally, I'd say the next part of the day is going to be the highlight of the day because it's lunch. But after lunch, we have an amazing session coming up. So before anyone asks, yes, as we go to lunch, Austin and Alex will likely be continuing their sound versus camera debate well into lunch. So if you want to jump into that conversation, you know exactly where to find them. Enjoy your lunch, folks. Thank you.
Okay. Welcome back everybody. I think we have a few folks still trickling in from after lunch. Hope you're all feeling suitably fed and watered.
I'll let a few more people trickle in now. Awesome. So I hope you're feeling a little bit more recharged and ready, because we have some incredible sessions lined up for you this afternoon. Is everyone sat comfortably?
Yes. Awesome.
Nice. So we're about to hear from the team behind some of our own popular Apple Immersive titles, and they are going to be lifting the lid on the challenges that they faced, but also the breakthroughs that they made to bring these productions to life.
But to kick us off, we're starting right at the beginning of the production workflow, and I am thrilled to bring to the stage Anton Tammi, director of the very first Apple Immersive music video, Open Hearts. Anton.
Thank you.
Hello everyone. I'm Anton Tammi. Today I'm happy to share the benefits of pre-visualization that I learned when directing the first Apple Immersive music video, Open Hearts, for The Weeknd. I've directed music videos for a decade now. Like many in my generation, I started with independent music videos, surrounded by like-minded colleagues, learning directing with an international community of filmmakers. We shared knowledge and tips with each other both offline and online. Eventually my portfolio grew and big music labels and artists became familiar with my work. In 2019 and 2020, I directed four music videos for The Weeknd, the most well known of them being Blinding Lights.
Then last year, The Weeknd asked me to direct two more videos for him. The first one was Dancing in the Flames, shot entirely on iPhone 16 Pro, and then the second one was the first ever Apple Immersive music video for Apple Vision Pro. I was completely new to the immersive format, and being the first to ever direct an immersive music video, well, it felt special.
And I knew this time I wouldn't be able to ask help from the filmmaking community in the same way as before.
The immersive format is something so new for us filmmakers.
And after talking to you guys, I know that there are people in this room who have never tried an immersive camera yet. Well, I was just like you a year ago.
But working with the Apple Immersive team, I began to realize what an exciting world this is. So how did I start thinking and planning my first immersive film as a director? Well, first we needed a story. Together with The Weeknd, we wrote a two-part concept, a story that symbolized the end of his career through a surreal journey. We envisioned him crashing his car and finding himself in the back of an ambulance, where he would then discover a mysterious door leading to a long corridor. And finally, we would see him in a venue where the film would reach its climax with The Weeknd's final performance.
The first video would be shot on the iPhone 16 Pro, and it would end with an ambulance arriving to pick up The Weeknd after the car crash.
Then, the story would continue in the Apple Immersive format, where we would see The Weeknd inside the ambulance. Now, storyboarding a music video shot on iPhone was something I was familiar with, and I'm sure many of you are. But what I didn't know was: how do you storyboard and communicate the pre-production for an immersive video? First of all, I had no idea what would or would not work in the immersive format. Does it even make sense to have your actor inside an ambulance when shooting with an immersive camera? Would the immersive camera even fit in the back of an ambulance? How fast could the car move without causing motion discomfort? How do you frame in immersive, and where exactly should the camera be positioned? I had no idea, but I knew a classic 2D storyboard wouldn't give me these answers.
And then on top of that, of course, I had an A-list talent to work with. When you're directing with an artist like The Weeknd, everything must be planned perfectly before he steps on set.
He's a global superstar, and with him, you just don't get that time to test, shoot, and rehearse. I also wondered, how can I explain to him what this video would actually end up looking like, if even I had never directed an immersive video before? So I was faced with two challenges: working with a major artist and a completely new format. And I realized my team and I needed a space to explore and test these ideas before we shoot.
We basically needed a virtual environment where I could try easily all my wildest ideas before bringing the weekend on set. We needed an immersive previs workflow.
So what's an immersive previs? Well, a previs or a pre-visualization or an animatic is a CG animated video that represents the final film.
It's like a modern version of a storyboard. Ultimately, it gives you a space to experiment, test ideas, and refine your vision. And I mean, of course, a storyboard is sometimes enough when planning a traditional film, and of course you don't always need a storyboard, but thanks to the modern tools we have now, we have programs like Blender or Unreal Engine available to create CG visualizations, if you make films that really, really need specific planning.
I especially like Blender since it's open source and easy to use. It basically allows you to turn your entire script into a set of immersive animations. To get going, you first need, well, a team. I chose to collaborate with one previs artist called Nick Turner, and together, just me and Nick, we did the whole previs.
Nick had created some custom camera rigs in Unreal Engine, and using this custom set of camera blueprints, we had access to a virtual camera in the CG space, allowing us to frame scenes however we wanted.
And then we were able to export stereoscopic versions of the animation. Every frame was exported simultaneously from two slightly different perspectives to match the Apple Immersive camera.
And after rendering, I could view them in Apple Vision Pro to see how they actually looked. This workflow allowed me to test all my ideas quickly and affordably, and I'm really happy that we did it, to be honest.
So here, for example, you can see one of our stereoscopic previs exports rendered for APMP playback, where The Weeknd is walking through a corridor.
And when you start rendering immersive previs, you quickly realize what works differently in immersive compared to traditional filmmaking.
Since the viewer is watching a 180-degree half sphere around them, you can't cheat the way you can in classic filmmaking. Everything is visible and fully sharp — the image is super sharp and there's basically no depth of field, that's how I feel it — and the wide framing reveals every single detail, and you begin to understand how precise and detailed you really have to be. You also learn how close objects are allowed to come to your camera, and how uncomfortable it gets when something actually gets too close.
So the great thing about our previs was that we added this red sphere around our camera, and it defined the minimum distance of comfortable proximity, meaning it basically showed me what was too close. I could easily turn it on and off to check if my ideal framing felt comfortable in Apple Vision Pro. So basically, this allowed me to quickly check if things were technically okay or not.
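Here's a rough sketch of that "red sphere" idea — a simple proximity check you could run over a previs scene. The comfort radius is an arbitrary placeholder, not an Apple guideline, and the object names are hypothetical:

```swift
import simd

/// Flag any scene element that comes closer to the camera than a chosen
/// comfort radius, mirroring the red sphere used in the previs.
struct ProximityCheck {
    var cameraPosition: SIMD3<Float>
    var minComfortRadius: Float = 1.0   // meters; tune per project

    /// Returns the names of objects that sit inside the comfort sphere.
    func objectsTooClose(_ objects: [(name: String, position: SIMD3<Float>)]) -> [String] {
        objects
            .filter { simd_distance($0.position, cameraPosition) < minComfortRadius }
            .map { $0.name }
    }
}

// Example: checking two props in a hypothetical ambulance interior.
let check = ProximityCheck(cameraPosition: [0, 1.2, 0])
let flagged = check.objectsTooClose([
    ("stretcher rail", [0.4, 1.0, 0.3]),   // about 0.54 m away: too close
    ("rear door",      [0.0, 1.2, 2.5]),   // 2.5 m away: comfortable
])
print(flagged)   // ["stretcher rail"]
```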
Or, for example, I was thinking at some point that I would show The Weeknd sitting in a sports car, but after running the previs tests, I realized that in immersive, it feels more natural when you can feel a larger space around you.
And the tight sports car, well, it just felt small. It made me feel almost claustrophobic.
And okay, so this one is crazy. I tried a 180-degree rolling camera move, but when I previewed the render, I honestly almost fell off my chair.
So if you want, try that once in your life — I guess everything should be tried once — but it's not the most pleasant experience.
But yeah, I mean, as we kept testing, I also started to learn what actually works in immersive.
So let's look at some of these examples.
For example, I learned that the room-like space in the back of an ambulance actually works really well in the immersive format, as it gives you a sense of space — a big enough space, bigger than the sports car.
I also learned that corridors work really well. Placing the camera in the center and moving slowly forward behind your talent creates this feeling that you're walking with them.
Or, as Elliot mentioned yesterday, in immersive, movement can really be a powerful tool.
Through the previs, I learned what that actually means and what type of movement really works. So, for example, I discovered that if you position the camera in the center of the road and show the car's point of view with part of the car visible in frame, you can actually capture very fast movement.
But how do you do that? You kind of need to get grounded by something, like the hood of the car here — something you can always find in the shot, a thing that helps you feel like you are somewhere, so that you're not just floating. All this testing taught me, basically, that I can do the things I'm used to doing, like fast shots. I did Blinding Lights — it has a lot of fast car shots — and I was wondering if this video could have those. And I learned that, with these approaches, it actually can.
And what I love about this workflow is that it adds a new phase of craft to filmmaking, and it can often even lead to new ideas that you might not have without it. So, for example, in this shot, I knew I wanted a POV of the ambulance driving along an endless straight road. But when we were working on this previs and I saw this shot, I started to wonder: what if the road bent upward as you drove through it? How would it feel in immersive, since you have more space above you? So we tried, and this is how this complex VFX shot was already proven to work in Vision Pro before we went to shoot.
And then when we went on set, my VFX supervisor and I, we knew exactly what kind of plates we needed in order to make something this ambitious possible in Apple Vision Pro.
So here you can see the final shot in 2D.
The previs basically allowed me to experiment and test even the wildest ideas freely and affordably, keeping only the ones I knew for sure would work.
So what do you do after finishing the previs, then? First of all, I was able to show the full film to The Weeknd himself, hear his comments, and get his approval.
Secondly, the previs helped me to educate my team. I showed the previs to my cinematographer, Erik Henriksen. He had already worked with me and The Weeknd before, but neither he nor his team had ever shot in the immersive format.
So after seeing the previs, Erik shared the information with his team.
So basically, at the core of all our pre-production was the previs: a pre-made stereoscopic animation that everyone we brought in could watch in Apple Vision Pro and kind of see the film before it was made. That's how everyone knew how to design the sets, how to light the scenes, and how to control the camera movement.
And after completing this workflow and having now directed an immersive film, I can tell you this — and we can talk about it after this speech if we see each other outside — it is not that difficult, not necessarily much more expensive, or even that different compared to directing regular films. And I think, now that I'm talking to you on this stage and everything, I sound like a film school teacher in Finland, the kind who always gave me these rules about what you're supposed to do — I guess I'm becoming one now — when I say that you do have to prepare more carefully and spend more time in pre-production. You can't just run and gun.
So an immersive previs helped me to test different ideas in a cost efficient way.
It served as the perfect communication tool for the whole team, allowing me to show something that was impossible to explain with words. It made sure that everyone, from my talent to the technical crew and myself, was on the same page. And most importantly, since this was a music video for The Weeknd, it allowed me to be fully prepared and confident when he walked on set.
But yeah, what happens then when you move beyond the pre-production into filming? What needs to be done differently on set when you shoot immersive films? Well, here to share more about working with these new technologies is Alex.
Hi everyone. Hi everyone streaming at home.
I am Alex, and I am one of the producers working across many of our immersive projects. One of the other exciting pieces we released last year was our first scripted project, Submerged. Let's take a quick look at what we made.
Everything we're doing here is the first time that anyone's ever doing it.
When you put on the Apple Vision Pro, it does change the way you think about creating a story.
It's a wonderful new medium that just expands the horizon of storytelling.
Because you're not watching a movie anymore.
You're inside the story.
It's going to change the future of filmmaking.
All right, well, that was super exciting.
As you just saw, Submerged brings viewers into a World War II submarine and follows its crew as they struggle to combat a frightening attack. We filmed this 17-minute piece in three international locations with just over nine shoot days. And as Edward Berger, our incredible director, mentioned in the previous clip, there were a lot of firsts on this production. Not only was it our first scripted piece, but it was the first time we collaborated with a large, critically acclaimed film crew. It was the first time we deployed multiple Vision Pro devices on set to be used as tools by the production team.
It was our first time filming large special effects stunts and successfully filming in low light and really tight spaces. We often get asked the question: what's different? How is this production different from traditional 2D filmmaking? And the answer is, there's not as much of a difference as you would expect. This is the production crew in Brussels, working on the sets that would be fully submerged in a water tank, which is pretty epic. And we learned with this production that you can have traditional crew positions like your art department, your camera operators, your gaffers. You can have similar budgets and filming schedules. But because of this new technology, what's different is really the how: as a producer, how are you allocating your resources, and how are you building in the right amount of time to execute a successful production?

So today we're going to take a look at a few different areas of production where this new technology changes how you spend your time and your resources, and we're going to begin with the development and pre-production process. But first, let's quickly go over what a traditional production process looks like. We usually start with the creative development first. For Submerged, this is where the script writing began. Then we moved into pre-production: we found our locations in Prague, in Brussels, and in Malta; we hired our crews; we began building our submarine; we cast the film; and we started camera testing. Then we moved into production. The team flew overseas and hunkered down for the shoot, and this is really where everything came together. We saw the finished set that would get flooded with water, we saw the actors in their World War II costumes, the fight choreography. It was really exciting. Then we hit post-production, where we put the footage together in the edit. We had visual effects teams building the outside of the submarine, and we added unnerving sound design elements like the slow drip of a leaky pipe. And then finally we hit delivery, where the piece landed on platform.

But it's really here, in the development and pre-production phases, where we've learned we need to spend more time and more resources. We've learned we need to account for plenty of time to get our crew up to speed and comfortable with this new tech, which meant camera workshops and lots of camera testing, like you see here.
We also conducted a lot of demos to make sure our crew spent lots of time in the Vision Pro before production, so everyone felt confident using it. Because when you're on set and you're in a fast-paced filming environment, not understanding the basics of navigating around a Vision Pro to review footage can eat up precious amounts of time during the day. Another big part of that development phase is previs. Like Anton spoke about, previs became crucial for us producers so we could understand what filming tools we would need to use based on the camera positions and the blocking. But also, previs helped us figure out how we were going to build our environment.

And that leads us to the second area we realized was greatly impacted by bringing this new technology into the production process, and that's set design and lighting. Building a period-accurate World War II submarine from scratch was no small feat, and the production's incredible art department worked around the clock to bring this vision to life. Without previs and extensive camera testing like you see here — this was a mock-up of the submarine corridor — we wouldn't have realized that the actual sub design, even though it was period accurate in size, was actually too small for our immersive camera and could potentially cause viewing discomfort because of the proximity to the walls on either side. We also learned we had to pay extreme attention to the detail in the set design, because the immersive camera mimics the human eye's acuity, so the sharpness of the image would pick up any inauthentic detail. So, for instance, the art department had to source and create all of the dials around the submarine with the right materials — and trust me, there were a lot of dials; you couldn't get away with just using wood or plastic. And so for the producers, that meant we had to build in more time and allocate more resources to set design.

Inauthentic details and the immersive camera's wide field of view also affected how we lit the film. So we built as much lighting into the set as possible. Not only did this help with the creative vision, so we could maintain the authenticity of the sub design, but for us producers it also meant that we could save some costs on the back end, so we wouldn't have to do as much visual effects cleanup, because anything that sits inside of your 180-degree field of view that isn't part of the set will need to be removed in post. And we learned that can get extremely costly very quickly because of our high frame rate and resolution.

And while we filmed most of our scenes inside a stage, we did have to go on location for the final open water scene. That is the immersive camera in a splash housing, which protects it from the water. And getting to that final shoot day was extremely challenging. We faced a lot of pressure to get that location right. For instance, we had to be extra mindful of the weather because we wanted to keep a stable, level horizon to avoid motion discomfort in the Vision Pro, so waves were not our friend. We also needed a location with a fully unobstructed field of view, so we had to pay extra attention to what else was out in the water. Our teams pored over shipping routes to ensure that we wouldn't have a lot of traffic from other vessels. These are just some of the many, many considerations we had to take into account when producing this shoot.
There were also many learnings for actors on screen, which leads us to our next area impacted: the on-screen talent presentation. Like our set design, because of the acuity of our cameras and because the world scale is 1 to 1, details matter. Elements like wardrobe, makeup, and actor performance had to be authentic. For instance, some of the makeup didn't hold up to our camera tests. When we transferred our tattoos over to our actors, they didn't feel real enough in immersive, which was not something we were expecting.

Okay, so this brings me to a really important difference between regular 2D production and an Apple Immersive production, and that's how we're reviewing footage. If you saw the original tattoos on a regular 2D monitor, you wouldn't think anything of it. But reviewing footage in Apple Vision Pro is the only way to truly assess your shots when you're in production. Traditionally, the review process on a 2D production is fairly straightforward: after a take, you have your playback operator play back footage directly to your monitor, you watch your dailies the next morning — it's a pretty well-oiled machine. On an immersive production, while you can watch your playback flat, it doesn't give you the most accurate representation of your footage. So we discovered it was necessary to have many Vision Pro devices in circulation, and we also needed to ensure everyone knew how to use a Vision Pro, so watching footage wouldn't delay the production day. We built platforms for everyone's Vision Pro, and like you would with any other device, we had to keep them clean and charged. So this taught us that it was actually crucial to have one person dedicated to keeping the Vision Pro devices running — a Vision Pro manager. I know — but you do have to have that, I promise you. This is something we also had to account for in the daily schedule; this whole process does take a little bit longer.

Which leads me to our last learning, another major technological element that also impacted our production schedule: the use of additional camera systems. In Submerged, we had a number of shots that were not at a 1-to-1 world scale. They were much closer and they had a shallow depth of field, like you see here. These shots were filmed on a completely different camera system, a traditional cinematic 3D rig, and we learned that we could not seamlessly integrate this camera system into our production workflow. It had a separate team, a separate dailies process, and other implications in post-production. So this is another reason why early camera testing — early, early camera testing — is so important, so you can understand the needs of any additional camera units and how they'll integrate into your immersive production.

So if there is anything I can leave you with when it comes to new technology and production expectations: spend enough time preparing yourselves, be open-minded while learning how to integrate this technology into your productions, and lastly, get excited. How often is it that we essentially get to go back to film school and learn something so new? And through all of our collective experiences, we can help each other produce successful immersive projects.
And now let's step out of the scripted world and into a thrilling live arena. It is my pleasure to introduce Ivan, who is going to walk you through some lessons in accessing and capturing live events. Thank you.
Awesome. If you haven't had a chance to watch it, watch it. Fantastic film.
Thanks, Alex. My name is Ivan Serrano. I'm the technical director for Apple's immersive media team. Today we're going to talk about all the lessons learned on capturing immersive live events for VOD, or live-to-tape — not to be confused with live-live, which I know everybody's excited about.
We all love going to live events. There's something exciting about being in the crowd, whether it's watching Steph Curry dropping threes from half court at the All Star game, or LeBron James at 40 years old, throwing down monster dunks. These moments stay with us. They're not just entertainment. They're shared experiences.
Live sports are some of the most immersive experiences we share. The tension, the rhythm, the unpredictability — it all unfolds in real time. And just as the playoffs are ramping up, we're so emotionally invested, whether we're in the stadium or watching from home. With VIP: Yankee Stadium, we had a rare opportunity to shoot immersive by placing a camera inside the dugout, not just to film the players, but to capture the pulse of the game from their perspective.
Now, if sports gave us a taste of proximity, what we're going to talk about today, Metallica, handed us the full spectrum: proximity and immersion.
We weren't just close to the action, we were inside it. We had access to the tuning room, backstage pyro cues, crew comms, and even fan POVs.
It wasn't just about getting great angles. It was about capturing the heartbeat of the show.
This project taught us how to film not just what is happening, but what it feels like to be part of it. That's the essence of immersive capture. But what does success really mean in this format? Putting a camera closer than the front row, just in front of Lars, and capturing presence and momentum.
Or James walking to the stage just feet from the fans, captured from within the barricade, shoulder to shoulder with security.
Or on stage with Kirk jamming to Master of Puppets. You're not watching the band. You're with them. Every stomp, every solo. This is what it feels like to be on stage.
But it wasn't all perfect in the beginning.
This is where we started. I'm sure everybody knows what these are, and most have seen these types of cameras at every sporting event or concert. It's a long lens where you can be 100 feet away and still get that nice close-up using a 20x or 100x lens. Or this one: almost any time you walk into a theater, this is where they place you — in the camera pit, out of the way — again hoping you have that super long lens.
When we first started filming immersive events, and after multiple shoots, most of the event management just simply put us next to those cameras. And of course, we learned that it just doesn't work.
But really, the most important thing we learned about delivering an incredible immersive experience is access. And that's where the real challenge begins: getting cameras as close to the action as physically possible. And regardless of whether it's sports or a concert, prime locations are tough to get. You're negotiating with broadcast crews, navigating through security zones, and working around legacy workflows like Metallica's.
So how do you break through it? It comes down to the groundwork.
And it started with Apple's creative team traveling to meet the band face to face, sharing demos, concepts and ideas for how they could deliver a whole new experience to their fans.
And after watching some of Apple's immersive videos, they immediately saw the potential.
This band has been innovating for decades, and that shared drive to collapse the wall between artist and fan became the foundation for this ambitious immersive film.
Their access shaped how we designed coverage: every camera placed with intent, every angle mapped to their movement.
But before we talk camera placement, we had to discover the stage layout.
This wasn't just a stage. It was a circular blueprint for immersion.
The snake pit at the center, fans wrapped around the band, energy moving in every direction. As we scouted their concert in Minnesota, their team walked us through their entire stage design, provided us drawings of the stage, and even walked us through where they placed cameras for their 2D capture. We weren't just looking for good angles, we were learning the architecture of the experience. With that level of access, we could place cameras in zones that are usually off limits, but to make those placements count, we had to understand how the band moves. The team watched over 30 performances from the tour to map the band's movement, show after show.
They even gave us ISO feeds of cameras like the Spidercam to understand how to best position our aerial camera. This increased our chances of knowing exactly where the magic would happen.
We then brought these models into Vision Pro, allowing us to preview every camera angle in immersive, where we could test everything from camera speed and positioning to feeling the physical effects of movement.
And here is the camera plot.
This layout shows 14 cameras positioned around the stage. Each one was chosen for a reason. Proximity to key band members, fan energy zones and movement paths.
We mapped the stage very strategically, anticipating solos, crowd surges, and moments of intimacy. Some cameras were fixed, one was flying, and others were on remote dollies.
And with all the access and intel we got from this band, this is what it looked like sitting next to the director, Mark Ritchie.
Even though the 14-camera shoot was for VOD, it was live-directed in real time as the show unfolded. This was the largest multicam concert setup the team has ever done.
And to bring this experience to life, not only did we need access, we needed the right equipment that could move with the moment.
Knowing the nature of rock and roll, all of our cameras were stabilized, including this new Steadicam by Tiffen using the Volt 3 system.
This is also the camera that followed James as he entered the stage, right through the heart of 60,000 fans in Mexico City, giving viewers the feeling of walking with them.
And this rig right here put us above and beyond the stage: the Spidercam, giving us unique perspectives like these sweeping aerials, dynamic transitions, and angles that traditional rigs simply can't reach. This wasn't just coverage, it was choreography between gear and performance. Every tool was chosen to match the energy of the environment and the emotion of the moment.
So to summarize: to deliver a truly immersive capture, it starts with access, because without it, you're just guessing. From there, map the stage, study the choreography, and if possible, build your previs in 3D to lock in your camera plan. And finally, match your gear to the environment, because the right equipment lets you focus on the moment and not worry about whether your footage will survive post.
As a result, immersive mediums don't just tell a story.
They let you live it. And that's louder than words. And for those who haven't seen the film yet, here's a sneak peek.
Are you alive? Take my hand, we're off to Never Never Land. Ha ha! Thank you everybody.
And now, to speak about unscripted storytelling, less controlled environments, and more unpredictable productions, it is my pleasure to introduce our creative director, Frans.
Thank you Ivan.
For those of you who haven't seen the Metallica piece he mentioned, it's pretty epic. Check it out on the TV app. Now, you may all be wondering, is her name really Frans? Yep, sure is. Frans from France.
But the real question is: what happens when you're not on stage with musicians, not on a Hollywood film set, and not able to use previs?
Some of you here may be more familiar with leaner crews, fewer cameras, and a smaller set of resources. That's definitely the case for our team, which crafts unscripted stories for Apple Immersive Video. Typically, our sets look like this. We're out in the wild, oftentimes quite literally, and we film in less controlled environments.
We straddle the border between producing and just letting things happen. And that's a particularly delicate balance in immersive. Here are some of the immersive documentaries that we've produced. Our thinking was: let's transport people to places, let them come face to face with some amazing creatures, and embark on unique adventures, so that each viewer becomes a visitor. We were hoping to craft films not to be watched, but to be experienced.
But the truth is, being there is not always enough. Our team has made some pretty amazing discoveries along the way, which I'd love to share with you today in the hope that it gives you a head start when it comes to crafting your own immersive stories. So we'll share some creative insights and try to root them in concrete production situations. But before we dive in, please remember it's a new medium. There is no established cinematic language, just some considerations that you might want to keep in mind before you hit the field.
The first time we did this, let's say we traveled pretty light. Our main tool was a tripod, and by the way, there's a fun fact about tripods in immersive: they can easily make a cameo. Why? Because Apple Immersive is an amazing format that mimics human eyes, so we pretty much capture everything from nine to three, and also a bit higher and lower than twelve and six. So just make sure no leg faces the lens; point it toward the battery pack instead.
But the good news is, even if you only have this grip tool, you can still capture beautiful Apple immersive stories. Because in immersive, there is power and beauty in simplicity.
And what would be considered a simple static shot in 2D linear won't feel totally static in the viewer's experience, because we film the world the way we see the world. In fact, the camera is the viewer. Don't you think it looks kind of adorable with its two lenses? It's like it has two eyes. I love to say it looks like a cuter WALL-E.
And while it may not have tools like interchangeable lenses or rack focus, your viewers are the ones who will be tilting, panning, or rack focusing on their own, simply by moving their heads and their eyes.
And certain shots are particularly powerful in immersive like this one.
The feeling of your subjects like this elephant here coming towards you, or even just moving around you, is incredible.
And in fact, one of the most amazing things we've heard from folks after they experience series like Wild Life is that they don't recall them as films they watched. They remember them as memories.
Let's pull back the curtain on this shot. While there is beauty in simplicity, that doesn't mean capturing this shot is simple, because the camera is the viewer. You want to be intentional about where you're placing the viewer, and therefore where you're placing, you got it, your camera.
While there is no set rule, we can share some guidance when it comes to camera distance. About six feet away from your subject is considered a comfortable distance, and four feet away if you really want to feel close to your subject.
See here. Our character is about to take her first dive with sharks. There's so much tension. You want to be close, but you don't want to be in her face either. Also, not so close that you can't see the sharks right here in the background.
So really in immersive, it's not just about getting a good shot, it's about what you filmmakers want your viewer to see and what you want your viewer to feel.
And camera height contributes a lot to that feeling. Typically, the height we use on a tripod is about 50 to 60 inches, roughly collarbone height for most humans. But let's look at this shot here. How high do you want your viewer to be? Do you want them to feel like they're sinking into the deep snow and braving these extreme conditions, or do you want them to be at the exact same height as one of your characters? These choices give you filmmakers agency to decide and to guide your viewer's experience.
And while we don't have interchangeable lenses or zoom, we can physically move our camera and decide on its position in relation to the subject and to the action. At times, we even got a little too close. And it worked, for a brief moment. In fact, that shot with the rhinos broke all the rules, but it turned out to be a fan favorite throughout our demos. Proof, I guess, that rules are meant to be broken, but with intention.
While there is a risk of some viewers going cross-eyed, there is also a bigger risk of all viewers reaching their arms out and wanting to pet these chubby unicorns. So yeah, it's always a trade-off.
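To put rough numbers on that trade-off: taking the roughly 64 mm interaxial mentioned later in the Q&A (close to typical human eye spacing) as the separation, the angle a viewer's eyes have to converge to fuse a subject grows quickly as the camera moves closer. This is only back-of-envelope geometry, not an Apple guideline, but it helps explain why six feet feels comfortable, four feet feels close, and much closer than that starts to strain.

```swift
import Foundation

// Back-of-envelope convergence angle, in degrees, for a subject at `distance` meters,
// given an eye (or lens) separation of `separation` meters. Roughly, the wider this
// angle, the harder the viewer's eyes have to work to fuse the subject.
func convergenceAngleDegrees(distance: Double, separation: Double = 0.064) -> Double {
    2 * atan(separation / (2 * distance)) * 180 / .pi
}

// The distances mentioned on stage, converted to meters:
print(convergenceAngleDegrees(distance: 1.83))  // ~6 ft: about 2.0 degrees, comfortable
print(convergenceAngleDegrees(distance: 1.22))  // ~4 ft: about 3.0 degrees, intentionally close
print(convergenceAngleDegrees(distance: 0.60))  // ~2 ft: about 6.1 degrees, noticeably more demanding
```

Lenses, subject motion, and what else is in frame all move these numbers around, so treat them as orientation rather than rules.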
Same goes for camera framing. We film in uncontrolled environments, and sometimes things are just there without any ability for us to move them, so we need to work around them. We filmed in a hot air balloon, and its basket structure was right on the edges of our frame. There is a risk of breaking stereo, and the basket structure will likely feel like it's sitting in your personal space. You can mitigate that by staying on the shot for only a brief moment, or by making sure your viewer's eyes are drawn to something else, such as the face of the pilot or the fire of the burner.
Now, eventually, we experimented with camera movement. We had to experiment a lot, but we did so with intent. Usually we aim to keep it organic, similar to how we move as humans, unless we want to provoke a very particular reaction.
In immersive, we simply can't go handheld. But even with limited resources, there are often efficient ways to mount your cameras on cars and cranes.
Movement helped us reinforce the feeling of presence. But we had to be mindful of the speed at which we were moving, the direction in which we were going, and the stability of our movement.
We even took our camera up in the air. We mounted it on drones or helicopters, like this one.
We also played with adding slight up-and-down movement, filming from a bird's eye view or simply hovering over a scene.
But wildlife doesn't totally love the sound of blades.
And we know the power of being close to animals.
So our team crafted a cable-cam rig to follow a young orangutan following its mother up a tree.
It took a lot of iterations and testing to define the right speed and ensure smooth movements. But it's one of the most powerful shots we've captured.
One-take shots are so powerful in immersive; the viewer is just there, following the scene and experiencing something they would likely never be able to witness in real life. And while we certainly can't direct wildlife, we can help guide the gaze of our viewers, particularly in uncontrolled environments. In immersive, it's often less about directing your on-screen subject and more about directing the viewer.
There are a lot of ways you can direct your viewer's experience without having to dictate it. First, by motivating what they see.
While there is a lot for your viewers to see in 180, this is an example of a clear element that your viewer will likely look at.
Some assumptions from 2D linear filmmaking don't totally translate into immersive video format. For instance, some of you may choose to follow the rule of thirds like I did this morning when I snapped this pic of the Apple Park coyote on my way to work.
You're placing your subject at the intersection of these two horizontal and two vertical lines. But in immersive there is something else. There is a whole new dimension.
For example, here you would think this is a shot of a man drinking coffee.
But to many in immersive, it may actually come across as a shot of a plane sipping water, I guess.
Point is, in immersive we deal with a whole new depth. There is a foreground, a mid-ground, and a background, and usually the best way to call attention to where you want your viewer's eyes is by crafting what they hear. A superpower of Apple Immersive Video is spatial audio, something that's really unique to our format, and it opens up creative storytelling opportunities.
Just listen to this.
Every sound is localized at the perfect distance from what you see: the waves, the wind, the seagulls. Sound is crucial in immersive storytelling. Not only does it make you feel there, it makes you look there, at specific elements, which we can then use as cutting points to create organic transitions and highly cinematic storytelling.
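The production mixes behind these films are authored as Apple spatial audio in tools like Fairlight and Pro Tools, as the Q&A later covers, so the snippet below is not that pipeline. Purely as a conceptual sketch of the idea, a single mono source placed off to the listener's side will pull their attention (and usually their gaze) that way; here it uses AVAudioEngine's environment node, with the file name as a placeholder.

```swift
import AVFoundation

let engine = AVAudioEngine()
let environment = AVAudioEnvironmentNode()
let cue = AVAudioPlayerNode()

engine.attach(environment)
engine.attach(cue)

// Spatialized sources should be mono; the environment node renders them around the listener.
let mono = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 1)
engine.connect(cue, to: environment, format: mono)
engine.connect(environment, to: engine.mainMixerNode, format: nil)

environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)
cue.renderingAlgorithm = .HRTFHQ
// A couple of meters to the listener's right and slightly ahead: the "seagulls" live over there.
cue.position = AVAudio3DPoint(x: 2.0, y: 0.0, z: -1.0)

// Placeholder asset; any short mono recording works for the experiment.
let file = try! AVAudioFile(forReading: URL(fileURLWithPath: "seagulls.wav"))
cue.scheduleFile(file, at: nil)
try! engine.start()
cue.play()
```

In the finished format, that same pull toward a sound comes from the spatial mix itself rather than from app code; the sketch just illustrates why a localized source works as a cue.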
Pristine silence was also pretty epic in the balloon, punctuated only by bursts of fire from the burner.
Audio truly adds to the dynamics of immersive experiences, but it's also a fantastic way to create an intimate scene, like this moment where the caregiver whistled to this elephant, or when he decided to explain something to the viewers, pointing at a specific feature of the elephant. It's a real moment of authenticity, and the kind of moment you only get in immersive.
But not all our experiences were character-driven. In fact, for the Elevated series, we don't have a character at all. We basically step inside an Apple screensaver and fly over the most majestic vistas.
For some of the episodes, we had to write narration that calls attention directly to certain elements in the frame, so we could continue to immerse the viewer. Which brings me to my last and maybe most important creative insight: we always hope that the viewer will ultimately forget about the technology and truly immerse themselves in the story, even when that's quite literal, like when we went underwater with sharks, or when an orangutan, get ready, pokes the viewer in the eye.
Immersive is an experience, and what or who you decide to put in front of the camera will ensure that the experience is not just transporting, but also transforming.
Your setting matters: the place where you're filming, the time of day, or really just a shot that reminds you of the way you would experience a new country. Overall, it's about creating the right atmosphere for your story. And your character becomes the guide, so their casting is key. Most of the time we film them full body, so every detail can be noticed. If they feel a bit nervous, a twitching foot may become all your viewers stare at. So you must establish a strong relationship of trust to have characters who are comfortable and compelling on camera.
You also don't want the opposite: an overconfident talent, like an overeager salesperson stuck in front of your face, constantly interacting with you.
So we tend to direct them less than in traditional 2D documentary. Where we would normally cut between camera angles and edit sequences with close-ups, here characters will be on screen for longer. Each shot tends to last longer, so you really want them to be people you want to hang out with, even if they're about to surf in frigid water. Here we were experimenting with filming a portrait, which felt like a different creative choice in immersive, and we had a clear intent. So a lot rests on your characters in immersive. They are guides for the audience, they're experts, and they need to pique our curiosity. You want the right person and the right voice, but at times you don't even need their voice at all. Because in immersive, we connect to people in a unique way, and we share more than a space together. We share moments, like this free diver here, about to break a world record under the ice.
In traditional storytelling, we often speak in acts and scenes, but something unique to immersive is what we call moments to immerse, where we lean on not forcing anything, just letting emotions run. Moments that viewers can simply soak in and take their time to fully appreciate. That one was all about tension and silence, but we can also have fun in immersive, which we did in our film. We let them interact, stare directly into the eyes of the viewers, and also just be orangutans. And if you watch the film, there is a section that demonstrates their playful nature. That moment was found in post. While immersive requires all of us to have a robust pre-production plan and to try to anticipate as much as possible, we love to keep room for creativity in post.
Actually, there is no real "fix it in post" in immersive. There isn't even third-party stock footage; it doesn't exist as of now. But we can leave the door open to finding new, creative, immersive ways to tell stories in post, because that's very often where it all comes together. And on that note, it is my pleasure to introduce you to Tim Amick, our head of post-production, to go over lessons in presence and post-production. Thank you.
Hi everyone. Thank you, Frans.
I'm Tim Amick. I run post-production for the immersive media team, and I represent a team of highly technical, creative folks in post that contribute to all of the Apple Immersive videos that are on platform today. When I talk to people about immersive post-production, they often think the post process must be an unrecognizable workflow that's suitable for engineers only. And the reality is, the phases of post are quite similar to a traditional film or television project. The closest analog would probably be a giant screen 3D film, but with the introduction of the new Blackmagic workflow, it moves a lot of the technical considerations around 3D under the hood and lets you focus on creative.
So while the stages of post may be familiar, there are definitely some major differences that we consider both creatively and technically, and most of them center around the sense of presence. For us, the important part of the presence that Apple Immersive Video brings is that the viewer feels like they're really there, and in post, it's our job to protect that. When we get the footage, there are a hundred things we could do, creative and technical, that either strengthen or weaken that sense of presence. So when we protect it, we want to make sure that any time we're changing that sense of presence and diminishing it, it's for creative intent and it's being done on purpose.
Those efforts begin as soon as the footage hits post. When you bring in your footage, you're reviewing dailies in Vision Pro, which is crucial to identify the best moments from the perspective of the viewer and to get ahead of anything from the capture that could interfere with the sense of presence. You can do those reviews now directly in Resolve; the Resolve live preview to Vision Pro can work with raw media, but we use a dailies workflow for speed and accessibility across systems, and for lighter-weight operations during offline editorial. And while the Blackmagic camera is certainly the best immersive camera I've ever worked with, just like any production, there are a lot of unintended outcomes that can lead to issues with what you captured. When the goal is for the viewer to feel like they're really there, fixing or avoiding these elements in post is critical to preserving presence.
The good news is that immersive-enabled cameras like the Blackmagic camera take care of some of the most common issues you'd typically be hunting down and correcting if you've worked in immersive before, like bad 3D sync, incorrect projections, and inaccurate world scale. Again, it really lets you focus on the creative first. It simplifies the media workflow as well, keeping camera-specific metadata attached to each shot and eliminating most of the need for transcodes between different projections, which again keeps you focused on creative considerations. For people who have been working in immersive post for a long time, there are so many improvements moving in this direction.
Now, when you're reviewing your shots, 180 degrees is a big canvas where many small issues can hide. So we review both on the monitor and in Vision Pro, looking for production gear in the shot, errant crew members, really anything that can get in the way of that sense of presence. Also lens flares. Interestingly enough, lens flares are a good example of why Vision Pro review is so important: if they're in both eyes, they can actually feel comfortable, interesting, and real. Even if you wouldn't have noticed them had you really been there, they feel natural. So for us, it's really important to see an issue on the screen but also review it in Vision Pro, and that's true for any issue you see; it has to be assessed in Vision Pro. Anything that would distract from the reality of the experience we're trying to craft, and the feeling of really being there, gets noted early on.
Once we move into editorial in immersive, there are lots of special considerations, but one of the most unique ones, which really shifts the way you think, is that unlike film or television, the audience never sees the whole frame at once. They have the agency to look where they want, so the experience is more like being present in the moment than watching something on a screen. That means editorially, every cut is really like transporting the viewer rather than just cutting to a new shot. People will want to know where they are and why they're there. When you combine these complex environments we're taking you to, character actions, and narration on top, you can definitely overload your viewer with information, and they will essentially decide what to ignore. If you're not careful with the timing and pacing of how you reveal information and take people into new shots, the thing they miss might be a key story point. So you really have to avoid that information overload and assess how the viewer is going to take it in when it's their first time in these shots. You're creating the illusion of freedom: the audience can look anywhere, and you're making them feel like they've discovered something on their own, but you're really guiding them where to look for maximum effect.
That's the creative side, but we also consider the viewer's comfort when we're directing their attention. Our editors are thinking in Z space when they're cutting: how far into the shot are you looking in each shot? You might be looking off into the jungle, and then all of a sudden there's an orangutan right in your face. A hard cut between those two shots can be disorienting at best, and at worst, you'll go cross-eyed. There are lots of ways to contend with this, but one of the simple ones we use is a dip to black. It's commonly used in video, where it usually means something specific, but in immersive it can be more like a blink, where you close your eyes and open them again. It resets people's depth cues, and it also leads people back toward the center; if you had them looking off to the side and you need them looking at the center, it can be useful for that. A lot of these techniques we've learned over the years are really about taking film techniques that we know, realizing that they apply differently in this format, and thinking through what the audience experience is going to be when they feel present.
Another huge aspect of building a convincing sense of presence, which you heard a bit about from Alex Weiss earlier, is the approach to spatial audio. In life, everything around us is making sound all the time, whether it's important or not, and our brain does the job of emphasizing what's important to us and ignoring the things that are less important. But it's all making sound. In immersive, that's our mixer's job. Our sound team works very hard to ensure that everything you expect to hear in a complex scene like this is audible. It has to be heard, even if it's not the focus. It sounds obvious, but you'd be shocked how different this feels compared to film or television, and how much you can get away with not making a sound in those formats. In immersive, it really improves your sense of presence when you get it right and everything you expect to hear, you hear.
So in a scene like this, with the elephant splashing around in the mud, an ambisonic capture will get you most of the way there, but as Alex mentioned, it often needs augmenting with foley and sound design. If any visual draws your eye, your audience will expect to hear it, and to hear it the way they expect, not necessarily the way it actually sounded, just like if you were there. So you want to assess what the audience's experience should be. Another big difference, sound-wise, is B-roll, a cutaway. Say someone's talking about surfing and what it was like to be out there. In traditional, you wouldn't necessarily hear that B-roll because someone's talking over it. But now it's a place you brought your viewer to; they're present in that scene, and if they don't hear what's happening, even in that cutaway, it's going to feel odd. It almost feels like somebody covered your ears while you're in this shot and you can't hear what's happening. The absence of that sound can be really distracting and take people out of it. Also, in a shot like this, the accuracy of where you track your objects really matters. A mismatch between a speaker's voice and their mouth, or between the crashing waves and the movement of the character across the frame, can really throw people off. Even folks who aren't technical will just know something's not right, and it takes away from that sense of presence and reality.
Visual effects is another area where special attention must be paid to build on our audience's sense of reality. Really, in immersive, anything's possible with time and budget, but you're either trying to remove things that were there, cleaning up issues from production, or adding things that weren't actually there, like title graphics or CG elements. In the latter case, it's really important to consider that even things that aren't real need to adhere to the sense of presence created by the immersive format. What do I mean by that? If you have a title graphic, it still exists in this 3D space. There's really no such thing as an overlay, because everything occupies a place in the world. So when you shoot a shot like this, you actually have to place the titles in an open spot in the canyon. You can't overlay them in the middle of the rock, because people will go cross-eyed or just feel that it doesn't look right. So you need to plan for it in production or find an open spot for it in post.
Also, with something like a title graphic, design tricks that would work on a flat screen, like 2D bevels, drop shadows, or textures that normally create fake depth, will be really noticeable once you're in a real 3D space, because you ultimately will see that it's a flat texture made to look like it has depth. So you need to design with that in mind: either create real depth for that object, which you can do, or avoid a trick like that altogether and let it be flat, because people will see that it is flat. Additionally, the light of the scene really matters. You can decide whether the title is going to take on the light of the environment and feel like it's really there; that's an option. Or you can ignore that and just make it feel like an overlay, which can work as well. But both need to be planned for in the visual effects process.
Also, with full-scale 3D CG, like our Prehistoric Planet Immersive series, the level of scrutiny for realism is higher than ever, because the user feels like they're really there. The light, the movement of the characters, the environments around you, the physics of objects, and your camera movements all need to feel real to maintain that sense of presence. Anything that would seem odd if you were really standing there will stand out to the viewer, even to folks who aren't technical.
That viewer scrutiny carries over into the finishing process. Online, color, and final finishing touches make sure the last mile to a seamless sense of presence isn't undermined by image noise, compression artifacts, or other technical imperfections we wouldn't expect to see if we were standing where the camera was. As I mentioned earlier, any production can have issues somewhere along the way, and the finishing process is our last chance to address anything that's still distracting from that sense of presence. Even well-shot images in this format need a denoise pass. This is really important because, for one, the noise is different in each eye, so it causes issues with the viewer's sense of depth; it also really helps compression quality down the line, when you're optimizing for what the viewer is actually going to see at release. Additionally, the compression artifacts you can see here are the result of either not denoising enough or not paying special attention to other elements that make the scene more complex to compress. You need to address those, because you've never seen anything like a compression artifact in real life, so it's going to really stand out. And because they can be different in each eye, they can cause discomfort and distraction. So you really need to assess that in Vision Pro.
When moving on to creative color, one of the principles we use is that idealized reality is a really good starting point: not just what it was like really being there, but the best version of what it could have been. That's what you can bring to people in this format. You start with a realistic image as it was on the day you shot it, and of course you want to improve that experience for people. You can still do stylized looks; there's lots you can do in the color room, just like in a traditional film. But really, there's a power to pushing things right to the limit of where they maintain that sense of authenticity. If the sunlight hitting the grass crosses the line from the warmest light you can remember into a color you've never actually seen in real life, it can break the magical illusion that you're experiencing things just as they really happened, which is the power of this format.
As I said at the beginning, don't forget that no audience will ever experience the color of your full scene at once. They'll see the color and composition of wherever they're looking, which is what makes that live preview in Resolve so critical to getting the feeling just right. Anyone in a traditional color suite knows you've got to color-match your monitor, and you do use that monitor in our workflow, but even when the colors match, it will never match the feeling of being inside the image. Because of that, you need to grade for the experience of being inside the image, inside the experience, and that's a whole part of our color process.
These are just a few examples of the places where our post team pays special attention to maintaining that sense of presence, which is one of the superpowers of Apple Immersive Video. But if I could leave you with just one takeaway after many years working in immersive, it sounds obvious and you've heard it before, but it's that important: make sure everyone making decisions, creatively and technically, is doing it in Vision Pro. The tools are easier than ever, and it's the only way to assess that sense of presence. Any single element that distracts or feels fake, and that you might not catch on a monitor, will break the illusion. If director Edward Berger can review in Vision Pro while in a submarine, your directors and producers in post can do it too. A little plug for later: we'll go into more detail on some of the concepts I touched on here in our post-focused seminars. But really, I wanted to say for myself, with this new camera, pipeline, and format available to more than just our internal team, I'm incredibly excited to see what you all do with the superpower of presence in your immersive projects. Thank you.
Awesome. Thank you so much, Tim. That was a really awesome session. There's something about seeing a copious amount of behind-the-scenes images that I just can't get enough of, and hopefully you agree; it was really awesome seeing all of those. And don't forget, you can meet all of the folks, Anton, Frans, Alex, Ivan, and Tim, later today at the mixer, so make sure you head out and ask all the questions we may not get to. Moving into the Q&A as a little segue: we've covered a lot of ground, and I hope we can answer as many questions as possible. We've covered how Apple Immersive Video can be designed, captured, produced, sound mixed, and delivered. It's been a packed session, but many of you have been sending in your questions throughout the day, so we want to use this session to answer as many as we can. We'll do our best to unpack them and share perspectives.
With that, I'll invite our panelists up to the stage so we can welcome them and see what everyone's got in mind.
Awesome. So we have Frans joining us, Alex, Ryan, and Sarah joining us back on the stage again today, hello Sarah, and Matt Amick as well.
Awesome. So let's maybe start with a little debate, shall we? We had a question come in from Frank wanting to know how Apple came to the decision of focusing on 180 degrees rather than 360 degrees. Maybe, Ryan, I could throw it to you to kick off from an engineering perspective.
It's a great question. We've developed two platforms and two formats. First is the Apple Projected Media Profile, specifically to deal with 360 and traditional 180 VR content, among other things. And we wanted to focus Apple Immersive Video on that core front-of-house experience, the full traditional human field of view. So that 180 mark, up toward 230, was really the core of it, knowing we had another tool to do distribution for 360. That was the fundamental of the decision.
Nice. And of course there's APMP now for 360 as well. And maybe Sarah, from a creative perspective, is there a strength in the 180 that you see?
I can think of two main strengths of 180. I love the 180 frame, for two reasons. One, it prevents the twisties: if you're watching 360, you get this FOMO and you're looking behind you, like, oh, there's so much to look at that you feel like you might be missing out on something. But the big one for me, in the field, is that it's hard enough to hide the crew and all of the production gear with a 180 frame. 360? Hats off to those of you who do it. Like, no.
Yeah, I think we can all agree on that. Maybe switching it over to audio. Alex, for you: Oliver had a question come in about what tools support ASAF more broadly. Oh, yeah, sure. Hi, Oliver, if you're here.
So right now you have two options. The first one is, if you're using the whole Resolve immersive workflow, you can use Fairlight. It's natively integrated into Fairlight, and it's not just a subset of the tooling; it's the whole thing. If you're more familiar with traditional post workflows, we also have, as Deep mentioned, a suite of Pro Tools plugins that you can download, and that way you can work in a more standard, traditional Pro Tools workflow and still mix in Apple spatial audio.
Nice. And I've been lucky enough to see some of your sessions, Alex, so I know they're pretty crazy; almost unlimited track counts. But for folks who are just getting started, is there somewhere they could see an APAC-style or ASAF-style project to start with?
Yes. If you download the Pro Tools plugin suite I just mentioned, it comes with a test session and a template that has the full routing for you already, and it also comes with some tutorial videos.
Nice. All right, Matt C has been asking about LPD files. Ryan, maybe this is one for you: is there a way we can create our own LPD files for 3D VFX workflows in Apple Immersive Video?
That's a great question. LPDs are fundamentally created at the factory during the calibration process, but there are some, touched on earlier, that are for very specific projections, and one of those is equidistant. So if you have a CG rendering tool that knows how to export as equidistant, that will work. But the individual parameters aren't creator-definable at the moment.
Nice. Maybe that's a good segue over to you, Matt. Working in post-production and VFX, we had Jerome send in a question about CG content. Is there a best practice or a way you'd suggest creators get started that doesn't involve bringing in the live-action content we've seen? We're all talking about cameras a lot, right? But if you wanted to do something that's maybe an animation, shrink someone down to the size of a Lego person or something like that, how might you go about it?
Yeah, absolutely. You all saw Anton's discussion earlier about previs, and a lot of the same principles apply whether you're doing previs, working on a fully CG animated project, or even integrating CG into live-action footage. Whether you're working full CG or doing integration, we've found it's always best to start by mimicking the Blackmagic Cine Immersive camera and all its properties: start with the 64 millimeter interaxial and the 180-degree field of view. Ryan touched on equidistant spherical projection, which is a really good, efficient projection format to start with. If for some reason that doesn't work, or the package you're using doesn't support it, you can always use a 180-degree equirectangular projection as well. And once you've started mimicking the camera, then you can start to experiment. If you want it to look just like what the camera captures, that's great. If you want to do something that's maybe hyper-stereo, or something really creative where you take those CG lenses, move them wide, and make a kind of diorama look, at that point you get to experiment. And I can certainly speak on behalf of our VFX team: the more you look at what you're doing, the better. Render often, render frequently, review it, experiment.
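As a rough, non-authoritative sketch of that starting point, assuming a full 180-degree equidistant (equi-angular) fisheye, where the angle off the optical axis grows linearly with distance from the image center, the mapping from a normalized image coordinate to a CG camera ray looks something like this. The names and the choice of normalized coordinates are just for illustration; only the 64 mm and 180-degree figures come from the discussion above.

```swift
import Foundation
import simd

let interaxial: Float = 0.064     // 64 mm, the figure quoted on stage
let thetaMax: Float = .pi / 2     // 180-degree lens: up to 90 degrees off-axis

// Maps a normalized fisheye coordinate (u, v in [-1, 1], radius 1 at the edge of the
// image circle) to a unit view ray. Camera looks down -Z, +X right, +Y up.
func equidistantRay(u: Float, v: Float) -> simd_float3? {
    let r = (u * u + v * v).squareRoot()
    guard r <= 1 else { return nil }          // outside the 180-degree image circle
    let theta = r * thetaMax                  // off-axis angle grows linearly with radius
    let phi = atan2(v, u)                     // direction around the optical axis
    return simd_float3(sin(theta) * cos(phi),
                       sin(theta) * sin(phi),
                       -cos(theta))
}

// Stereo pair: offset the left and right virtual cameras by half the interaxial along X,
// so the rendered pair roughly matches the physical rig before you start experimenting.
let leftEyeOrigin  = simd_float3(-interaxial / 2, 0, 0)
let rightEyeOrigin = simd_float3( interaxial / 2, 0, 0)
```

Flip the sign conventions or swap to equirectangular to match whatever your renderer expects; the point is just to lock the interaxial, field of view, and projection before you start bending the rules.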
Iterate as fast as you can; that previs style of working in a real-time engine is really, really valuable. Even if you're doing a full CG project, the more you can see, the more you're going to be able to experiment, and the better what you make is going to be.
Yeah, that's nicely said. Talking about CG content, though: it's obviously going to be lovely and beautiful and full of many, many pixels, perhaps even more detail than we can capture with live action. So inevitably you're going to have to encode that at some point. We had a question come in about encoding and what encoders are available that can take advantage of foveated rendering. Ryan, do you want to take this?
Yes. The SpatialGen folks talked about their encode capabilities for MV-HEVC and also foveation. We expect some other developers, hopefully within the next few weeks, to announce and deliver their own foveation support as well and keep that integration going.
Nice. All right, we're going to change it up now to something more creative. Frans, for you: we had a really interesting question come in from Frank asking how we can go about shooting interviews, for example if we're using multiple cameras for shooting multiple people in the same perspective. How might you adapt this from a classic storytelling situation, in terms of editing and pacing, to immersive?
That's a good question.
It's actually hard to film interviews in immersive, but interesting. If you have several people talking, which I think is the scenario here, the good news is you actually don't need that many cameras, because you're filming 180 and therefore you're going to be able to see everything. But you also want to be mindful that you don't want your viewer to feel left out, or like a third wheel, if there's a conversation between two people. So you really want the right distance, and you want to think about how organic the conversation is, so there's no awkward moment of someone suddenly staring at the lens, acknowledging my presence: am I here, am I not here, what's going on? When we film interviews with just one person, as I mentioned earlier, we've had situations where people are nervous. That's normal; they're twitching, they're stressed, and that becomes all you can look at. But when you manage to be really authentic and conversational, and sometimes help them look into the camera, it works. With Sarah, we often turn our backs so people don't look at us, because we're going to be off to the side of the camera, and otherwise it looks like they're looking at the wrong place. Some of us even put a sticker with a smiley face between the lenses just so they really look there. But I think we can connect with people in a really great way.
And the last thing I'll say about interviews is that you're going to cut them way less, because you're not going to have your A cam, your B cam, your C cam. Often in traditional 2D linear filmmaking we have several cameras, and it's super easy: we cut to a close-up here and there, and we Frankenbite a little bit of what they're saying, or at least use what we want them to say in audio. But here, chances are you're not going to cut a thing. You're just filming; you're right there. And it's a powerful thing to tell your character before they speak: hey, whatever you're going to say, however you want to say it, that's going to come across right there; you're actually leading this. Later on, our editor, Justin, is going to speak a little about how you cut things, because more than cutting, you're transporting people somewhere else. So it's a good point to keep in mind.
Yeah, that's a great answer. One thing Frans just mentioned, which I know Sarah and I have played around with in some of the training sessions we've run, is that eye contact can be really fantastic in these types of interviews. If someone glances quickly at the camera, you get that instant hit as a viewer that you're there. But if they hang on just a little too long looking at the camera, then as a viewer you suddenly feel like you need to do something; you feel a bit awkward. So that's something to bear in mind. We talked a little bit about eye contact across today and yesterday, but we never really got into the nitty-gritty of it, so that's maybe a little tip for you.
Matt, over to you on VFX workflows. We had a lot of questions yesterday and today about how we can bring non-native media into our Apple Immersive projects. Folks were looking at things like 2D archive, or maybe using other camera systems. What should they do?
Yeah, absolutely. It's a conversation that comes up all the time. Archive is a huge one. I forget who mentioned it, Frans, I think, but we don't have 20 years of stock footage to pull from that's already in immersive. And even if we did, it would probably be too low-res as we continue to push the boundaries of frame rate and resolution. So that's always going to come up. And, sorry to the people on the live stream, but the people in the room will hear in one of the seminars that there's a ton of great Fusion tools in Blackmagic that will enable you to incorporate that into your footage. There's the elegant way, which comes from the Blackmagic Fusion tools, where you can maintain as high a resolution as possible for your archival, 2D, or 3D footage captured from another rig.
There's also your fallback method, where you take that footage into a traditional VFX pipeline and follow the same principles we already touched on: you build your scene and control exactly how you want it to look, the depth of your footage, the size, shape, and projection. You can curve your screens, you can add atmosphere. We've experimented with playing archival with a sort of faux light spill on a fake floor, so your viewer feels grounded in the scene and isn't just floating in a black void. And you can render those out using equidistant or equirectangular projection, so there are multiple ways to go about it. One thing I would encourage: more often than not, we've found your viewer is not going to be aware of the production reasons behind your decision. They won't be thinking, why aren't they showing me the archival in immersive, why am I leaving immersive? So make sure that you, as the creators, know exactly why you're doing it. Don't just throw it in there; use it sparingly, ideally, because, as we've talked about a ton, maintaining immersion is one of the big powers of this format. Know why you're leaving immersive footage, know what you're there for and what you're getting out of it, and then transition your story back into immersive when you're ready, and I think you'll be okay.
Yeah, nice. I think one of the reasons we see a lot of people using, or wanting to use, 2D is that they're trying to default to that traditional storytelling cadence. And sometimes not having that opportunity to pull in archive invites you to tell the story in a new way and stay in the immersive form factor. So it's definitely worth challenging yourselves there not to default, but rather to push the boundaries if you can.
All right, I might ask one more engineering question, and then I promise I'll finish off with something more lighthearted. We talked about foveation, Ryan, in your session, and folks were wondering; we had a couple of questions, and I see Alex asked this one as well.
Is it possible to define the position of that foveation shot to shot, so that, for example, if we have something on the far right of frame that we know the audience will be looking at, we can sharpen that up relative to the rest of the frame?
Yeah, that's a great question. Obviously a lot of that plays into encoding at the end of the day, so it comes down to what the folks doing the encodes and the others are building, and being able to integrate those tools with foveation is one of those developer integration things. So in the beginning it's not quite a region of interest yet, but we look forward to watching developers do interesting things and see if they can take that next step further.
Nice. That sounds exciting.
Alrighty then. Well, let's maybe wrap up the Q&A. I promised I'd ask you all the same question, and you haven't had much time to think about it. You've all done amazing things with Apple Immersive Video, from creating and directing our Apple Immersive titles, to audio, to engineering, Ryan, with you, to creative and VFX toward the end of our chairs here. But if there's one thing you haven't tried yet with Apple Immersive Video that you really, really want to, or that you'd love to see, maybe you could share it. Frans, starting with you?
Yeah, one thing I would really love to do very soon, and they've actually done it in the Metallica film, which is really great, is using a Steadicam and being able to just follow a character, walk through, and really see a world the way we see it, but also just be there behind them. And I had a whole section about the tripod, right? And it's really hard; you know, the harder it is to cut something. So yeah, for me, working with a Steadicam, wandering through great places, following someone interesting is next on the list, hopefully. Alex?
I think I have a couple of answers. The first one is the more stereotypical sound-guy answer.
I just think there's so much we can still do with spatial audio in terms of storytelling, and I think we all, me included, have a tendency to maybe still over-rely on things like dialog or voiceover.
Same with score. So I would love, really as an experiment, to go back to actual audiovisual storytelling, where we don't rely on dialog and don't rely on score, or at least rely less on them. And the same sort of thing with dynamic range: not everything has to be loud and right up here all the time. It's really nice to play with those colors a little bit too.
And then, very similar to that, there's also still so much in the music space that would be really, really cool to try. We already started a little bit with the rehearsal room experience, for example, and I think there's a lot of potential in that kind of thing, to blur the line between a concert and basically being on stage, much more intimately, with the artist.
Interesting. Ryan, what about you?
That list is so long; to get to one thing I'll just genericize it. There are going to be so many places in my lifetime that I will never get to, and it sounds overly simple, but I just want to go sit in all the places I'll never be. The moon, the Titanic's deck, all those crazy places, places I physically will never get to in my lifetime, just to sit and experience them for a moment. I don't necessarily need a narrative around it, though some history would be great. Just sit and enjoy the space.
Yeah, like the system environments, for example. Yeah, the new Jupiter one with visionOS 26. If you haven't checked it out, you just sit there playing with the constellations, mesmerized. Or, you know, mangroves underwater, floating in front of me for a little bit. Yeah. Did you mean the Titanic underwater, or...
No, I mean either. Okay. Sarah, what about you?
Yeah, kind of carrying on from what you were saying, Ryan: I've been really lucky to go to, and take audiences to, some amazing places, like Kenya for the elephants, Borneo for the orangutans, and the Bahamas for the sharks. And there's a lot more world out there. Sure, Jupiter's great, believe me, it's my second favorite planet, but it's a beautiful world, and there are so many more places I would love to go. Places that are beautiful, places that sound great; there's a lot of good audio out there too. So I think my generic answer is more: there's so much more out there, more ways to tell stories, more amazing stories waiting to be told in so many places. Just more, please.
And, Matt, what about you?
I think I might have one of the less budget-friendly answers. That was the more budget-friendly one. Yeah, true, true.
The format is so new that there hasn't really been a ton of opportunity for this yet, but I love the connection you get with the people you're capturing. So to me, following somebody over a long period of time would be really cool, whether you outwardly see them as interesting or not, just spending time with somebody over their life and really getting to connect with whatever they're going through. It means your production might have to run one, two, three, four, five years, which no producer ever wants to hear. Yeah, decades. But I think that would be really, really cool. I would love an opportunity to see that connection, to see somebody growing over a long period of time. If the budgets don't support that, then my second answer would be: I would love to see us stick to a post-production schedule, just once.
That would be very cool. Maybe more easily achievable, maybe not, I'm not sure. That's a fair way to close us out, Matt. Thank you. Listen, what an incredible day it's been. But sadly, like all good things, we are going to have to bring it to an end for our online audiences. So for the folks joining us from all around the world, tuning in at whatever time it may be, thank you so much for giving up your time to be with us, and we really hope the past couple of days have sparked some new ideas, perhaps some new collaborations, or even your next great big story. Who knows? On behalf of everyone here at the Developer Center in Cupertino, thank you for spending the day with us. If you joined us yesterday too, thank you also for that, and of course for sharing all your incredible questions and your passion for these exciting new mediums. It's been awesome to see you all here.
We hope to see you again soon. And for the folks in person, we'll be right back with some seminars and workshops for Apple Immersive Video. Thanks.
-