NextVR continues to make VR movie-making news with the announcement of a new light field camera rig. Joining us is David Cole, Co-Founder of NextVR, and he answers all the tough questions for us! So here we go…
For those unfamiliar, what does NextVR do?
NextVR captures and transmits ultra high quality live-action stereoscopic 3D VR content in such a way that the viewer really feels as if they are where the camera is. Technically, it’s called stereo-orthogonal capture, and results in a deeply-immersive experience.
While your roots are in stereoscopic 3D rigs and broadcasting, you’ve since transitioned to the VR space. Why is cinema and broadcasting so strategically important to the VR world? Do they carry as much weight and potential as video games? Why or why not?
Before mobile-phone-based solutions became public news, I think most pundits regarded VR as a gaming platform. Mobile has REALLY changed that perception for a number of reasons, some of which are: the demographic for smartphone users is SO broad – and they are so accustomed to consuming video on their phones already – that live-action content in VR is a natural fit. Also, the mobile products aren’t going to have the frame rate for AAA VR games – so there is an advantage for live-action content. But, mostly, it’s that live-action content is really compelling stuff. I think we’re even hearing this from Oculus now. When Mark Zuckerberg announced the Oculus acquisition, he said, “Imagine enjoying a courtside seat at a basketball game…” – and – Nate Mitchell recently said, “it may well end up being that VR is more about film than (video) games.”
It seems everyone has a 360-degree rig for VR these days, but stereoscopic 3D support is still lacking. Why is this the case? Why has good stereoscopic 3D capture been so challenging for VR movie making?
The conventional method used for stitching images from multiple camera views together in 2D (feature-matching and warping) fails miserably in 3D because it does not create an identically stitched 360 panorama for the left and right eyes. This results in brutally bad stereoscopy.
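For readers curious what “feature-matching and warping” looks like in practice, here is a minimal monoscopic stitching sketch using OpenCV. It is a generic illustration under assumed settings, not NextVR’s pipeline: the point is that the warp (homography) is estimated independently from each eye’s images, so the left and right panoramas end up geometrically different and the stereo disparities stop lining up.

```python
# Minimal 2D feature-match-and-warp stitch (illustrative only, not NextVR's pipeline).
# Run this independently on the left-eye and right-eye camera pairs and each eye
# gets a slightly different homography -- which is why naive stitching breaks stereo.
import cv2
import numpy as np

def stitch_pair(img_a, img_b):
    orb = cv2.ORB_create(2000)                        # detect keypoints and descriptors
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:200]

    src = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # estimated per camera pair

    h, w = img_a.shape[:2]
    canvas = cv2.warpPerspective(img_b, H, (w * 2, h))  # warp B into A's frame
    canvas[0:h, 0:w] = img_a                            # paste A over the overlap
    return canvas
```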
NextVR’s latest breakthrough is the use of light field cameras instead of traditional cameras. What is a light field camera? How does it work compared to what is traditionally available?
To be clear, NextVR is combining light field imaging with our current, stereoscopic 360 imaging. To answer your question, a conventional image sensor records the intensity and (indirectly) the color of light that falls on it. A light field image sensor records all that plus (indirectly) the vector from which the light came. This allows the depth of the scene to be derived somewhat precisely.
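A rough way to picture that depth derivation: treat two sub-aperture views extracted from the light field as a tiny-baseline stereo pair, estimate disparity, and convert it to depth. The sketch below is only a conceptual illustration; the focal length, baseline, and matcher settings are assumed placeholders, and real light field pipelines use many views and more sophisticated estimators.

```python
# Sketch: deriving depth from two sub-aperture views of a light field.
# FOCAL_PX and BASELINE_M are hypothetical values for illustration only.
import cv2
import numpy as np

FOCAL_PX = 1200.0    # focal length in pixels (assumed)
BASELINE_M = 0.002   # spacing between sub-aperture views in metres (assumed)

def depth_from_subapertures(view_left, view_right):
    gray_l = cv2.cvtColor(view_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(view_right, cv2.COLOR_BGR2GRAY)

    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disparity = stereo.compute(gray_l, gray_r).astype(np.float32) / 16.0  # fixed-point

    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]  # Z = f * B / d
    return depth
```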
Positional tracking, or the ability for the VR device to detect where your head is positioned in space rather than just how it is rotated, is a vital element for making a comfortable and convincing VR experience possible. Why are light field cameras important for positional tracking in cinema? What problems do they solve that couldn’t be handled before?
Well – there are a number of methods one could use to derive the necessary depth of a scene to support positional tracking, but note that depth capture is only PART of the solution to positional tracking with video. So a light field camera is a very useful method of quickly acquiring depth information. It’s superior for our application to other depth-sensing methods such as structured light, time-of-flight, sonar, etc., in its speed and resolution.
What is the interaxial distance between your stereoscopic 3D cameras? Are you able to get the same distance between the left and right view as you would for a traditional 3D movie? Why is this important?
We change our interaxial distance for different applications. We don’t achieve the same minimal interaxial that a beamsplitter rig can – one of those can effectively place the cameras and lenses in the same location – but we don’t need small interaxials for our stereo-orthogonal process. We actually benefit from large spatio-angular offsets between our cameras.
In NextVR’s press release, it describes the experience of being able to peer around objects or people. I completely understand the ability to maintain or change focus when kneeling forward and back, but how can the cameras capture information that isn’t directly in front of them? Did I misunderstand the meaning of the text?
We have very high spatio-angular offsets in our camera configurations. This enables the second element of the positional tracking solution: view synthesis. Simply put, we can fill in holes that are formed when occluded elements of the scene are uncovered by a change in viewing perspective. This is a very hard-won solution to the problem and something we’ve been working on for years.
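To make “filling in holes” concrete, here is a toy depth-image-based rendering sketch: pixels are forward-warped by a depth-dependent shift to simulate a small sideways head movement, and the regions that get uncovered (the holes) are flagged and inpainted. This is a generic textbook illustration under assumed inputs, not NextVR’s actual view-synthesis method.

```python
# Toy view synthesis: warp an 8-bit colour image to a slightly shifted viewpoint
# using its depth map, then inpaint the disoccluded holes. Illustration only.
import cv2
import numpy as np

def synthesize_view(color, depth, shift_scale=8.0):
    h, w = depth.shape
    out = np.zeros_like(color)
    hole_mask = np.full((h, w), 255, dtype=np.uint8)  # 255 = still a hole

    # Closer pixels (small depth) shift more when the viewpoint moves sideways.
    shift = (shift_scale / np.maximum(depth, 1e-3)).astype(np.int32)
    for y in range(h):
        for x in range(w):
            nx = x + shift[y, x]
            if 0 <= nx < w:
                out[y, nx] = color[y, x]
                hole_mask[y, nx] = 0

    # A full implementation would warp back-to-front (or z-buffer) to resolve
    # occlusions; here we simply fill whatever was left uncovered.
    return cv2.inpaint(out, hole_mask, 3, cv2.INPAINT_TELEA)
```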
I understand you’ve been doing work with the National Hockey League (NHL). Congratulations! Does the NHL have any influence on how materials are recorded? Do they have a specification that they insist upon from broadcasters? Why is this the case? Does NextVR meet the spec?
I can’t comment on any partner’s specific requirements or plans – but – in general I can tell you that sports demand capture quality in a way that no other content does. For example, frame rates below 60 are completely unacceptable for resolving the action. If you can’t clearly see where the puck is on the ice because of low-frame-rate motion blur or jitter, the game is unwatchable in VR. Additionally, resolution and sharpness are massive factors for sports. Viewers have to see the jerseys and players’ faces AT LEAST as well as they could on an HDTV. It’s a challenge to be sure! That’s why NextVR will not capture with low-end consumer cameras or security cameras like others do.
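As a back-of-the-envelope illustration of why frame rate matters so much for hockey, you can estimate how far a puck smears across the panorama during one exposure at 30 fps versus 60 fps. Every figure below (puck speed, distance, panorama resolution, full-frame shutter) is an assumption made purely for illustration, not a NextVR specification.

```python
# Back-of-the-envelope: motion blur of a puck in a 360 capture at different
# frame rates. All figures are illustrative assumptions, not NextVR specs.
import math

PUCK_SPEED_M_S = 40.0        # ~90 mph shot (assumed)
DISTANCE_M = 15.0            # puck's distance from the rig (assumed)
PANORAMA_WIDTH_PX = 7680     # pixels spanning 360 degrees (assumed)

def blur_pixels(fps, shutter_fraction=1.0):
    exposure_s = shutter_fraction / fps
    # Angle the puck sweeps during one exposure, as seen from the camera.
    angle_rad = (PUCK_SPEED_M_S * exposure_s) / DISTANCE_M
    px_per_rad = PANORAMA_WIDTH_PX / (2 * math.pi)
    return angle_rad * px_per_rad

for fps in (30, 60):
    print(f"{fps} fps: ~{blur_pixels(fps):.0f} px of smear")
# Doubling the frame rate (and so halving the exposure) roughly halves the smear.
```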
I remember with your earlier camera rigs, there was a challenge that you couldn’t move the cameras around too much as it would cause instant sickness. Is this still the case, or do light field cameras help overcome this problem?
Actually, the closer you get to REAL, the more careful you have to be about observing best practices for camera movement. We have a successful approach for motion that is pretty flexible… just no whip-pans for our rigs!
Let’s talk about the naked people. What are your naked people plans for this technology (if any)? In the cleanest words possible…as a professional, of course…what are some of the biggest challenges in making a good naked people VR movie (other than finding attractive naked people willing to be in a naked people movie)?
It’s not a genre that we’re pursuing. We’re working to bring the biggest content partners on the planet, with the largest fan-bases on the planet, to our platform. That said, it’s an interesting application and I think there will certainly be a market. VR is sort of the perfect, private display for naked people. For a 360 application – you’d need a full-on Caligula-class orgy to get the best use of the real estate! I can’t even imagine what sort of digital debauchery volumetric AR displays like Magic Leap’s are going to enable…. Well, I can imagine it. Just did. I’m trying to stop imagining now.
Thinking back to the 3D days, there was always a concern about bandwidth; stereoscopic 3D required as much as double the capacity to get the imagery through. The market responded by coming up with clever codecs and compression mechanisms to get around the problem. How would you describe the bandwidth and storage requirements of 3D 360-degree video capture compared to traditional materials? How much more does light field technology add to the mix?
Our platform is built on a stereoscopic transmission technology called compound entropy stereoscopic encoding. It’s very effective compression that has been used to broadcast stereoscopic content for 3D TV. That encoding process, combined with a transmission scheme that optimizes bandwidth for the part of the 360 scene you are currently looking at, allows us to get the data rate down to between 4 and 8 Mbps. To put that in perspective, a 1080p HD stream from Netflix is 8 Mbps.
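For a rough sense of what those data rates mean in practice, here is a quick bandwidth-to-storage calculation. The only figures taken from the interview are the 4–8 Mbps stream rate and the 8 Mbps Netflix HD comparison; the rest is plain arithmetic.

```python
# Quick arithmetic: data consumed per hour at the stream rates mentioned above.
def gigabytes_per_hour(mbps):
    bits = mbps * 1_000_000 * 3600      # bits transferred in one hour
    return bits / 8 / 1_000_000_000     # convert bits to gigabytes

for label, rate in [("NextVR low end", 4), ("NextVR high end", 8), ("Netflix 1080p", 8)]:
    print(f"{label}: {gigabytes_per_hour(rate):.1f} GB/hour")
# 4 Mbps is about 1.8 GB/hour; 8 Mbps is about 3.6 GB/hour.
```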
NextVR recently announced that it did the first live stereoscopic 3D 360 degree broadcast. Given the bandwidth requirements and the encoding time requirements…how did you do it? What’s needed to make it all work?
We used the NextVR live-encoding technology that I just described to deliver the stream.
I’m going to go out on a limb here and say that I think the industry is about to get really crazy: Oculus, Valve, Samsung, ImmersiON-VRelia, Microsoft, Magic Leap, Razer, Sony, Nvidia…the list goes on and on – all expected to release VR devices of some kind. I think GDC is the first event that really showcases just how wild-west this industry has become and will become. You’re going to be speaking at the upcoming Immersive Technology Alliance meeting happening during GDC. If there was one message attendees and industry players could walk away with, what would it be?
Quality is critical at this stage of VR development. 3D TV died for the sins of quality in eye-wear and transmission. If we aren’t very careful, this industry could follow 3D TV down the road to ruin, or be relegated to a novelty and never achieve its full potential. I’ve seen so much BAD VR and live-action content that makes you instantly sick. That stuff is toxic to the industry. This is a pivotal moment. End-users are starting to get devices. We have to avoid the temptation to “shovel content” onto these platforms just to bulk them up. We absolutely have to make the best first impression for these new users.
Great stuff! See you at GDC, Dave!