AI-based foveated rendering - a new technique - I hope this gets used for the 8KX! :)

Not quite sure where to put this, but I thought some people might be interested in it, so check this out :slight_smile:

Could be great for limiting bandwidth, but I guess it would then need decoding on the headset, so it's probably not useful in its current form. Interesting still!


I take it this requires per-scene data from pre-analysis against a ground truth?

It seems to be a different approach than NVIDIA DLSS: pre-emptive full analysis is not required, but latency could be problematic if the refresh rate is not very high (90 Hz+). Maybe you can read more into it than me; see the paper excerpts below:

3.2 Design Goals. There are several goals that we would like to achieve with our method. First, the DeepFovea network should be able to operate in an online mode, i.e., it should be able to reconstruct the current frame based only on the past frames.

3.2.2 Performance Considerations. If the method is used for gaze-contingent reconstruction, it has to exhibit under 50ms of latency for each frame in order to be unnoticeable for human vision [Guenter et al. 2012]. Moreover, for head-mounted displays, the method has to run at HMD’s native refresh rate and high resolution to avoid motion sickness and provide a comfortable experience. For many existing VR HMDs the minimum refresh rate is 90Hz.

Recurrence. In order to generate temporally stable video content, the network needs to accumulate state through time. Moreover, a temporal network is able to super-resolve features through time and can work with sparser input while achieving the same quality. However, our network has to be causal (i.e., cannot see the future video stream) and should have a compact state to retain over time due to high video resolution and performance constraints.
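
For anyone who, like me, needed a picture of what "causal with a compact state" means, here's a toy sketch in Python. It is not the DeepFovea network: it just blends each new frame into a running per-pixel average (an exponential moving average with a made-up weight `alpha`). The only point is that the state stays one frame in size, and each output depends on the current and past frames, never on future ones. `CausalAccumulator` and `run_online` are hypothetical names.

```python
from typing import List, Optional

Frame = List[float]  # a flattened frame: one float per pixel


class CausalAccumulator:
    """Toy stand-in for a recurrent state (NOT the DeepFovea network).

    The state is one value per pixel, so its size stays constant no matter
    how many frames have been seen, and each output depends only on the
    current frame and the frames before it.
    """

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # hypothetical blend weight for the newest frame
        self.state: Optional[Frame] = None

    def step(self, frame: Frame) -> Frame:
        if self.state is None:
            # First frame: nothing to accumulate yet.
            self.state = list(frame)
        else:
            # Exponential moving average: blend the new frame into the state.
            self.state = [
                self.alpha * x + (1.0 - self.alpha) * s
                for x, s in zip(frame, self.state)
            ]
        return list(self.state)


def run_online(frames: List[Frame], alpha: float = 0.5) -> List[Frame]:
    """Feed frames one at a time, as an online (causal) system must."""
    acc = CausalAccumulator(alpha)
    return [acc.step(f) for f in frames]


if __name__ == "__main__":
    a = run_online([[0.0], [1.0], [1.0]])
    b = run_online([[0.0], [1.0], [9.0]])
    print(a)               # [[0.0], [0.5], [0.75]]
    print(a[:2] == b[:2])  # True: changing a future frame leaves earlier outputs alone
```

A real recurrent network learns what to keep in its state instead of blindly averaging, which is what lets it super-resolve detail over time; the moving average can't do that, it only shows the causality constraint.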

Certainly, but the question would be: how close does the training data have to be to the game actually being played? Do they have to train this for each game, or is it generic enough to cope with a wide spectrum without retraining?

Not really. It only needs to be in sync with the eye-tracking system, but the game engine must also support it and be able to take into account the pupil positions provided by the headset.

If you read the excerpts from the paper, they say 50 ms maximum latency and online operation (serial image aggregation with no future data). So at 90 Hz that's 90 images in 1000 ms, i.e. about 4.5 frames within the 50 ms budget?
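
Quick sanity check on that arithmetic, as a small Python sketch. One frame at 90 Hz lasts 1000/90 ≈ 11.1 ms, so a 50 ms budget spans about 4.5 refreshes. As I read the excerpt, these are two separate constraints: the 50 ms is end-to-end latency, while running "at HMD's native refresh rate" means a new frame still has to come out every 11.1 ms. The helper names below are my own.

```python
def frame_time_ms(refresh_hz: float) -> float:
    """Time between displayed frames, in milliseconds."""
    return 1000.0 / refresh_hz


def frames_within_latency(latency_ms: float, refresh_hz: float) -> float:
    """How many display refreshes fit inside a given latency budget."""
    return latency_ms / frame_time_ms(refresh_hz)


if __name__ == "__main__":
    for hz in (90, 120, 144):
        print(f"{hz} Hz: {frame_time_ms(hz):.1f} ms/frame, "
              f"{frames_within_latency(50, hz):.1f} frames in 50 ms")
```

So the latency budget allows a few frames to be "in flight" at once, but the throughput requirement is still one finished frame per refresh.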

Rather a few orders of magnitude above my pay grade, but I do see your quoted bits, as well as mentions of temporal sampling and motion compensation.

I can imagine fast camera motion would become a serious problem; on the other hand, we do have that car speeding away… I wonder what it would look like a few frames later. :7

What would really impress me is if these guys can do all this real-time analysis and reconstruction without it being more computationally intensive than just rendering everything in full in the first place. :wink:

So many questions, and so hard to wrap one’s head around it all. At any rate, there seems to be plenty of interesting things to look forward to (…and quite a few consequences to be concerned with…)
I can't help but imagine that while the algorithm is probably generic enough, the data would have to be very specific. I guess, following Frc’s post, in this case it is generated on the fly from samples accumulated over previous frames, and so is intrinsically relevant…

Looks like they found a reasonably representative training set that works for various content: “We train on videos sampled from a video dataset [Abu-El-Haija et al. 2016] that contains a variety of natural content such as people, animals, nature, text, etc. Each video has resolution up to 640x480 and up to 150 frames. For each video, we precompute the optical flow using the FlowNet2 network [Ilg et al. 2017]. Next, the video is downsized to 128x128 to meet GPU memory restrictions. Lastly, the videos are sliced into 32-frame-long chunks with an overlap of 8 frames. The total number of video sequences in the training set is about 350,000”
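
The slicing step in that quote is easy to picture with a little Python sketch (my own illustration, not the authors' code). With 32-frame chunks and an overlap of 8, each chunk starts 24 frames after the previous one, so a 150-frame clip gives chunks starting at frames 0, 24, 48, 72 and 96. Dropping the leftover tail is my assumption; the excerpt doesn't say what they do with it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_into_chunks(frames: Sequence[T], chunk_len: int = 32,
                      overlap: int = 8) -> List[List[T]]:
    """Cut a frame sequence into fixed-length chunks that overlap.

    Consecutive chunks share `overlap` frames, so each new chunk starts
    `chunk_len - overlap` frames after the previous one. A trailing piece
    shorter than `chunk_len` is dropped (an assumption, see above).
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    stride = chunk_len - overlap
    return [list(frames[start:start + chunk_len])
            for start in range(0, len(frames) - chunk_len + 1, stride)]


if __name__ == "__main__":
    video = list(range(150))  # frame indices of a 150-frame clip
    chunks = slice_into_chunks(video)
    print(len(chunks))                      # 5
    print([c[0] for c in chunks])           # [0, 24, 48, 72, 96]
    print(chunks[0][-8:] == chunks[1][:8])  # True
```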

The question now, of course, is how hand-picked (and how close to the training data) the demonstration example was.
Running online training at the gamer’s site would rather defeat the purpose of the concept. The network that has to tell the generator's output apart from the original needs the original as input, and for that the full high-detail image has to be rendered.
As far as I understand it the idea is that during execution (for which the latency and cycle-time numbers were provided) only the foveated part is actually rendered, then fed into the network and the output then sent to the HMD.
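
To picture the "only the foveated part is rendered" idea, here's a hypothetical Python sketch of a sparse sampling mask: every pixel near the gaze point is sampled, and the density falls off with distance. The inverse-square falloff, `fovea_radius` and `min_density` are made-up parameters for illustration; the paper uses its own sampling pattern, which this does not reproduce. The network's job would then be to fill in all the pixels the mask leaves out.

```python
import math
import random
from typing import List, Tuple


def foveated_mask(width: int, height: int, gaze: Tuple[float, float],
                  fovea_radius: float = 8.0, min_density: float = 0.05,
                  seed: int = 0) -> List[List[bool]]:
    """Random sparse sampling mask that is dense at the gaze point.

    Hypothetical falloff: every pixel within `fovea_radius` of the gaze is
    sampled; beyond that the sampling probability drops as
    (fovea_radius / r) ** 2, but never below `min_density`.
    """
    rng = random.Random(seed)  # fixed seed so the mask is reproducible
    gx, gy = gaze
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            r = math.hypot(x - gx, y - gy)
            p = 1.0 if r <= fovea_radius else max(min_density, (fovea_radius / r) ** 2)
            row.append(rng.random() < p)
        mask.append(row)
    return mask


if __name__ == "__main__":
    m = foveated_mask(128, 128, gaze=(64, 64))
    sampled = sum(v for row in m for v in row)
    print(f"{sampled / (128 * 128):.1%} of pixels would be rendered")
```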

But cool, yes! Faster GPUs, more clever algorithms, future VR games might look really great!

Lots of interesting info from these replies. Good stuff guys! :slight_smile:

Ah, you actually read and grokked the things, instead of just scrolled through the document and glanced at the pictures like… erm… some of us. :stuck_out_tongue:

Kind of wondering how much data we’re talking about standing behind the network at runtime…


“The DeepFovea model has 3.1 million parameters and requires 111 GFLOP with 2.2GB memory footprint per GPU for an inference pass” :slight_smile:
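
Note that "111 GFLOP" is the work per inference pass, not a rate. Here's a small Python sketch of what it means per second, assuming one pass per displayed frame (my assumption) and ignoring memory traffic and the gap between peak and achievable throughput: at 90 Hz it comes to roughly 10 TFLOP/s sustained.

```python
def required_tflops(gflop_per_pass: float, passes_per_second: float) -> float:
    """Sustained compute needed, in TFLOP/s, for a given per-pass cost."""
    return gflop_per_pass * passes_per_second / 1000.0


if __name__ == "__main__":
    # 111 GFLOP per pass (quoted above); assume one pass per displayed frame.
    for hz in (1, 90, 120):
        print(f"{hz:>3} passes/s -> {required_tflops(111, hz):.2f} TFLOP/s")
```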

Aha, there we go… That’s… some. :7

Yepp, quad Tesla V100 for inference and 7x8 GPUs in a DGX-1 cluster for training is a little expensive for home users. But when compared to the supercomputers of yesteryear it’s actually a steal!

A few node shrinks might still be in the cards. And NVIDIA, AFAIK, researched multi-chip GPUs quite a while ago, even if AMD seems to be the one putting them into products nowadays.
Hopefully within a few years we all will have machines under our desks that can do stuff like this!

The marketing departments of the GPU manufacturers will be happy to have new reasons why every person needs a crank supercomputer at home :slight_smile:

Just had a look at the TOP500 supercomputer Wikipedia article:
In 2003 a 111 GFLOPS machine (enough for one inference pass per second) would still have been on the worldwide TOP500 supercomputer list. And a little before 1995 it would have been the fastest machine in the world: multi-billion-dollar, house-filling, several-megawatt-for-lunch-eating. Those times are not that far away in my memory. Getting old :wink:

Then again, a dedicated room with one of those old round Cray cabinets, with the running Fluorinert cooler next to it, would be kind of spiffy… :9