Here are some random clips from the third stage of AI post layer testing. It's a long way from working correctly, but you can see a lot of improvement over the first version, and this one is far more stable. The final version will, of course, need to be nearly 100% stable.
These tests are run at 1/8 of final resolution, for the sake of speed, since it's an iterative process requiring many experiments. It's a very complex problem, with many possible solutions (we're testing various combinations of code from hundreds of competing research papers). Right now I'm thinking that using optical flow to "remember" stylization solves from frame to frame might be the key.
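To make the optical flow idea concrete, here is a minimal sketch in plain Python, not the actual pipeline. It assumes the flow field has already been computed by some other tool, and that each frame is a small grid of brightness values. The function names (`warp_previous`, `blend`) and the `memory` weight are made up for illustration. The idea is to drag the previous frame's stylized result along the motion, then mix it with the fresh solve so the look doesn't flicker.

```python
def warp_previous(prev_stylized, flow):
    """Backward-warp the previous stylized frame into the current frame.

    flow[y][x] is an assumed, precomputed (dx, dy) offset telling us where
    the pixel now at (x, y) was located in the previous frame.
    Sampling is nearest-neighbour and clamped to the frame border.
    """
    h, w = len(prev_stylized), len(prev_stylized[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            warped[y][x] = prev_stylized[sy][sx]
    return warped


def blend(current_stylized, warped_prev, memory=0.8):
    """Mix the remembered (warped) solve with the fresh one.

    memory=1.0 keeps only the old solve, memory=0.0 keeps only the new one.
    The default of 0.8 is an arbitrary illustrative value.
    """
    return [
        [memory * p + (1.0 - memory) * c for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current_stylized, warped_prev)
    ]


# Tiny one-row example: everything moved one pixel to the right, so each
# current pixel looks one pixel to the left (dx = -1) in the previous frame.
prev = [[1.0, 2.0, 3.0]]
flow = [[(-1, 0), (-1, 0), (-1, 0)]]
fresh = [[0.0, 0.0, 0.0]]
stable = blend(fresh, warp_previous(prev, flow), memory=0.5)
```

A real version would also need to detect where the flow is unreliable (occlusions, new objects entering the frame) and fall back to the fresh solve there, which this sketch ignores.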
Some people ask why I'm doing this layer at all. It's extremely important. Basically, this functions as a consolidating layer that will ultimately auto-correct all compositing errors and asset mismatches across all frames, project-wide. I can buy a stock footage clip, drop it into the background of a UE5 project, and use this to combine them seamlessly with one click. That means that this layer, once complete, will function as a universal adapter, joining the output of not only different programs, but different media sources. For example, since this is a 100% refabrication of every pixel, I can use any source photo without copyright infringement. So I can take any frame from Google Maps Street View and use that as a background for an animated scene. Same with any footage. If I need a car to break down on a road in front of Everest, I can build the car and road in UE5, grab any photo of Everest, and use this layer to merge them, reconstituted into a completely original work.
In one of the first shots, a crowd of people can be seen walking along a spaceport causeway. Those people come from archviz packs, and their style is poorly matched to the look of the spaceport, which comes from a different creator working in a different style. In the version above with the primitive alpha layer, you can see that everything on the screen now looks like it was drawn by the same artist. It's automated tonal matching. This won't shave hours off production; it will shave years.
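For anyone wondering what "tonal matching" means at its very simplest, here is a small sketch of a classical statistics-matching trick: shift and scale one asset's brightness values so their average and spread match a reference. This is not what the AI layer does internally; it is a much cruder stand-in to show the goal. The function name `match_tone` and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def match_tone(source, reference):
    """Shift and scale `source` values so their mean and spread match `reference`.

    Both arguments are flat lists of brightness values for one channel.
    If the source is completely flat (zero spread), only the mean is shifted.
    """
    s_mean, s_std = mean(source), pstdev(source)
    r_mean, r_std = mean(reference), pstdev(reference)
    scale = r_std / s_std if s_std > 0 else 1.0
    return [(v - s_mean) * scale + r_mean for v in source]


# Hypothetical values: a dark, contrasty crowd asset pulled toward the
# brighter, flatter tones of the background it has to sit in.
crowd = [0.0, 2.0, 4.0]
spaceport = [10.0, 11.0, 12.0]
matched = match_tone(crowd, spaceport)
```

Matching two numbers per channel only fixes overall brightness and contrast. Making line work, texture, and shading look like the same artist's hand is the part that needs the learned stylization.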
Here is the second test, which I never bothered to post. The tests are all still terrible, but you can start to see where I'm going with all this.
For reference, and for anyone who never saw it, this was the first test, from a few months back. You can see that we've improved the coherence a lot since then.