Hi, I'm an idiot engineer and proud of it.

I was trying to figure out if I could stage cameras in the level to record "gameplay" and use multiple players to act out the scene rather than staging all the animations. For detailed shots like conversations, I might be able to use something like those camera-tracking avatars and overlay that footage over recordings from UE, blending human acting with the game scene. I think that would be the optimal route to capturing some semblance of realism. I don't mind acting in front of a green screen if it could be done to control a realistic avatar. That would really cut down on the production time for my concept. It would still take a team of people, but not nearly as many.

I am curious how you achieved that effect with the hybrid cat. Is it the same as what you did with "The Key"?

I was thinking about making the videos about 18 minutes long, with a little bit of previous-episode highlights at the beginning so new viewers would notice that it's a series, go to the channel page to find episode 1, and binge from there. With every video ending on a cliffhanger due to the voting process, I think people would binge until they got current, then subscribe and vote. The regular upload schedule should cut down on the advertising cost to acquire new viewers, since the YouTube algorithm would be able to figure out the target audience after just a few videos and show new episodes to potential viewers with similar interests to past viewers.

I'll admit pushing out an 18-minute video every week will be a monumental challenge, especially if I tally the votes for 2 full days, but I'm hoping that if it took off I could get enough views and votes in the first few hours after release to know which way to proceed. I think I would want to have the first 6 episodes ready at launch, which could be used for basic character introduction and world building before introducing the voting. There are a lot of great shows that have been produced on small sets with just a few actors. I think that is where I will have to start in order to put out the episodes quickly enough. However, as time goes on I could build up virtual film locations.

What do you think is the best way to reskin a green-screen actor? I think I saw one of your videos where you did something similar, matching facial expressions and lip-syncing the model to the audio. It looks like it worked pretty well, but the model was a bit twitchy, like it was jerking back to a default expression when you weren't speaking. I tried some motion-tracking webcam avatar software a while back, but the quality just wasn't good enough to be usable for anything. The tracking kept getting messed up and the model would do all kinds of twitchy things. It also didn't like my glasses; it did a little better when I didn't wear them, but then I couldn't see anything.
Hey, sorry it took me so long to reply to this. I had a couple of migraines earlier in the week, in addition to a very heavy work week after that.

There are a couple of other reasons it's taking me a while to respond. It's complicated. One issue is that some of the questions you're asking have very complex answers that would take me significant time to compose.

The second issue is that while they are good questions, those answers would essentially be obsolete already. For example, the cat video is now four generations old. Sigh. This is going to be a difficult aspect for you, or at least it's been the most taxing part of this entire process for me. Visual AI science is moving at what I would call an unprecedented speed. Imagine what it would have been like for Hollywood if they had set up to film Raiders of the Lost Ark, spent a quarter million dollars on a new Panavision camera, and then that camera became obsolete during filming, before release, and they had to go back and spend the money and time again and re-film the entire movie because, while they were filming, 20,000 other 20-second movies came out on TikTok that looked better, acclimatizing the viewing audience to a new standard that made their film look half as impressive as when they started. Now imagine that happened 15 times during the filming of the movie. That's kind of where I'm at, except instead of getting paid in advance like the Hollywood people, I'm chasing a moving goalpost to try and get one paycheck after over three years of work.

Lastly, I just have to think long and hard about how much of the bag of tricks I've spent over 10,000 hours developing I should share openly. Since you just arrived recently you have no way of knowing this, but I built this entire thing around the concept that I would spearhead this massive development and create a framework for interactive fiction and the resources to go along with it, in exchange for people helping me with my project, which is honestly way, way too much work for one person to do, or 20 if I'm being honest. Save Point would provide the technology, resources, and training necessary to accomplish CYOA filmmaking, and I would recover my investment via the expansion of the product created by users of the platform I offered. Essentially like what happened with Minecraft.

It's not as though I haven't gotten any response to that, but the vast majority of people who signed onto the project had near-zero skills, equipment, investment, or work ethic, which was a frustrating problem. At the point where I had 45 people who had joined, and I realized I was doing as much work per week as all of them put together, I shut down the Discord and have just worked on this by myself for what I would call an average of 70 hours a week, year round. When you initially showed up, I had hoped that since our goals were so closely aligned you might become an important ally in the fight to make this new system possible. Correct me if I'm wrong, but it seems like, while similar, your goal and strategy are significantly different from mine, to a degree that would make it difficult to work cooperatively. Basically, a couple of things make me think that. For one, I'm mainly excited about a system that offers a huge number of choices, not 50 a year but literally thousands a year. I'm interested in creating an investable company that can pay the bills for everyone on board, and it's extremely unlikely that could happen using the method you're describing. Honestly, both our strategies should be winning strategies in a more sane time, but the way things are in the creative market right now, and especially on YouTube, indie filmmakers are essentially getting paid 0.01 cents on the dollar compared to their Hollywood counterparts. For example, according to my math I would need literally 100 million views per year simply to sustain the organization and pay one middle-class salary. Chances are it wouldn't be quite that bad, since direct sponsorships would become available far below that level, and I could eventually move the show to Netflix, where the pay scale is drastically higher for the same work. Still, the current situation is rough for people in my position.

I do think what you're trying to do is cool, I liked your demo, and I want to help you out. So what I've decided to do here is provide you with some of the key puzzle pieces you'll need to succeed, without necessarily divulging many of the techniques and technologies I've spent so much time on specifically to gain a competitive edge.

1. First is my core innovation from 2012: the concept of using a video game engine, in this case Unreal, to create master templates which can be infinitely reused via refabrication by AI. Out in Mountain View, some of us were working on early versions of visual AI in 2004-05, with the first glimmers of what you see in the AI world today showing up around 2011. I couldn't do the thing I'm talking about back then, no one could, but there was just enough of a hint of the possibility that I could see it. Eight GPU generations later, it's now possible.

In the video "The Key" you see this working. You create a scene in a 3D engine that is general enough to be reused in many scenarios, for example two people walking down a hallway having a conversation. You build a custom AI layer that allows you to refabricate that footage into any two people walking down any corridor having any conversation. That's the core concept. Over time such a system could be built out to include hundreds of common scenarios, reducing the workload for CYOA development in film by an immense amount. Execution of this has been extremely complex, and within the last year specifically, every time I hit a major milestone in terms of accomplishing this, some investor, for example the government of China, has given someone $200 million, at which point the public immediately assumes that I am copying off of their test papers, even though there are numerous timestamps of me working on this years before some of these companies even existed.
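To make that concrete, here is a rough sketch of what one refabrication pass over UE-rendered frames could look like using off-the-shelf pieces, a depth ControlNet plus img2img in diffusers, rather than my actual pipeline. The model ids, prompt, and folder names below are placeholders, not anything I actually ship:

import os

import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# The depth ControlNet preserves the staging and camera from the UE render,
# while the prompt swaps in new characters, wardrobe, and look.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "two armored knights walking down a torchlit stone corridor, painterly"

os.makedirs("refabricated", exist_ok=True)
for name in sorted(os.listdir("ue_frames")):                # beauty pass from the UE template
    frame = Image.open(f"ue_frames/{name}").convert("RGB")
    depth = Image.open(f"ue_depth/{name}").convert("RGB")   # matching depth pass
    result = pipe(
        prompt,
        image=frame,           # the generic render being refabricated
        control_image=depth,   # structure the new footage must respect
        strength=0.6,          # how far from the original render to drift
        guidance_scale=7.0,
    ).images[0]
    result.save(f"refabricated/{name}")

Per-frame img2img like this will flicker; the point is only to show the master-template idea, not a production-grade temporal solution.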

Ranting aside, the key takeaway is that this core concept is the single most significant thing to understand in terms of enabling huge CYOA films that just would not have been financially or technically possible in the past. Create one real Unreal Engine color-coded video of a man riding a horse down a trail, and eventually extract 200 shots from that one video of various characters riding a horse through canyons, mountains, cities, etc.

2. The cat video and what it proves. Animations are a huge roadblock in terms of the work hours needed to produce a film-grade product in animated form. For many scenes the most powerful solution would be to combine 3D sublayer engines with live-action footage and produce a single consolidated, stylized output. That's what that video is about. I can animate locations, cars, etc. very well in the 3D space, and to a significant degree humans, mainly because Epic, the creators of Unreal, have spent so much time on their MetaHuman technology. So at this point you could just take live-action film of a tiger in captivity and get exact, lifelike, realistic animations that would really never be possible from 3D spline animations, and you could do it at a fraction of the cost of the inferior animations.

Here's what you need to get started:

You need to learn ComfyUI, which is essentially an object-oriented visual programming GUI specific to visual AI tasks and custom pipeline construction.

You need to learn to use ControlNet, or multi-ControlNet, inside ComfyUI.

You need to learn about LoRAs: what they do, how they can help shape your output in powerful ways, and how to train them.
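As a tiny illustration of that last point, this is roughly what applying an already-trained LoRA looks like in code. It's the diffusers equivalent of the ComfyUI LoRA loader node, and the repo id below is a placeholder rather than a real asset:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Placeholder id: point this at whatever style or character LoRA you trained.
pipe.load_lora_weights("your-account/painterly-style-lora")

image = pipe(
    "a man riding a horse down a canyon trail, painterly style",
    num_inference_steps=30,
).images[0]
image.save("lora_test.png")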

Once you understand all that, you will probably understand what I'm doing here. You can key live-action footage of X, in this case a cat, onto a digital backdrop you created. Here's the important part: you can do it badly. You can be sloppy and have C-grade rotoscoping. The magic happens when all of that slop disappears during the stylization process. The fact that I'm using the two stages effectively reduces time spent on compositing by over 90%, probably way over, in addition to providing the unprecedented flexibility in post that I described in point one.
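To show how forgiving the two-stage approach is, here's a deliberately crude chroma key in OpenCV, my illustration rather than my actual tooling, that drops a green-screen frame onto a rendered backdrop; the file names are placeholders. The seams it leaves behind are exactly the kind of slop the stylization pass sketched under point one will swallow:

import cv2
import numpy as np

fg = cv2.imread("greenscreen_cat.png")   # live-action green-screen frame (placeholder)
bg = cv2.imread("ue_backdrop.png")       # rendered digital backdrop (placeholder)
bg = cv2.resize(bg, (fg.shape[1], fg.shape[0]))

# Deliberately wide, sloppy green range: C-grade rotoscoping on purpose.
hsv = cv2.cvtColor(fg, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))
mask = cv2.medianBlur(mask, 5)

# Wherever the mask says "green screen", show the backdrop; otherwise keep the cat.
composite = np.where(mask[..., None] > 0, bg, fg)
cv2.imwrite("composite.png", composite)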

3. Rent online supercomputing power and license large corporate models for the most intensive tasks. I've got a couple of 4090 cards now, and it's enough for me to do the work that needs to be done locally, maybe more than enough, but if you want really amazing results with bespoke code, you'll be way better off running the final layers off-site on rented H100 cards. I built my own animation models 10 times, and they worked like you saw in the cat video and others, but ultimately I can no longer compete directly in that area against companies that are literally funded by governments.

4. Video-to-video is the real answer to creating these types of products that we're both trying to engineer, but if you want it faster, easier, and lower quality, you can now get somewhat effective results from some of the best text-to-video solutions, which is what I was trying out in the "River" demo. The limitation of this method is that camera control, action, and interaction are all very restricted. Do you need a part in your film where someone hands another person a glass of water and then that person takes a drink? Right now you're at an impasse with text-to-video. You can literally work on that scene for 6 hours and never get it.

5. Work at a relatively low resolution, optimally the native resolution of whatever model you have created or chosen, and then use AI upscaling for deployment. You can build this directly into your ComfyUI workflow at the expense of development and production speed, or simply use Topaz AI to upscale in post, which I have found to be a good tradeoff: a mild reduction in quality in exchange for a significant increase in overall speed.
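If you want to see the shape of the in-workflow option in code, here's a stand-in using the open x4 upscaler in diffusers, since Topaz is a GUI tool. The frame path is a placeholder and the model choice is just one of several that would work:

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16).to("cuda")

# Native-resolution frame out of the main workflow (placeholder path).
low_res = Image.open("refabricated/frame_0001.png").convert("RGB")

hi_res = upscaler(prompt="painterly film still", image=low_res).images[0]
hi_res.save("frame_0001_4x.png")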

6. Pursue consistency above all else, and don't be afraid to lose visual quality while trying to avoid the uncanny valley or blending in with a million other people who are now all chasing photorealism as a default result. I'm about to publish a demo of the version 8 pipeline, probably later this week. Check out the thread "Panavision 70 millimeter AI craze", then compare that to the next demo I post, and you'll understand exactly what I'm talking about. People will pay for South Park episodes; people will not pay for those Panavision 70 millimeter videos. The second is objectively of much higher quality, but ultimately the style fails with the viewership in comparison to lower-quality work that people simply enjoy watching more.

ComfyUI, ControlNets, LoRAs, model training, off-site supercomputing power, CivitAI, UE5, Megascans, MetaHumans, and DaVinci Resolve. That's what you need to really get started.

Lastly, I'd note that many people encountering me for the first time get the incorrect impression that I'm doing this for the money rather than because I love the art form. That's actually not the case; I just understand that, in practical terms, something as ambitious as an open-ended CYOA film spanning thousands of micro-episodes will simply crash and burn without consistent financial input of some scale.

If at some time in the future you become interested in joining my project, I'll of course be willing to supply you with a LOT of additional resources, training, and support. Otherwise, I wish you the best of luck with your project and hope that the starting points I've provided here can assist you in your endeavor.
 
Now I understand what you meant by pipelines in your previous posts. ComfyUI. I'm a little familiar with ControlNets from using invoke.ai.
I'm not trying to go for photorealistic humans, mainly because it actually creeps people out when something looks too realistic but something seems off. I would like to get to a painting-quality level of realism: enough to look good, but not so realistic that you could mistake it for video of real people. To me the MetaHumans look a bit creepy and clunky sometimes, whereas the Hedra videos don't have that creepy factor so much because of the lower quality. It's odd saying something lower quality looks more realistic than something of high quality.

You're right about video-to-video being the best way to accomplish what I'm trying to do.
I don't have to create the whole thing using AI. I think it would be cool to film some stock footage at real locations, and then put the reskinned green-screen actors in using AI. I could still use UE for some locations and backgrounds.

I like the challenge of it; my pay is what I learn from the process. I have much easier ways of earning a living.

I think you're doing great work, and I may be interested in joining your project in the future. At this point I don't think I have enough to offer to make it worth your time. I can't wait to see your new demo.
 