Outside of the Box

Nate North · Dec 2, 2024

The first production cell of the labyrinth, I might still tweak a few things between now and release. This first cell is important for obvious reasons.

I could use a little feedback on how the audio works out on different types of speakers, since I only have studio monitor headphones. Too much reverb on the voice? Mastering EQ too aggressive?

i'd say I have enough raw footage right now for about 500 cells like this. It will likely take around 2500 cells for me to complete the project. I'll start the labyrinth small and scale it over time.

leppy · Dec 2, 2024

Hi, Nate, Hope your keeping well?, excellent work, production top, the sound is perfect on my headphones, gives upmost atmospheric feelings, good one, many regards, Ian & Zey

sfoster · Dec 2, 2024

I use a TV as my monitor. The sound is good for me.

One thing did stand out - the lack of blinking and facial expression in the cat.

directorik · Dec 2, 2024

Just used my iMac speakers and it sounded great.

Nate North · Dec 2, 2024

Thanks I appreciate the feedback, It's helpful to know that this particular mixdown is working well on a good variety of systems.

I do get the cat to blink and make mild expressions sometimes, But it's hit and miss for now. Cats don't actually have facial expressions other than how far their eyelids are open, and the pipeline doesn't support anthropomorphic animal faces very well. Human facial expressions on the other hand are pretty good at this point, I've been meaning to show a demo of that but have been too busy to get around to it so far. Future clips of the other two projects will show that off.

I'm also glad that my original music didn't seem to annoy anyone too much. (this time, lol) I can't afford the licensing fees for professional orchestra recordings, and the small number of stock stuff that's good gets overused in a lot of commercials. I feel like if someone hears a song in a film and then hears the same song later in an advertisement for a reverse mortgage, It kind of undermines the creative credibility of the film.

sfoster · Dec 2, 2024

Nate North said:
Cats don't actually have facial expressions other than how far their eyelids are open

That is, unfortunately, a big part of what made live action lion king so bland

CelticRambler · Dec 3, 2024

Bog standard laptop speakers for me (at the moment) and nothing about the audio rankles except "the colllllar, it had never been gone before". Preceded by a shot seemingly emphasising the cat's bare neck, it took me three views to figure out it was the colo(u)r, not the collar that was missing!

On the other hand, I find the discontinuity of the cat's journey through the house unsettling: it doesn't match the progression of dialogue, it feels like it's only a series of clips of "a cat in a house" Primary example at 1m40s - cat sitting in the window looking at whatever's going on outside; next shot, cat teleported to a random point in the middle of the room looking with vague, humourous, definitely-not-feeling-threatened curiosity at the window.

sfoster said:
One thing did stand out - the lack of blinking and facial expression in the cat.

Nate North said:
Cats don't actually have facial expressions other than how far their eyelids are open

Not sure if you're saying here that SavePoint doesn't/can't render feline facial expressions, or that no cats have facial expressions. If the latter: that's very wrong, and a misunderstanding of this may be contributing to the disconnect between this cat's behaviour and the story/environment. Apart from their eyes (eyelids and pupils), cats also use their ears, whiskers, lips, nose, cheeks and facial hair to communicate ... aswell as their tail, paw placement, general posture and situational positioning.

Even if (some!) humans find it difficult to explain why, for example, they feel a cat is giving them the evil eye, we pick up on the whole-body statement being announced to us. An artist (digital or biological) who is not tuned in to this range of subtle and interconnected expressions will struggle to animate the cat properly, leaving it rather bland or introducing aberrations such as that described above.

With only a couple of exceptions (specifically when shown in profile*) and other than the "feck off, I'm sleeping" face at 0m3s, this cat is shown from start to finish with ears fully fully erect, directed fully forward, immobile, eyelids fully open, pupils half open, whiskers loose and forward, muzzle relaxed, lips loose, paws together, standing tall, tail mostly raised, moving in a straight line in open space with a regular gait. That's a cat who's absolutely not bothered by or particularly interested in anything, - which is completely at odds with the narrative.

I know you've said before that you've been prioritising other aspects of the production process - but if the cat is the hero, isn't it time now to deal with this aspect?

* For comparison, at 0s29 the most realistic expression : body static, right up against a vertical surface, then twisted, paws crossed, head partially lowered, tail horizontal, micromovements of the tail tip, ears raised and forward, micromovements left and right, eyes forward, whiskers pulled back. A cat on a mission. One could almost imaging that was rendered from a live-action clip.

Nate North · Dec 3, 2024

CelticRambler said:
Bog standard laptop speakers for me (at the moment) and nothing about the audio rankles except "the colllllar, it had never been gone before". Preceded by a shot seemingly emphasising the cat's bare neck, it took me three views to figure out it was the colo(u)r, not the collar that was missing!

On the other hand, I find the discontinuity of the cat's journey through the house unsettling: it doesn't match the progression of dialogue, it feels like it's only a series of clips of "a cat in a house" Primary example at 1m40s - cat sitting in the window looking at whatever's going on outside; next shot, cat teleported to a random point in the middle of the room looking with vague, humourous, definitely-not-feeling-threatened curiosity at the window.

With only a couple of exceptions (specifically when shown in profile*) and other than the "feck off, I'm sleeping" face at 0m3s, this cat is shown from start to finish with ears fully fully erect, directed fully forward, immobile, eyelids fully open, pupils half open, whiskers loose and forward, muzzle relaxed, lips loose, paws together, standing tall, tail mostly raised, moving in a straight line in open space with a regular gait. That's a cat who's absolutely not bothered by or particularly interested in anything, - which is completely at odds with the narrative.

I know you've said before that you've been prioritising other aspects of the production process - but if the cat is the hero, isn't it time now to deal with this aspect?

* For comparison, at 0s29 the most realistic expression : body static, right up against a vertical surface, then twisted, paws crossed, head partially lowered, tail horizontal, micromovements of the tail tip, ears raised and forward, micromovements left and right, eyes forward, whiskers pulled back. A cat on a mission. One could almost imaging that was rendered from a live-action clip.

Okay good to hear another bit of feedback about the mix. I'm not hyper focused on it but it was what I wanted to focus on today, Mainly because I'd basically done zero work and got zero feedback on the audio mix up until now. Years of work putting together sound and music, But in terms of developing an audio production template that governs levels across the project, This was really the first week that I worked on that. Fortunately I had a lot of previous education and experience with that topic, Nonetheless it's a relief that I seem to have gotten a universally viable mix on the first try. One less thing to dilute my focus on issues such as you've outlined.

I'm a bit confused by your first sentence. The intent was to show that its collar was missing, and to deliver that line in a way that communicated puzzled realization. It's one of several dialog lines i'm not particularly happy with yet, I think I could get a better delivery with that one and maybe do a better job writing the last few. The collar becomes an important plot point later, So that's important feedback since it's essential that I get across clearly that its collar is missing. I'll likely obsess quite a bit over this particular cell because of the way this project works. To a degree The first cell is extremely important, the first 10 cells are very important, The next hundred are fairly important, And I should be able to relax after that. By the time I get to cell 800 only one in 300 viewers is going to ever see that, but every single person who plays the labyrinth will see this cell.

The action in the bulk of this video is simply the cat waking up and exploring the house a little bit to discover its owners gone. I tend to agree with you about that particular scene after the window, And admit that I was just tired after 4 12 hour days in a row and lacked the energy to go on another sidetrack. Most of what people think of in terms of AI is very different than what I'm working with here, An example would be the scene where the tower is revealed, Which took two days of work and still didn't turn out as well as I would have liked. The final shot isn't bad, it does get the size of the tower across. Anyway in reply to your critique I think you're right and what really happened was that I was getting fatigued working on this for so long and when I saw the opportunity for the cat to turn around and jump back down out of the window ledge for better continuity at a cost of probably 6 hours of labor for that shot, I just cut to that scene where it's on the floor again, Which took 15 seconds of labor, a significant difference when one is tired. I'm pretty obsessive about these first few cells so chances are I'll get some rest and go back and put in that time. Anyway solid feedback, good eye.

As to focusing on the cat I will say that up until this point a lot of my focus has been on raising the visual standards of how the cat and world are drawn. I do feel that I have essentially achieved that, And the way this system works it's basically like pouring concrete very slowly. Basically what I'm saying is that I don't have to spend so much time and attention worrying about visual fidelity anymore, And have recently been spending a lot more time working on animation nuances. Progress comes in very small increments but is permanent once established. The nuances of the cat are significantly more detailed than they were in previous versions, But there is still a lot of work that can be done in the future in terms of creating a more realistic and lifelike cat.

To try and explain this in the simplest way possible I would just say that Some of the ai is and their respective training sets are what I would call highly resistant to creating these nuanced cat motions. I'm not going to get too technical here, but essentially the vast amount of training data required for each ai necessitates mass input that biases attention towards popular things. Unfortunately the solution to this particular problem would cost me about a third of $1,000,000 to execute, Which leaves me paddling upstream in terms of the cat animation nuances. Remember those stories about Kubrick doing 80 takes? I've probably done more than 1000 takes trying to get the cat to just sit there and groom itself. So far I have one that worked. I wish I could just drag and drop MP4 files into this interface because that would help communicate things a lot more easily.

All that aside I want to add in a quick reality check here. Take a look at the level of nuance and facial expression that Walt Disney got into a movie where they had millions of dollars, 100 people helping, everyone got paid in advance, they have a live cat, live cameras. I am just one person working with very limited resources. They may have had thousands of times my resources but I would say the cat in this globally successful film is only about 15% more expressive than my electronic cat, if that.

Lastly I would say that perfect can be the enemy of good, And while I'll make every effort within reason to ensure the best possible quality I can achieve, publishing a finished product that people can use and enjoy on some level is always going to be the first priority. Also that shot was not from live action footage, Just an instance where the pipeline worked pretty well.

Here, a picture is worth 1000 words, so I made this quick video showing some of the raw development footage from The next engine upgrade, which will be ready before the project launches and can be grafted in as needed. You can see that it's not quite ready to use yet, But shows significant promise in terms of expanding animation diversity. Right now I've got something that isn't perfect but it does work, I think it's probably as good a time as any to be going into production.

(placed video at top)

CelticRambler · Dec 3, 2024

Nate North said:
I'm a bit confused by your first sentence. The intent was to show that its collar was missing, and to deliver that line in a way that communicated puzzled realization. It's one of several dialog lines i'm not particularly happy with yet, I think I could get a better delivery with that one and maybe do a better job writing the last few. The collar becomes an important plot point later

Ahhh, OK, so I did interpret it correctly at first. Upon re-watching, I assumed it was an "accent" issue, and that the world outside was being drained of colour by the strange new arrival ...

I'm not sure how to easily convey the absence of something that's not necessarily present. All I can think of straight off would be to have displayed somewhere a photo of the cat wearing the collar. Or maybe him being unable to get through a cat-flap that uses a collar-based identification lock.

Nate North said:
The nuances of the cat are significantly more detailed than they were in previous versions

Yeah, I can see in your video response that there's a lot more nuance and realism in the movements.

Nate North said:
The action in the bulk of this video is simply the cat waking up and exploring the house a little bit to discover its owners gone.

This is where I see the greatest inconsistency in the cat's attitude/behaviour. I have the dual (dis?)advantages) of (a) being a very "cat" person; and (b) having too many hours of footage of my own cat exploring the house while I'm away for an extended period.

The one characteristic that stands out above all others in this respect is how so much of that exploration is done with just the head. Cats are very economical with their movements - a slow plod through a room, eyes and ears forward is not normal. A stop-start displacement along the edges, behind/under/on furniture is much more typical. When stopped, though, it's only the body that remains static - everything else will be twitching continuously: eyes, ears (independently, and especially towards the side and back), whiskers, nose, head bobbing up and down, turning left and right, with an intensity proportional to the level of curiosity (see again the snippet at 0m29s)

For this scene, you could include a certain amount of back-and-forth pacing (at the window, for example) to show a degree of agitation (think of big cats' pscyhotic pacing when kept in a zoo), or the cat hearing "something" outside, or behind a door, and going back to the same spot as he tries to understand where his owners have gone.

Nate North said:
a quick reality check here. Take a look at the level of nuance and facial expression that Walt Disney got into a movie where they had millions of dollars, 100 people helping, everyone got paid in advance, they have a live cat, live cameras

Fair point! I think the fundamental problem is that, a lot of the time, cats are more of a "felt presence" than a physical reality and not really much better subjects for video than, say, spiders or inkjet printers.

Nate North said:
Lastly I would say that perfect can be the enemy of good

... the same sentiment I expressed recently regarding Vector_Monster's animation!

sfoster · Dec 3, 2024

I don't have much to add, except there is a subreddit dedicated to cats expressing themselves with ears

https://www.reddit.com/r/airplaneears/

https://www.reddit.com/r/airplaneears/comments/1fxu4ij/i_kissed_her_during_ipad_time

Nate North · Dec 14, 2024

CelticRambler said:
Ahhh, OK, so I did interpret it correctly at first. Upon re-watching, I assumed it was an "accent" issue, and that the world outside was being drained of colour by the strange new arrival ...

I'm not sure how to easily convey the absence of something that's not necessarily present. All I can think of straight off would be to have displayed somewhere a photo of the cat wearing the collar. Or maybe him being unable to get through a cat-flap that uses a collar-based identification lock.

Yeah, I can see in your video response that there's a lot more nuance and realism in the movements.

This is where I see the greatest inconsistency in the cat's attitude/behaviour. I have the dual (dis?)advantages) of (a) being a very "cat" person; and (b) having too many hours of footage of my own cat exploring the house while I'm away for an extended period.

The one characteristic that stands out above all others in this respect is how so much of that exploration is done with just the head. Cats are very economical with their movements - a slow plod through a room, eyes and ears forward is not normal. A stop-start displacement along the edges, behind/under/on furniture is much more typical. When stopped, though, it's only the body that remains static - everything else will be twitching continuously: eyes, ears (independently, and especially towards the side and back), whiskers, nose, head bobbing up and down, turning left and right, with an intensity proportional to the level of curiosity (see again the snippet at 0m29s)

For this scene, you could include a certain amount of back-and-forth pacing (at the window, for example) to show a degree of agitation (think of big cats' pscyhotic pacing when kept in a zoo), or the cat hearing "something" outside, or behind a door, and going back to the same spot as he tries to understand where his owners have gone.

Fair point! I think the fundamental problem is that, a lot of the time, cats are more of a "felt presence" than a physical reality and not really much better subjects for video than, say, spiders or inkjet printers.

... the same sentiment I expressed recently regarding Vector_Monster's animation!

I think one significant aspect of this is that this is a game rather than a movie, or more accurately it's a hybrid of both. Psychologically the experience is quite different but I feel like when I'm demonstrating small samples of the product people are really only getting the movie side of the experience and judging it accordingly, which does make sense. I could be wrong but I'm imagining you don't play a lot of video games. What matters where your attention is and where you get enjoyment from a video game is considerably different than a film. Everything that I can improve should be improved, So it's just a matter of ratios and priorities in terms of how time is spent on those improvements. To me one of the top priorities is to make the game itself enjoyable, And that has a lot more to do with the labyrinth itself than it does with the cat. I think that's no one will be able to perceive that until the actual game is up and running.

Don't misinterpret this as me saying your feedback isn't good, your feedback is actually pretty good, and thank you for that.

I'm simply noting that I feel like perception will be substantially different once players are invested in solving the mystery of the labyrinth.

In a more direct response I don't particularly agree that the missing collar scene needs to be hammered in or spelled out further than it is, But I do agree that your suggestions such as having the cat pace or other achievable signals of mood or motivation could improve the work, And I'll make an effort to integrate that feedback.

Nate North · Dec 14, 2024

sfoster said:
I don't have much to add, except there is a subreddit dedicated to cats expressing themselves with ears

https://www.reddit.com/r/airplaneears/

https://www.reddit.com/r/airplaneears/comments/1fxu4ij/i_kissed_her_during_ipad_time

Thanks I'm actually gonna go over there and study this page, lol. I'll probably learn something.

CelticRambler · Monday at 5:18 AM

Nate North said:
I think one significant aspect of this is that this is a game rather than a movie, or more accurately it's a hybrid of both. Psychologically the experience is quite different but I feel like when I'm demonstrating small samples of the product people are really only getting the movie side of the experience and judging it accordingly

Yeah, that's probably an accurate observation. Without the "game" context, it's easy to look at the clips and see them as "movie" and apply different criteria. On that point, I used to play (and code) video games back in the day, when we had blocks of 8x8 pixels to animate and 16k of RAM to do it with. I kind of lost interest when the graphics on my favourite games became too good (except MS FlightSim) ... and looking over the shoulders of my nephews playing MMORPGs, I feel no desire to pick up a joystick. Somewhat paradoxically (or maybe not) they all love the CYOA book I wrote - and typed out and illustrated - myself because I thought the originals had become too formulaic.

Anyway, back to the cat ...

It occurred to me that this cat in this this context is possibly an example of the limitations of AI (completely outside of SavePoint). If an AI model is learning from images posted on various internet platforms, chances are those images will have been curated to an extent, and most likely to show a "cute kitty" face because that's what gets the biggest "awww ... sooooo cuuuuute" reaction. In reality, an awake, alert cat will more often show the less attractive "airplane ears", and not even as nicely symmetrical as in the reddit photos.

So in regards to the head alone, the AI model is learning from bad data. Then there's the "how a cat gets from point A to point B". Again, if one films a cat moving from point A to point B in real time, most of that time is taken up with the cat not moving, or slinking behind/under a piece of furniture or other view-blocking obstacle. That makes for very unsatisfactory video, and those video clips never get uploaded. To make matters worse, when kitty does decide to move, it's at high speed and not unusual to include a vertical jump that's very difficult to capture. Those clips end up on the metaphorical cutting room floor too.

Taking these (and other limitations) together, when someone asks an AI interface to animate an image of a cat walking down a corridor, we're going to get a cute-kitty face on a cat-shaped body plodding like a human or a dog from the departure point to the destination. Because cats don't walk - they slink/glide/whizz/ooze/teleport ... and the AI just can't get its head around that illogical, inefficient reality (yet!)

mlesemann · Monday at 10:01 AM

CelticRambler said:
cats don't walk - they slink/glide/whizz/ooze/teleport

Or, in the case of our girl cat, prance daintily along the floor on tiny paws, stopping just short of dancing

(the guys race around the house

)

Nate North · Monday at 3:00 PM

CelticRambler said:
Yeah, that's probably an accurate observation. Without the "game" context, it's easy to look at the clips and see them as "movie" and apply different criteria. On that point, I used to play (and code) video games back in the day, when we had blocks of 8x8 pixels to animate and 16k of RAM to do it with. I kind of lost interest when the graphics on my favourite games became too good (except MS FlightSim) ... and looking over the shoulders of my nephews playing MMORPGs, I feel no desire to pick up a joystick. Somewhat paradoxically (or maybe not) they all love the CYOA book I wrote - and typed out and illustrated - myself because I thought the originals had become too formulaic.

Anyway, back to the cat ... It occurred to me that this cat in this this context is possibly an example of the limitations of AI (completely outside of SavePoint). If an AI model is learning from images posted on various internet platforms, chances are those images will have been curated to an extent, and most likely to show a "cute kitty" face because that's what gets the biggest "awww ... sooooo cuuuuute" reaction. In reality, an awake, alert cat will more often show the less attractive "airplane ears", and not even as nicely symmetrical as in the reddit photos.

So in regards to the head alone, the AI model is learning from bad data. Then there's the "how a cat gets from point A to point B". Again, if one films a cat moving from point A to point B in real time, most of that time is taken up with the cat not moving, or slinking behind/under a piece of furniture or other view-blocking obstacle. That makes for very unsatisfactory video, and those video clips never get uploaded. To make matters worse, when kitty does decide to move, it's at high speed and not unusual to include a vertical jump that's very difficult to capture. Those clips end up on the metaphorical cutting room floor too.

Taking these (and other limitations) together, when someone asks an AI interface to animate an image of a cat walking down a corridor, we're going to get a cute-kitty face on a cat-shaped body plodding like a human or a dog from the departure point to the destination. Because cats don't walk - they slink/glide/whizz/ooze/teleport ... and the AI just can't get its head around that illogical, inefficient reality (yet!)

You're getting closer. That's actually a really excellent guess and you've got it about 80% right already. It's actually pretty complicated and there's a lot of elements to this but some of the significant factors are kind of what you said.

When it comes to training the large animation models which will not fit on civilian computers (It's the one part of the pipeline that I can't just build and run locally),

A. Just as you said people are picking and choosing what scenes they upload of their cat and it really doesn't paint a comprehensive picture that would be useful in a realistic portrayal of generic activities.

B. The space inside any given ais head is kind of divided up like a pie chart at least for this type of model. A lot of factors go into how that pie chart is divided and some are more heavily curated than others but due to the massive amount of data required to assemble one in the first place many are not particularly well curated. What we get instead is kind of an unofficial map of the Aggregate human mentality. Just eyeballing it from my experience I'm going to say we're looking at brains that are composed of 95% hot girl and 5% every other topic in the entire world. You may think that some kind of comic exaggeration but if anything I'm probably understating what's going on here. I go to pages where the general public is putting their ai work on display, And sometimes you have to scroll for three or four pages before you see a single image in a grid that isn't a picture of a hot girl. If you need to draw a picture of an apartment building you're going to have to do a search for that in the search bar, it's not something you would come across by scrolling, Same for a shoe or car. I like attractive women as much or more as the next person but I can't help but think these people are kind of idiots when it's literally the only thought they are ever capable of having. Sorry I'm ranting a bit here this particular thing has actually made my job vastly more difficult, And it would be super helpful to me personally if the world could just, you know, grow up a little bit. I love Snickers bars but I don't eat them breakfast lunch and dinner every day until you can't find anything in my house except Snickers Bars, wtf is wrong with everybody.

C. The size complexity and computing power of the animation models specifically Creates a situation where there's zero access to modify them in any way. Regardless of what you needed to use an ai for this is what you end up being stuck with. That may change in the future and I've spent a lot of time working on it personally with varying levels of success. Bottom line is that to Have complete control over that part of the process would currently cost me millions before it was done with. Perhaps that situation will arrive at some point as it has before, But at present I'm stuck with a system that really is not designed for my needs. The pipeline I've built circumvents these issues to a certain degree. Every pixel you see on screen was built by my code on my system and I only use those external models to animate things that are already exist in my own system, But the animations which is what you were complaining about are to a significant degree out of my hands. I can coax I can push I can cherry pick and so on but what I can't do is communicate to these external supercomputing ais that they need to create nuanced and realistic cat movements. I have tried literally thousands of times but the success rate is near zero. Bigger and better animation models can solve the problem as my 5% slice of the brain becomes bigger as the entire brain becomes bigger, But I won't really be able to do what I want to do here in terms of animation nuance until I can gain control over that stage of the pipeline in the way that I have it over every other stage. It would be about one second after I gained that capability that I just built an AI brain that was 100% cat, And even at the current tech level it would blow your mind what that would look like.

Just a note in here for people following this endless saga, I've made an important breakthrough this week that doesn't affect visuals or sound at all, And we'll be announcing it soon in the post about save point version 9.0. This one's kind of a big deal.

Outside of the Box

IndieTalk's Resident Guru