
Narration Test - The Red Mansion


So this is something I had intended to try out for a while. With no voice actors available, I had planned to do this as a non-narrative project, but as the scope expanded, it felt like choices would become repetitive without the context that a narrator could provide.

With next-gen neural-based voice AI, I thought I might have an option, but almost every test was a negative. Until I found one single voice, on one service, out of hundreds, that actually kind of worked.

In any case, I did a test run of it here in this cell, so I could get some feedback. Can you tell the voice is artificial? Would you have noticed if I didn't mention it? Does the minimalistic narrative help immerse you in the environment? Do you think you would enjoy the product more or less with the narration?

I plan to keep it sparse even if I do go in this direction, and the big reason I'm leaning towards it is that I think I can make the choices more meaningful this way. I was up for the challenge of a 100% nonverbal film when it was supposed to be 30 cells, but at 400+, it makes some sense to add another layer to make things easier.
 
Still sounds mechanical. The inflections are repetitive and predictable. It’s certainly impressive where AI voices are now, but they still lack the ability to dramatize. All tell-tale signs of AI voiceover.

Side note: as the video progresses, the music gets louder and begins to drown the narration.

I’m also surprised you can’t find a voice actor. I know the market is flooded with lots of wannabes (thanks, Fiverr), but there are still lots of good, real voices out there.
 
I wouldn't want to hear the kinds of words that are being narrated.

*BONG* "the cat hears a bong"
i mean yeah im here and i have ears too, i just heard the bong lol i dont need it to tell me what i heard with my own ears

and then describing the way the rug looks like im blind or something.
maybe if the narration actually added a layer ? right now it seems like it added a redundancy
 
Welllll, it's not exactly that I can't find wannabe voice actors. It's more that unpaid beginners have issues with unreliability, inconsistency, recording equipment, file transfers, background noise, and on and on. I've tried this out with a few. I think all of it could be fixed with a budget, but trying to do it sans funding at 400 units a year, I need the voice actor to be highly consistent and reliable over a long period, like a robot, lol.

I do agree with you though, this is the best of the best robot voices I could find, and it's still noticeable. I guess the real question is, is it worse with the robot voice than with no voice and just trying to do the whole thing visually?

Ultimately, it would all have to be replaced with actual human voices anyway, because YouTube auto-demonetizes anything using a robot voice.

I appreciate the feedback. You are definitely correct about the dynamics. For a release cell, I'd test volumes throughout; for a quick demo, I just use the meters at some average point and hope for the best, and it causes stuff like this to happen. The sound the cat hears is also terrible, and will have to be replaced with something better produced.
 
Yeah, I know what you mean. I was just trying to quickly dub in something that fit to test it. My theory was that different people process stimuli in different ways. In the actual story, I'd want to use it in more meaningful ways, but this was just a blank cell, not yet connected to any plotline.

What I'm really thinking is that the choices can benefit from the narrator revealing more about them. The YouTube buttons only allow a small amount of tiny text.

So a decent choice might be: "It was beginning to get hotter, and he could feel himself becoming dehydrated. He could see a cavern offering shade down the left fork of the ravine, but if he travelled over the dunes, he might find whatever source of water was sustaining life in the valley."

Point being there are things I can quickly and cheaply say there, that would be difficult or time consuming to try and show, such as the cat's dehydration, the fact that the temperature was rising, or the thought process about finding water evidenced by the existence of local fauna.

My thinking is that nonverbal was a good fit for the original design, which was supposed to be 30 cells, but now that the target is 400+, the same design may not work.

And yes, that bong sound is bad. I knew it wasn't final, so I tested like 5 sounds and tossed one in. For comparison, in that video with the car driving down the street with people, I went through over 300 sound files to get the right music alone.
 
I think an AI narrator would be fine if you kept it really simple, like "your cat is getting hot!" instead of trying to tell a story with voice.
Storytelling and inflection are not the AI voice's strong suit. It's better for stuff like reading closed captions, if you had some text alert pop up about the cat's status.
 
Excellent! The music starts to cover the narration at :34.

I liked the voice. And the style seems to be like reading a children's book to a child, describing noises and such. The voice may have a touch too much effect on it, with some resonating "sss" sounds like if you listen to "ornate rugs(tisss)" there is a tinny sssss. Other than that, perfect.
 
there is a de-esser for that or whatever you call it, that he could try
 
Probably just too much effect. I am guessing he gave it a "hall" type sound to match the scene, which I don't think is a good idea since the scenes will all change. I'd go for that closer storytelling effect.
 
My aim, once I quit testing this and actually start crafting the verbal stylistic, would be to make it very minimal, but not quite to the extent you're describing. My initial vision was a single, kind of poetic description at the beginning of each cell, something that could help set the tone.

"The desert seemed to stretch on forever, and he wondered if he should start travelling by night, until he at least knew where to find water"

I've even considered using haiku as a format for these VO sections, could be interesting as a stylistic.

(made using online haiku generator)

Simmering desert
A lost, solitary cat runs
beyond the cacti

There's a number of ways to do this, but no matter which I use, it's going to be minimalistic. I don't think any one cell should have more than 3 VO segments.

The reason I won't do status or HUD-style updates is that it's a core design thing to make this feel fundamentally different from mainstream video games. I want to really emphasize the creative options that are available in this format. In fact, that's why you are already seeing so many locations. I don't have install size limits, so this UE code base is half a terabyte already. You couldn't publish something like that to a console that comes stock with a 450 GB HDD. So in short, I don't want to give them anything too familiar. The audience should never feel like, "oh I get it, this is just like X."
 
have you considered first person? it's simpler wordage.

"This heats make mes parched! should I be traveling by night? it seems to stretch on forever..."
 
All these things you guys are saying about the audio are correct, but there is a simple explanation.

I didn't work on the audio at all, lol. I dropped some sound files into the mix as fast as I could, and didn't mix them. I did a quick EQ on the voice, threw a plate reverb preset to feather it in a bit, and that's it. So basically audio was an afterthought, and the plan is to develop detailed presets, use careful foley selection, and mix and master properly, once I've nailed this down to the point where I don't think I'll just be throwing it away.

Basically, if this was a production cell I would have spent about 90 minutes on the sound, but for a quick test, I spent maybe 5 minutes.

It's good feedback though, it helps me get a read on what level of detail different people notice. If I can justify the time expenditure, or simply make it fast and easy, I can actually do acoustic matching to the shape and size of each space. I think though, that IT's solution of just doing a more standard studio verb narration track would be more efficient over the long haul. I'd probably be looking at about 1200 of these before it's over, so a couple extra steps per instance can add up.
 
It's a valid idea. Something I'm interested in is developing the persona of the cat. I think I could help the audience bond with it in this way. I'm not nuts about first person, but your thought process is dead on, exploring unique options, trying them out, and just seeing where the chemistry is the strongest.

In terms of the cat's personality, if that comes into play through this narration, I only have some broad strokes. I think it's just an animal surviving, not cute or silly, but occasionally funny, sometimes cunning, perhaps a bit relatable.

I'm thinking about a mild humor element based on misperceptions of the human world. It's been done before, but it can be somewhat charming if properly executed.

"He watched from the shadows, analyzing the pattern, looking for an opening to cross. Every few hours, the car beast would open up one of its terrifying horizontal jaws and swallow one or more of the humans. After feeding, the car would immediately go out to roam, perhaps in search of additional humans to devour. Regardless of why, this was his opening. Once the car had gone and was no longer guarding the apartment, he could make his way inside."
 
This may sound crazy, but I'm considering just giving the cat Mike Armentrout's personality. Kind of tired and wise, and totally in control of outlandish situations in a listless, bored way. It's just such a different kind of protagonist compared to the energetic, idealistic heroes of mainstream storytelling. Pragmatic, and dry, with the humor pivoting on the straight man dynamic.

Right now, I'm just trying out all sorts of ideas. BTW, I can totally see a version like you outlined, with the animal speak working. I don't think it's the direction I want to go, but I do think it would work. The Gollum character really worked with audiences, better than I would have imagined really.
 
Superb, Nate! Gosh, I wouldn't have noticed that the woman's voice was computerised! Eye-catching, great crystal-clear
voice, smooth production in all areas. Zey The Mouse says 'Hello' to the Cat, lol ;)

regards :)
 
I liked the narration overall. It just sounds perhaps a little stiff, but still good. For the cat being thirsty, what if the sound design changes only when it needs water, with the water sources being more amplified while walking past or near one? Cats sweat through their paw pads. What about adding a panting/slower walking animation, with a red 'aura' around the feet? Or, when it's hungry, mice or bird sounds are enhanced?

But cats do speak, just not human English. (Well, like dogs, not usually lol) They have thousands of vocalizations meaning different things. Ever see a cat 'wanting' a mouse it sees, or a bird outside? Some cats do a 'waa wah waa wah' mouth sound/movement. What about adding a sound like that, and an icon of food, a fish icon or whatever, fades onto the screen? Hope progress is going well for you, I know this is a bit of an other post.
 
That was a meritoriously observant cat, and conscious too...

There was a dilemma as to whether he should go for the opening, or follow the voice inside

until I saw this,....

"He watched from the shadows, analyzing the pattern, looking for an opening to cross. Every few hours, the car beast would open up one of its terrifying horizontal jaws and swallow one or more of the humans. After feeding, the car would immediately go out to roam, perhaps in search of additional humans to devour. Regardless of why, this was his opening. Once the car had gone and was no longer guarding the apartment, he could make his way inside."


perfectly completed the message, and delivered.

Brilliant.
 