software Is there an AI for cutting, framing and switching camera?

Hey, people!
I'm learning about video editing and saw many advances in it lately with AI.
I was wondering if there is an AI that can take multiple videos with different camera angles and put them together in a single clip, choosing when to switch among the angles and also making frame transitions and adjustments. Does anyone know about it?
Thank you! ✨
 
Nothing that does all of that currently exists. Something vaguely fitting that description does exist, though; it depends on what level of editing you want it to do. I think there is software out there right now that will switch angles on a multicam edit automatically, using the audio for continuity. I don't remember specifically where I saw it, but it's essentially for pro studio situations, like a podcast that's being filmed for YouTube.
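To give a rough idea of how audio-driven angle switching could work, here is a minimal sketch. It assumes each camera has its own microphone and that you already have one loudness value per camera per time window. The function name, the `min_hold` parameter and the input layout are all made up for illustration; this is not how any particular product does it.

```python
from typing import List, Optional


def switch_by_loudness(levels: List[List[float]], min_hold: int = 2) -> List[int]:
    """Pick one camera index per time window from per-camera audio levels.

    levels[w][c] is the loudness of camera c's microphone in window w
    (hypothetical input; measuring loudness is out of scope here).
    min_hold is the minimum number of windows to stay on an angle before
    a cut is allowed, which avoids flickering between cameras.
    """
    choices: List[int] = []
    current: Optional[int] = None
    held = 0
    for window in levels:
        # The camera whose microphone is loudest in this window.
        loudest = max(range(len(window)), key=lambda c: window[c])
        if current is None:
            current, held = loudest, 1
        elif loudest != current and held >= min_hold:
            # Cut only if the current angle has been held long enough.
            current, held = loudest, 1
        else:
            held += 1
        choices.append(current)
    return choices
```

With two cameras and windows where speaker 0 talks, then speaker 1, then speaker 0 again, the output simply follows whoever is talking, delayed by the hold time when the changes come too fast.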

In terms of any type of creative video editing, we're really not there yet, and to my knowledge no models have been trained on that task. It would be extremely useful to me personally if there were, and I could probably build one, but it's a huge job that would take years for just that one thing (to make it really work).

First you would need an AI that can watch videos and comprehend what is going on in them. Right now what's possible is for me to give a system a frame, or instructions on how to retrieve a frame from a video, and in essence it can tell me what objects are in that frame.

Here is a link to an implementation of that existing technology online so you can see what I'm talking about and even try it out yourself for free.


So basically this is where we're at right now: we can have AI look at a frame and tell us what's in that frame. It's actually very useful, and I use it in my pipeline for several tasks.

To accomplish what you're talking about, a completely separate AI would have to be built, based on patterns established by analyzing scene content and timestamps from frame to frame in finished professional edits. For example, I feed the algorithm three frames from the beginning, middle and end of an individual clip. In the first frame it sees two men and one gun, in the middle frame it sees two men, one gun and one explosion, and in the last frame it sees one man, one gun and one corpse. I would then tell the AI that this sequence probably means one person shot the other. Then you do that ten thousand more times in a row with different situations.

Armed with this knowledge, you would set it loose on something like 1000 movies, and it would try to get a feel for how long shots are typically held in various situations. Anyway, it's all incredibly complex, time-consuming, expensive, etc., so you might have to wait a while for it to be available.
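Just the bookkeeping part of that idea can be sketched in a few lines: picking the three frames to look at in each shot, and averaging how long shots are held per situation once something has labelled them. The function names and the situation labels are hypothetical, and the hard part (actually recognizing what is in the frames and what it means) is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_frame_indices(start: int, end: int) -> Tuple[int, int, int]:
    """Return the first, middle and last frame index of a shot (end inclusive)."""
    if end < start:
        raise ValueError("end must not be before start")
    return start, (start + end) // 2, end


def mean_shot_length(shots: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average shot duration in seconds, grouped by situation label.

    shots is a list of (label, duration_seconds) pairs, where the label is
    whatever a (hypothetical) scene-understanding step assigned to the shot.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for label, duration in shots:
        totals[label] += duration
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}
```

Run over enough labelled shots, a table like this is the crudest possible version of "a feel for how long shots are held".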

Without an extensive world model it would work, but not very well. That would mean coding it to output in a format that would be easy to do secondary human editing on. That alone would be a significant programming task, and it would only work on one target platform, such as DaVinci Resolve, or Adobe Premiere if you're "from the 90s".
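As a rough sketch of what "output for a human to keep editing" might involve, the snippet below turns a list of cuts into timecoded text lines. The tab-separated layout is invented for illustration and is not an import format for Resolve or Premiere; the timecode is plain non-drop-frame HH:MM:SS:FF at an assumed integer frame rate.

```python
from typing import List, Tuple


def seconds_to_timecode(seconds: float, fps: int = 24) -> str:
    """Convert seconds to non-drop-frame HH:MM:SS:FF at an integer frame rate."""
    if seconds < 0 or fps <= 0:
        raise ValueError("seconds must be >= 0 and fps must be > 0")
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def format_cut_list(cuts: List[Tuple[str, float, float]], fps: int = 24) -> str:
    """Render (clip_name, in_seconds, out_seconds) cuts as tab-separated lines.

    The layout is hypothetical; a real tool would need to write whatever
    the target editing program actually imports.
    """
    lines = []
    for name, start, end in cuts:
        lines.append(
            f"{name}\t{seconds_to_timecode(start, fps)}\t{seconds_to_timecode(end, fps)}"
        )
    return "\n".join(lines)
```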

In short, what you're talking about is possible but very complex. It will absolutely happen in the future, probably sooner rather than later, but it's likely to be bad almost to the point of being useless at first, like almost every AI, and I suspect it will be a few years before we have the first decent version.

I thought a little bit about creating a much simpler version of this tech for my own uses, which could make music videos by simply detecting the BPM of a piece of music and then cutting a series of clips to 4 bars, 1 beat, etc., some division of the tempo. You could do something simple and effective there, such as having it extract one frame from the middle of each clip and then giving it keywords that should appear more in certain sections of the video. For example, a music video is three minutes long, and I tell it to gravitate towards clips of just a car driving down the road in the first minute, two cars engaged in a gunfight on the highway in the middle, multiple police cars chasing the two cars in a gunfight near the end, and clips of a car exploding in the final seconds of the video. That process might produce a decent video cut to the tempo... sometimes.
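The tempo part of that is simple enough to sketch. The code below assumes the BPM is already known (detecting it is a separate problem), that the music is in 4/4 so 16 beats make 4 bars, and that clips have already been tagged with keywords by some frame-description step. It uses hard section boundaries instead of a softer "gravitate towards" weighting, and every name in it is hypothetical.

```python
from typing import Dict, List, Tuple


def cut_times(bpm: float, duration: float, beats_per_cut: int = 16) -> List[float]:
    """Cut points in seconds, one every beats_per_cut beats (16 beats = 4 bars of 4/4)."""
    if bpm <= 0 or beats_per_cut <= 0:
        raise ValueError("bpm and beats_per_cut must be positive")
    step = 60.0 / bpm * beats_per_cut
    times: List[float] = []
    i = 0
    while i * step < duration:
        times.append(round(i * step, 3))
        i += 1
    return times


def keyword_at(t: float, sections: List[Tuple[float, str]]) -> str:
    """Keyword of the section active at time t; sections are (start_seconds, keyword), sorted by start."""
    active = sections[0][1]
    for start, keyword in sections:
        if t >= start:
            active = keyword
    return active


def assemble(bpm: float, duration: float, sections: List[Tuple[float, str]],
             clips_by_keyword: Dict[str, List[str]],
             beats_per_cut: int = 16) -> List[Tuple[float, str]]:
    """Assign one clip to each cut point, cycling through the clips tagged for that section."""
    used: Dict[str, int] = {}
    timeline: List[Tuple[float, str]] = []
    for t in cut_times(bpm, duration, beats_per_cut):
        keyword = keyword_at(t, sections)
        pool = clips_by_keyword[keyword]
        n = used.get(keyword, 0)
        timeline.append((t, pool[n % len(pool)]))
        used[keyword] = n + 1
    return timeline
```

For a 30-second track at 120 BPM with a "driving" section from 0 seconds and a "chase" section from 16 seconds, this gives a cut every 8 seconds, with driving clips on the first two cuts and chase clips on the last two.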

In conclusion, this tech is absolutely possible and will certainly arrive within the next few years. Several AI companies now have hundreds of millions of dollars in research funding, so I wouldn't be shocked if it was on the sooner end.

If you see an instance of it up and running somewhere before I do, please feel free to tell me about it by responding to this thread, and I will do likewise.
 