
CineMordor: Lord of the Re-Gens


Since my last writing on the matter, AI creation tools have evolved a fair bit. Virtually everything from that mid-2023 post still applies, but one format has leapt from "not worth talking about" to "ooh whoa, I need to get the hang of this": generative video.

The stuff that really turned my head and got me to suit up comes from demonflyingfox, who broke out on TikTok with the iconic Balenciaga-meets-Harry Potter / Breaking Bad / Star Wars / etc. videos and is currently working on a series of redneck-themed trailers for movie & TV classics. Next to him, Abandoned Films produces some amazing takes on marquee franchises set within a "1950s Super Panavision" motif. Beyond the visuals, the scripts, voiceover narration, and soundtracks for these shorts are very well done.

Following up on an initial test of video AI, using Harambe in various scenes from a "perfect first date," I wanted to try something a bit longer and more challenging, along the lines of what these guys were up to.

Using their theme-mash as a starting point, I began kicking around concepts I could try. Almost immediately, for whatever reason, it just hit me: "What about Led Zeppelin? They literally sing songs about Lord of the Rings and Norse mythology; the visual marinade is right there." Also important, to me, was that most of their catalog comes from a time before music videos of any sort were a thing, so it just felt… appropriate, giving some state-of-the-art sauce to these legends from the past.

I set about clipping out a one-minute section of "Ramble On" from the band's 1969 album, Led Zeppelin II, and began building the project out from there. Fast forward, and here is the result:

Ramble of the Rings
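
(Side note for the scripting-inclined: that audio-clipping step doesn't need a full DAW. Here's a minimal pydub sketch, assuming ffmpeg is installed; the in/out points are placeholders, not the actual section I used.)

```python
# Minimal audio-clipping sketch with pydub (which shells out to ffmpeg).
# The window below is a placeholder; pick whatever minute of the song you want.
from pydub import AudioSegment

song = AudioSegment.from_file("ramble_on.mp3")

start_ms, end_ms = 60_000, 120_000              # a one-minute window, in ms
clip = song[start_ms:end_ms].fade_in(500).fade_out(1000)

clip.export("ramble_clip.wav", format="wav")
```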

“What software did you use?”

That's the big question, right? As of this writing, I have more questions in my DMs about "how-to software?" than I have likes on the video. That's a first. But I get it: this is uncertain, uncharted territory, there are dangers, and there are consequences for planting a flag and getting comfortable in it. Many people think the landscape is straight-up evil and see clouds of doom in its advance, which makes using Tolkien as pastiche for my first major project feel accidentally poetic in hindsight. Anyway, to answer the question: there is no 'one ring to rule them all' as far as apps or programs. You're gonna need to collect a lot of jewelry to make this magic.

The first thing you need is an image generator, because text-to-video, in its current state, is not great. I mean, it's amazing that it works at all, but for practical application, it's just not what you want to be working with.

Instead, to get videos that look like "whoa," you want image-to-video, which gives the video generator a much clearer understanding of the scene (a picture is worth a thousand words kind of thing); it then populates the subsequent frames from that starting image.

For this step, personally, I enjoy MidJourney. I've been a member for two years now and am part of their early access program. I like the vibe, I like the styles their generator skews toward, the community is robust, and the creative and UI tools are always updating.

But that's not to say you can't use whatever you like or have on hand: DALL-E 3 if you're a GPT subscriber, SD, Flux, Leonardo, ImageFX. Everyone has a favorite, and some are certainly better suited to certain tasks (for instance, DALL-E is far better than MidJourney at creating design and iconography elements). The choice is yours.
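
If you'd rather script this step than click around a web UI, the open checkpoints on that list (SD, Flux) can be driven from Python via Hugging Face's diffusers library. A minimal text-to-image sketch; the model ID and prompt here are just examples, not what I actually used:

```python
# Minimal local text-to-image sketch using Hugging Face diffusers.
# Model ID and prompt are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any SD checkpoint works here
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "1970s rock band on a misty mountainside, 35mm film still",
    negative_prompt="text, watermark, extra fingers",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("band_still.png")
```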

Your second 'ring' will be Photoshop (or some other robust image editor if you're not already indentured to Adobe) for touch-ups, crops, etc., because, as I covered last time, you never get exactly what you want, and masking, merging, fixing wonky hands, weird clothing elements, and other unwanted bits is just… part of the dance.

Then, lastly, because I wanted not just "a rock band" but the actual members of Led Zeppelin, I used Stable Diffusion to inpaint their faces over the random rockers I was getting out of the box.
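
For the curious, the plumbing of that face pass looks roughly like this in diffusers. The checkpoint, mask, and prompt below are placeholders (in practice you'd also want a face-specific fine-tune or LoRA per band member to get real likenesses):

```python
# Rough shape of an SD inpainting pass: white pixels in the mask
# get repainted; everything else is preserved. Filenames, model,
# and prompt are placeholders.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init = Image.open("band_still.png").convert("RGB")
mask = Image.open("face_mask.png").convert("RGB")  # white = repaint

result = pipe(
    prompt="portrait of Robert Plant, 1970s, film grain",
    image=init,
    mask_image=mask,
    num_inference_steps=40,
).images[0]

result.save("band_still_plant.png")
```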

While that's all happening, load up ring #3, aka your video editor: Premiere; DaVinci Resolve; Final Cut if you're a child; CapCut if you like having your data stolen. Then lay these images into your edit as a storyboard, building the flow, checking how incoming and outgoing shots match, your overall aesthetic, etc. In my case, I was going for a retro, shot-in-the-'70s kind of look, while also incorporating a bit of the Peter Jackson trilogy to anchor the visuals to a more recent cultural touchstone.

(from left [both images]: John Paul Jones, John Bonham, Robert Plant, Jimmy Page)

Once you have a basic string-out of stills overlaying your audio, it’s finally time to put the images in motion.

There are a lot of options for your fourth ring. RunwayML is kind of the creative standard: it's good, it's fast, and it's relatively cheap for users who generate a lot of content. Then there's Sora, a ChatGPT companion under the OpenAI umbrella, which, like everything else OpenAI offers, thinks it's way better than it actually is, and is priced insanely high. If you already have a Pro GPT account, I'd say go for it, it is good; everyone else, move along. Personally, I moved along all the way across the ocean, to China, where pretty much all the best AI is happening. It's like the late 1980s to early '90s, when Japan was just sooo far in front of the pack on consumer electronics; that's where China's AI products are today.

Of these, I found Hailuo to be very useful: you get enough free signup credits to actually swim around a bit and try some things out, it's very fast, and it's good value on subsequent credit purchases. So I used that, as well as Kling. Compared to its peers, Kling is excellent at deciphering the physics happening in your scene and gives back the most physically accurate results (currently). The downside is that it's also slow, and it's not cheap (although it is less expensive than Sora, which is just… wild. Like, someone please get Sam Altman off the podium of American AI "innovation," it's embarrassing; the Chinese and the Germans are objectively ahead, and Kenyans did all the work of tagging the billions and billions of data tokens that made any of it possible in the first place).

Okay, where was I? Oh, right: Kling. So yeah, I'd say Kling is best used like when you're on set and the whole shoot day is just gonna be one or two big shots, while the simpler stuff you roll with Runway or Hailuo so you can 'make your days' and move on.
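
Whichever generator you pick, the workflow from the outside is basically the same: upload your start image and a prompt, get a job ID back, and poll until the clip is cooked. None of these services share an API, so this sketch is purely hypothetical; the endpoint, field names, and statuses are invented just to show the shape of the pattern:

```python
# Purely hypothetical image-to-video client: the endpoint, fields,
# and statuses are invented to illustrate the submit-then-poll
# pattern these services share, not any real provider's API.
import time
import requests

API = "https://api.example-video-gen.com/v1"   # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_KEY"}  # hypothetical

job = requests.post(
    f"{API}/generations",
    headers=HEADERS,
    files={"start_image": open("band_still_plant.png", "rb")},
    data={"prompt": "slow push-in, band plays on a cliff edge, 70s film look",
          "duration_seconds": 5},
).json()

while True:
    status = requests.get(f"{API}/generations/{job['id']}", headers=HEADERS).json()
    if status["state"] in ("succeeded", "failed"):
        break
    time.sleep(10)  # these jobs take minutes, not seconds

if status["state"] == "succeeded":
    open("shot_01.mp4", "wb").write(requests.get(status["video_url"]).content)
```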

Speaking of moving on: finally, you will now have videos to lay in over your audio, and you can begin building your edit toward the final output. I also did some work with lip sync, but it's too much to get into here. The good news is that it probably doesn't require another piece of software beyond the Olympic flag collection you've just accumulated. However, a lot of help and added production value can be had by using something like Topaz to up-res or slo-mo your generated videos, while using After Effects to add VFX can be preferable to rolling the dice on whatever effects a generator decides to give you.
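
If Topaz isn't in your budget, ffmpeg can approximate the slo-mo half of that with motion-compensated frame interpolation. A rough sketch (the quality is a notch below Topaz, but the price is right):

```python
# Free-ish stand-in for Topaz-style slow motion: ffmpeg's minterpolate
# filter synthesizes in-between frames via motion-compensated
# interpolation, then setpts stretches playback 2x.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "shot_01.mp4",
    "-vf", "minterpolate=fps=60:mi_mode=mci,setpts=2.0*PTS",
    "shot_01_slomo.mp4",
], check=True)
```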

And that's basically "it." Although, be aware: these notes are the equivalent of sending you into the wilderness with just a map and some lembas bread, but "you'll be fine"…?

Best Practices

Seeing your first image "come to life" is really special, and very motivating. But soon your dreams may begin to collide with the barriers of what's practically possible. Interaction with objects is number one on the list. It's not broken, exactly, but it's something that needs to be accounted for; you don't want to be in a situation where it's the main part of your project, shot after shot. Then comes character interaction. Talking, looking at each other, standing, walking: not a problem. A fist fight? …Let's just say it's going to look very AI.

And in general, the more movement there is, the more unstable it gets, whether it's the subject moving around, blocking elements passing in the foreground, or camera movement. Trying a whip-pan pretty much equals yeeting your start-image-defined world off into the void and arriving in an alternate dimension that exists solely on a random interpretation of whatever prompt you provided. Meanwhile, a character who turns away from the camera and then back to it will have a new face, and a face that moves around the frame quickly will begin to change. (For those of you who have ever done motion tracking in AE, it's exactly like working on a dynamic shot when the tracker starts to slip…)

Also, a lot of your credits/tokens/dollars will get wasted on re-runs from busted camera movement. Like most of us when asked on the spot, the generators seem kind of unsure which side their "left" and "right" are on. Sometimes you'll request a push-in and receive a locked-down shot, or a pull-out, or a pan. Sometimes you'll want to hold on characters and they'll just walk off frame… to their trailers, I imagine, frustrated that we're "gonna reset and go again" for the 15th time.

Lastly, you may find the image-to-video trick becoming really confining once several of the results are put in sequence. For me, the pictures I generated to begin a sequence often felt more like the mid-point of a real-world take. There are tools trying to address this: some generators offer a first/last-frame mode, where you send up two images and the model deciphers the in-between. But here again comes your Photoshop wizardry, cutting out and re-storyboarding. And note that when you book your shot with two defined images, the generators take them very literally: if you just cut out a character, move them to one side, call that your 'start,' then move them again and maybe scale them up so they appear to be angling toward camera, your subject will move, but everything else in the video you get back will be eerily still, because every other pixel of your two frames is identical.
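
To make that concrete: building a first/last-frame pair is really just compositing. A minimal Pillow sketch, with placeholder files and coordinates, that sets up exactly the trap described above:

```python
# Building a start frame and an end frame for a first/last-frame
# generator by moving a cut-out character across a background.
# Filenames and coordinates are placeholders.
from PIL import Image

bg = Image.open("background.png").convert("RGB")
cutout = Image.open("character_cutout.png").convert("RGBA")  # transparent bg

start = bg.copy()
start.paste(cutout, (100, 300), cutout)       # character frame-left
start.save("frame_start.png")

end = bg.copy()
bigger = cutout.resize((int(cutout.width * 1.3), int(cutout.height * 1.3)))
end.paste(bigger, (400, 250), bigger)         # closer to camera, frame-right
end.save("frame_end.png")

# Caveat from above: since every other pixel matches between the two
# frames, the generator tends to leave the background eerily still.
```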

Final Thoughts

Truly, I had a ton of fun. I mean, coming up with ideas for a '70s rock band to just vibe out in Middle-earth… come on! So much fun, and I have many, many unused shots that I adore; I could easily put together an alternate cut with the scraps.

I also enjoyed the creative challenges, some mentioned above: working around the limitations, coming up with tricks to get more complex shots than would otherwise be possible.

But the reception has been brutal. People do not like AI (except for the people who do). You see it a lot in the top comments on various TikToks & Reels, and certainly there are a ton of abuses right now: fake disaster videos, fake press conferences, and whatnot. But the disdain out there goes beyond that.

To me, this period feels a lot like the very early days of photography, when people decried the medium as a shallow, lifeless mockery of the art of painting, while others claimed that the imprint made on the plate behind the camera lens stole your soul. And yet it was too useful, and too accessible, to deny. From that, yes, painting declined as a whole, especially for commercial purposes, but in the new world of photography, light became the brush, and the torch of artistic expression carried on as artists found ways to create emotional expression through lighting, angle, depth of field, and so on.

Cinema, which today stands as a pillar of our shared experience, began as a gimmick: folks paid a couple of coins to look through a slot in a little box while a wheel spun sequential images of a man jumping onto the back of a galloping horse. The kind of stuff that was 'neat' but would never be considered "art."

Right now, I can't point to an AI image or video and say, "fuck yeah, man, that's the 'Starry Night' of our time!" or "Citizen Kane wishes it was this good, bro."

But the current system in charge of producing narrative content is so calcified in its structure, so expensive to operate, and so optimized for safe, established brands that a whole generation is now cresting its 20s with literally zero zeitgeist films to call its own; they have grown up in a world serving only reboots, remakes, and other extensions of their elders' favorite things. It sucks, and it really shouldn't be surprising that so many Zoomers have turned to content like Skibidi Toilet and other outer-sphere media as their generational standard-bearers.

So, to bring it around one last time to Lord of the Rings: the landscape we're in is already cracked and barren, and those clouds of doom above might actually bring some rain and help nourish an abundant harvest of cinematic riches for future generations.
