Good To See You!
Welcome to the second issue of Theoretically News! I’d like to thank all the new readers for joining in—sorry I mentioned it like 100 times in the videos this week!
But we’re now 1,300 readers strong in just a week! So, I just wanted to take a moment to thank every single one of you.
Okay, enough waffling. Let’s get to the News and the Prompts I promised you!
NEWS
The “Sora” of Music?

source: @btibor91 on X
Sonata Spotted! Some interesting hostnames turned up on the OpenAI domain, and while there is obviously no confirmation from Sam & Co., “Sonata” certainly sounds like a music generator, doesn’t it?
We DO know that OpenAI has been working on a Generative Music Tool. Their last foray into this area was Jukebox back in 2020, and—well, that was the era when AI Music sounded like an AM radio haunted by angry ghosts.
While we still don’t have any details on how OpenAI’s new music generator will function, I’m willing to wager that, at this point, we know what it’ll be named!
Why It Matters: With Suno and Udio’s legal teams still digging themselves out from under mountains of RIAA paperwork (forcing them into restrictive “label-partnered” pivots for 2026), OpenAI is poised to strike.
We also know that as of November 2025, Universal, Warner, and Sony Music were finalizing AI licensing deals. Given the bombshell news of last month’s Disney/OpenAI partnership, Sonata might just launch as the first model built on contracts rather than the fair use argument.
NEWS
Disney vs. Google: The "Order 66" Update

Speaking of Disney, there is some interesting news regarding their scuffle with Google, an update suspiciously timed within minutes of the OpenAI partnership announcement.
Disney’s 4 demands of Google were:
Immediately halt copyright infringement
Make tech changes to discontinue IP exploitation
Stop using Disney content to train Google AI models
Disclose what Disney content was used to train AI models
Now, four weeks later, according to a report from Deadline, Disney’s lawyers have listed 66 YouTube videos that have been removed for allegedly using Disney IP via Veo or Imagen.
My Take: Order 66 jokes aside (that one’s for my Star Wars nerds), 66 does seem like a pretty low number, doesn’t it? While Disney is signaling a “symbolic win,” aside from those 66 struck YouTube videos, things seem to be business as usual on Veo and Nano Banana.
As 2026 continues, we are sure to see more (light) saber-rattling and lawsuit threats. However, ultimately, I think it’s clear that the goal is licensing more than anything. Disney doesn’t want a long war with Google, and vice versa (with some speculating that Disney couldn’t afford it to begin with).
At the end of the day, you know my advice:
Don’t play with other people’s toys. Make your own.
NEW MODEL
New Open Source Image Model From Z.ai

GLM-Image is a new 16B Open Source model recently released by Z.ai.
This model uses a hybrid autoregressive and diffusion design (9B AR + 7B Diffusion), which allows for image generation with reasoning: the model essentially goes back and checks its own work, functioning closer to closed, proprietary systems like Nano Banana Pro.
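To make that concrete, here’s a toy sketch of the plan-render-check-revise loop. To be clear: every function below is a made-up stand-in, not Z.ai’s actual code or API; it just shows the shape of the idea.

```python
from dataclasses import dataclass
import random

# Toy sketch of a hybrid AR + diffusion "reasoning" loop.
# Every function here is a hypothetical stand-in, NOT Z.ai's API.

@dataclass
class Critique:
    ok: bool
    notes: str

def ar_plan(prompt: str) -> str:
    # 9B autoregressive half: drafts a structured layout for the image.
    return f"layout for: {prompt}"

def diffusion_render(plan: str) -> str:
    # 7B diffusion half: renders pixels from the plan (placeholder string here).
    return f"image<{plan}>"

def ar_check(prompt: str, image: str) -> Critique:
    # The AR half inspects the render and flags problems ("checks its work").
    ok = random.random() > 0.5
    return Critique(ok, "" if ok else "fix composition")

def ar_revise(plan: str, critique: Critique) -> str:
    return f"{plan} | revision: {critique.notes}"

def generate_with_reasoning(prompt: str, max_revisions: int = 2) -> str:
    plan = ar_plan(prompt)
    image = diffusion_render(plan)
    for _ in range(max_revisions):
        critique = ar_check(prompt, image)
        if critique.ok:
            break
        plan = ar_revise(plan, critique)
        image = diffusion_render(plan)
    return image

print(generate_with_reasoning("a neon diner at 3am"))
```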
In practice, at least… to be honest, while I haven’t had time to fully dive in and test it, many in the community have found it to be somewhere between OK and… well, this:

Generated by @janekm
So, despite Z.ai’s claims (and some leaderboard scores) suggesting that this model dethrones Nano Banana, Qwen, and Seedream, I don’t think it’s quite there.
That said, that makes two Open Source image models released this week, and that’s always welcome news.
Here’s the rub: The price tag for the reasoning capabilities is quite high. Generating a 2048×2048 image on an H100 GPU (which costs around $25k to $40k) takes around 252 seconds.
With base system requirements at ~23GB of GPU memory, you can probably get it to work, but it’ll take a while!
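For a sense of what that means per image, here’s some napkin math. The rental rate is my assumption, not a quoted price:

```python
# Napkin math on GLM-Image throughput, using the numbers above.
SECONDS_PER_IMAGE = 252   # reported time for one 2048x2048 render on an H100
RENTAL_PER_HOUR = 2.50    # ASSUMED cloud H100 rental rate in USD, not a quote

images_per_hour = 3600 / SECONDS_PER_IMAGE
cost_per_image = RENTAL_PER_HOUR * SECONDS_PER_IMAGE / 3600

print(f"~{images_per_hour:.1f} images/hour, ~${cost_per_image:.2f} per image")
# Roughly 14 images an hour, under 20 cents apiece at that rate.
```

So on rented hardware, the wait stings more than the bill does.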
PRO TIP
ADD MOTION BLUR

This week’s PRO TIP comes in from AI Warper, who advocates for applying a touch of motion blur to your start frames in Image-to-Video.
These models are trained with an extracted start frame + the video clip... NOT YOUR FLAWLESS AI START FRAME
Add motion blur manually to the start image.
Check out the final video output comparison Warper provided, generated in Luma Labs Ray3. It’s pretty compelling!
And while you probably shouldn’t add motion blur to every shot, it certainly seems worth experimenting with for use cases like this.
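If you’d rather script it than eyeball it in an editor, here’s a minimal OpenCV sketch that applies a directional motion-blur kernel to a start frame. The file names, kernel size, and angle are all placeholders to tweak:

```python
import cv2
import numpy as np

def add_motion_blur(in_path: str, out_path: str, size: int = 21, angle: float = 0.0) -> None:
    """Apply a simple linear motion blur to a start frame."""
    img = cv2.imread(in_path)
    # Build a linear motion-blur kernel: one bright row, rotated to the blur angle.
    kernel = np.zeros((size, size), dtype=np.float32)
    kernel[size // 2, :] = 1.0
    center = ((size - 1) / 2.0, (size - 1) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (size, size))
    kernel /= kernel.sum()  # normalize so overall brightness is preserved
    cv2.imwrite(out_path, cv2.filter2D(img, -1, kernel))

# Hypothetical paths; a larger size means a heavier blur.
add_motion_blur("start_frame.png", "start_frame_blurred.png", size=21, angle=30.0)
```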
COOL THING OF THE WEEK
Shotdeck

A little bit of a cheat here, since I went over Shotdeck in one of this week’s videos! Generally, the idea with the newsletter is to talk about things that AREN’T in the videos, but…
Shotdeck really is the coolest thing I’ve seen this week!
Beyond all the technical information provided for hundreds of shots across thousands of movies, as I’ve been browsing I keep running across images in the “related shots” sections where I’m like: “This looks amazing, what is this?”
Click in, and I see images from movies I’ve never seen, or even heard of.
My “To Watch” queue is rapidly expanding!
So yeah, while Shotdeck is an incredible informational source, it’s also a stellar INSPIRATIONAL one.
FROM THE STUDIO
What I Covered This Week

A fun tutorial video featuring my latest workflow and how you can use Shotdeck to influence a variety of factors in your outputs! I’m really proud of this video, and I was happy to show the gang at Flora some love!
If you missed it, check it out here: https://youtu.be/vRNHNNliDVM
And yes, those promised prompts are down below!

I also took a look at PixVerse’s new R1 Real-Time Interactive AI Video Generator. This was a lot of fun. Is it ready for cinematic masterpieces? It is not.
Is it stupid fun and did it make me laugh until my face hurt? Yes. Yes, it did.
Additionally, we took a look at Flux 2’s Klein release, and some quick hit updates from Runway and Google!
A Quick Correction: In the Runway section of the video, I lamented that we have to manually crop the Story Panels (which generate as vertical stacks). As it turns out, there IS a tool for that.
Runway has a Panel Upscaler app that automatically separates and upscales your Story Panel frames!
If you want to check out the full video, it’s linked here: https://youtu.be/SAjKSRRJstQ
The MEGA PROMPT!
CINEMATIC PROMPT TEXT!

Wall of Text Incoming! As promised, here are the full prompts I used in this week’s Cinematic Prompt Technique video, along with some additional context:
For our 2×2 Nano Banana Pro images, you’ll want to paste the prompt below into any LLM, along with your reference images of characters, locations, or props.
It’s important to remember the token limit I discussed in the video, so don’t overload things with 30 reference images! Stick to two or three, and use the “smartest” LLM you can get your hands on!
Don’t forget that you need to input your direction on the top line!
USER SCENARIO INPUT (user writes only this)
[ONE OR TWO SENTENCES: what is happening in the moment. Action/intent only.]
SYSTEM INSTRUCTIONS (model must obey)
HARD OUTPUT
Output ONLY the final image-generation prompt. No headings, no analysis, no brackets, no “template” text. Do not mention reference images.
REFERENCES (FAIL CLOSED)
If you cannot access the attached images, output exactly: MISSING REFERENCES
LENGTH (CRITICAL)
Final prompt must be 2,200–3,000 characters total (including spaces). If too long, compress by removing redundancy. Never remove the 4 frame lines or the Grounding Block.
SOURCE OF TRUTH
All appearance + environment details must come exclusively from the attached images. Do not invent wardrobe, props, architecture, or lighting sources.
Scenario elements not visible may be described only generically (no design specifics).
CONTINUITY
All four frames show the same frozen instant, different angles/distances, unless specified in Scenario input section.
FINAL OUTPUT FORMAT (follow exactly; keep it tight)
A) HEADER (must start exactly like this)
Generate a photorealistic cinematic 2x2 grid of still frames from a live-action film depicting [scenario summary from user input]. The imagery must feel grounded, restrained, and emotionally heavy. Realistic lighting, real materials, deliberate camera placement. No animation style, no painterly rendering, no exaggerated fantasy glow. Real weight, real consequence.
Also specify: 2x2 contact sheet grid, four equal panels, thin borders, no text, no captions, no watermark.
B) FRAME LIST (must appear immediately after header; 1–2 sentences each)
Frame 1 (WIDE): [establish space + subject placement from the location image]
Frame 2 (MED/OTS or PROFILE): [maintain screen direction + scale; show relationship between subject(s) and threat/action]
Frame 3 (CLOSE): [reaction/detail; micro-expression + tension; same instant]
Frame 4 (REVERSE WIDE / ALT ANGLE): [reinforce continuity; read environment]
C) ENVIRONMENT (1 short paragraph, 3–5 sentences)
Describe only what’s visible in the location image: layout, materials, wear, moisture/haze, practical light sources, atmosphere.
D) CHARACTERS (1 short paragraph total unless multiple distinct subjects, then 1 sentence each)
Confirm only visually supported traits: silhouette, key clothing layers, materials, wet/dirt/wear, hair shape, defining facial features. Keep minimal.
E) ACTION (1 short paragraph, 2–4 sentences)
Translate the user scenario into physical posture/tension/gaze in the same frozen instant. No new events.
F) TECHNICAL (1–3 sentences)
Only if supported by the reference look or metadata: lens type/DoF behavior + lighting direction/falloff. Do not guess film stock/camera brand unless provided.
G) GROUNDING BLOCK (must be verbatim, at the end)
Photorealistic live-action realism.
Real materials, real lighting, real physics.
No fantasy glow, no stylized rendering, no illustrative techniques.
Everything must feel physically present, heavy, and believable.

This should output a prompt somewhere around 3,000 characters, which works well in Nano Banana Pro.
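If you’d rather script this step than paste into a chat window, here’s a minimal sketch using the OpenAI Python SDK. The model name and file paths are placeholders, and any multimodal LLM that accepts image input works the same way:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def to_data_url(path: str) -> str:
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

# The full template above, saved to a file, with your direction on the top line.
mega_prompt = open("mega_prompt.txt").read()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: use the "smartest" multimodal model you have
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": mega_prompt},
            # Two or three references max, per the token-limit warning above.
            {"type": "image_url", "image_url": {"url": to_data_url("character.png")}},
            {"type": "image_url", "image_url": {"url": to_data_url("location.png")}},
        ],
    }],
)

print(response.choices[0].message.content)  # paste this into Nano Banana Pro
```

The same pattern works for the Upscale and Video prompts below; just swap in the relevant prompt text and image.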
Once you’ve made some selects and cropped your images, run them again in Nano Banana Pro using PJ Ace’s Upscale Prompt. (This part is optional!)
Preserve the exact composition, framing, camera angle, color grade and subject placement — do not alter or add new elements
Increase resolution to true high-end cinematic clarity, with natural film-grade sharpness (no AI oversharpening).
Apply blockbuster–style cinematography:
– grounded realism
– soft, motivated lighting
– natural contrast
– restrained highlights
– deep but clean shadows
– subtle atmospheric depth
– Keep the same color temperature and color tone
Texture pass should feel physically real: skin pores, fabric weave, dust, stone, metal, wood — all enhanced without plastic smoothing.
Maintain cinematic depth of field consistent with the original image (natural lens falloff, no artificial blur).
Do not redraw. Do not stylize. Do not beautify. Only enhance realism and resolution.

And finally, you can take that “upscaled” image and run it back into your LLM to generate a Video prompt. Obviously, this part is also optional, and I highly recommend you look it over and tweak it to your liking.
I’m calling for a lot of handheld movement and camera shake. You may not like that!
And once again: User Input at the top. Remember: You’re the director!
FINAL VIDEO PROMPT (OUTPUT ONLY)
Analyze the attached reference image and extend this exact moment into motion.
Create a single continuous video shot that feels raw, handheld, and unstable, as if filmed under pressure.
CAMERA (CRITICAL — PRIORITY)
The camera is handheld and shaky at all times.
Constant micro-jitter, uneven framing, slight rotational wobble, and irregular motion blur caused by real human movement.
No stabilization. No smooth pans. No dolly or crane motion.
Occasional small grip adjustments cause brief, imperfect micro-whips.
Camera shake must be visible throughout the entire shot and drive the tension.
SHOT
Close-to-medium handheld shot of the subject from the reference image, in the same environment.
The camera reacts to the subject rather than controlling them.
If dialogue is provided, the subject delivers “[INSERT DIALOGUE HERE]” with urgency and strain.
PERFORMANCE
Expression shifts subtly during the shot: jaw tightening, breath visible, eyes flicking, tension rising.
The action remains one continuous moment — no cuts, no time jump.
MOTION BLUR
Motion blur is present due to camera shake and movement, not post effects.
Blur increases slightly during emotional emphasis or camera instability.
LIGHTING & SPACE
Lighting remains consistent with the reference image.
Practical light sources only.
Minor exposure fluctuation occurs naturally due to camera movement, not lighting changes.
STYLE CONSTRAINTS
Photorealistic live-action.
No cinematic smoothness.
No slow motion.
No stylized effects.
No animation-like motion.
The shot should feel like a single raw take, captured in the middle of chaos.
THAT’S A WRAP!
So, there we go! Second Issue out the door!
If you have any feedback or anything you’d like to see, please let me know: [email protected]
Or, just drop a comment on one of the videos! I honestly do see them all!
As always, I thank you for reading…
Tim

