TL;DR
A good GPT Image 2 prompt is not a sentence; it is a stack of decisions: subject, setting, style, camera, light, mood. This guide gives you 50+ copy-ready prompts across cinematic, portrait, action, nature and fantasy, plus a failure-mode fix list and a workflow for iterating fast. Every image here was made with the same KIE gpt-image-2-text-to-image model you use on the site, at 12 credits per image and up to 20,000 characters per prompt.
The Anatomy of a Great GPT Image 2 Prompt
Most people type what they want. The people getting good output type what the camera sees. That is the whole secret.
After running a few thousand generations across the KIE gpt-image-2-text-to-image endpoint, we settled on an eight-slot formula that fits almost every scene. Fill six of the eight and you are above average. Fill all eight and you hit the style of a commercial shoot.
The formula:
[Subject] + [Action / Pose] + [Setting] + [Style / Reference] + [Camera & Framing] + [Lighting] + [Mood / Color palette] + [Quality modifier]
Each slot answers one question the model would otherwise guess:
- Subject: who or what is in the frame. Specific nouns beat vague ones: "auburn-haired librarian" beats "woman".
- Action / pose: what they are doing right now. Verbs pin down composition.
- Setting: the world around them. Name the country, era, or time of day.
- Style / reference: "film noir", "Ufotable anime", "Wes Anderson symmetry", "Fenty Beauty campaign". Reference known visual languages, not random adjectives.
- Camera & framing: "extreme close-up", "low-angle wide shot", "85mm portrait lens, f/1.4", "anamorphic lens". This is what separates snapshots from frames.
- Lighting: "golden hour rim light", "single-source Rembrandt lighting", "neon reflections on wet cobblestone". Light is 60% of the image.
- Mood / color: "cold teal and orange contrast", "warm amber and deep shadow", "desaturated melancholic palette".
- Quality modifier: "ultra-realistic 4K", "cinematic grain", "editorial photography". Keep this short; the rest of the prompt is doing the heavy lifting.
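The eight slots map naturally onto a small record. Below is a minimal Python sketch under stated assumptions: `PromptSlots` and `build_prompt` are hypothetical names, not part of any KIE or GPT Image 2 API, and the sample slot values are illustrative. The sketch only joins the filled slots in formula order and reports how many were filled.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSlots:
    """Hypothetical container: one field per slot, in formula order."""
    subject: str = ""
    action: str = ""
    setting: str = ""
    style: str = ""
    camera: str = ""
    lighting: str = ""
    mood: str = ""
    quality: str = ""


def build_prompt(slots: PromptSlots) -> tuple[str, int]:
    """Join the filled slots in formula order; return (prompt, filled_count)."""
    parts = [getattr(slots, f.name).strip() for f in fields(slots)]
    parts = [p for p in parts if p]  # skip slots left empty
    prompt = ", ".join(parts) + "." if parts else ""
    return prompt, len(parts)


# Illustrative values only; swap in your own scene.
prompt, filled = build_prompt(PromptSlots(
    subject="An auburn-haired librarian",
    action="reaching for a book on a high shelf",
    setting="in a 1920s university library at dusk",
    style="film noir",
    camera="85mm portrait lens, f/1.4",
    lighting="single-source Rembrandt lighting",
    mood="warm amber and deep shadow",
    quality="cinematic grain",
))
print(f"{filled}/8 slots filled")
print(prompt)
```

Keeping the slots as named fields makes it obvious which decision is still being left to the model when a render comes back generic.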
Basic vs optimized: the same subject, both passes

The basic prompt above was, word for word:
A woman standing in a room.
Now here is the same idea with the eight slots filled in:
A breathtaking young woman with flowing auburn hair stands in a luxurious Art Deco penthouse at golden hour. She wears a champagne-colored satin slip dress that catches the warm light. Floor-to-ceiling windows behind her show a panoramic city sunset. Dramatic side lighting creates deep shadows and golden highlights on her face and bare arms. The composition follows the rule of thirds. Cinematic depth of field with gorgeous city bokeh. Fashion editorial quality. Ultra-realistic 4K.
The second prompt does not have more adjectives; it has fewer decisions left to the model. The model that powers GPT Image 2 is a transformer-guided diffusion system (see Wikipedia on diffusion models for the math), and every missing detail gets filled in by priors. If you do not say "golden hour", you will get the model's average time of day, which is 2pm on an overcast Tuesday.
One more thing: GPT Image 2 accepts prompts up to 20,000 characters. That is around 3,000 words. You will never need that much for a normal shot, but for complex multi-character scenes or detailed concept art, the ceiling is there. We show a long-prompt pattern in the section on working at the 20K character limit below.
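The "around 3,000 words" figure works out to roughly 6 to 7 characters per word, spaces included (20,000 / 3,000 ≈ 6.7). A quick way to see how much of the ceiling a prompt uses is sketched below. It assumes the service counts characters the way Python's `len` does, which this guide does not confirm; `check_prompt_length` is a hypothetical helper name.

```python
MAX_PROMPT_CHARS = 20_000  # ceiling quoted in this guide


def check_prompt_length(prompt: str, limit: int = MAX_PROMPT_CHARS) -> dict:
    """Report character and word counts and whether the prompt fits the limit.

    Assumption: the limit is measured like len(prompt) in Python.
    """
    chars = len(prompt)
    return {
        "chars": chars,
        "words": len(prompt.split()),
        "remaining": limit - chars,
        "fits": chars <= limit,
    }


report = check_prompt_length("A woman standing in a room.")
print(report)
```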
Prompt Library: Cinematic Scenes
Cinematic prompts are the easiest to nail because film has a hundred years of established visual vocabulary. Name the genre, name the decade, name the camera, and the model will do the rest.

1. Neo-noir Hong Kong alley
Film noir cinematic shot. A dangerously beautiful femme fatale in a curve-hugging red silk dress with a thigh-high slit, walking through a rain-soaked Hong Kong back alley at night. Neon signs in Chinese characters reflect red and blue on the wet cobblestones. She carries a black umbrella over one shoulder, her red-painted lips the only warm color against the cold teal lighting. Smoke wisps from a nearby vent. Anamorphic lens, shallow depth of field, cinematic grain. Ultra-realistic 4K noir film frame.
Why it works: the color story (red against teal) is explicit, the camera is named (anamorphic), and the era (film noir) gives the model a stylistic anchor.
2. Jazz bar Rembrandt lighting
Moody jazz bar interior. A mysterious woman in a sheer black lace dress sits on a velvet barstool, one leg crossed showing stiletto heels. Cigarette smoke curls around her silhouette. Warm amber spotlight from above illuminates her face and exposed collarbones while the rest fades into deep shadow. A saxophone player is a blurred silhouette in the background. Film noir meets modern luxury aesthetic. Dramatic Rembrandt lighting, 35mm film look. Ultra-realistic 4K.
3. Blade Runner rooftop
Cyberpunk cinematic wide shot. A lone detective in a wet black trench coat stands on a neon-drenched Tokyo rooftop at 3am. Giant holographic advertisements of a geisha float across the skyline behind him, casting shifting pink and cyan light on his face. Light rain catches the glow. Flying cars streak past as horizontal light trails. Shot on anamorphic lens, 2.39:1 aspect, shallow depth of field. Blade Runner 2049 color grade: teal shadows, orange highlights. Ultra-realistic 4K cinematic frame.
4. Wes Anderson symmetrical hotel lobby
Wes Anderson style cinematic composition. A 1960s hotel concierge in a burgundy uniform stands dead-center in a pastel-pink Art Deco lobby, flanked by perfectly symmetrical potted palms and brass sconces. Flat front-on framing, everything on center axis. Soft fluorescent overhead lighting. Pastel pink and mint green color palette. 35mm film look. Ultra-detailed 4K.
5. Korean thriller kitchen standoff
Cinematic still from a modern Korean crime thriller. Two men face each other across a small Seoul apartment kitchen at 2am, both holding knives but frozen in a tense moment. Single fluorescent tube overhead casts hard green-tinted light and harsh shadows. Steam rises from an abandoned pot on the stove. Tight composition, 40mm lens, handheld feel. Bong Joon-ho style. Ultra-realistic 4K.
6. Villeneuve desert epic
Epic cinematic wide shot in Denis Villeneuve style. A lone hooded figure in flowing desert robes walks across a vast orange sand dune at sunset. The sun is enormous on the horizon, casting elongated shadows. Scale is extreme: the figure is tiny, the landscape overwhelming. Dust kicks up in the wind. Warm amber palette with deep violet shadows. Shot on 65mm, ultra-wide aspect. Ultra-realistic 4K cinematic quality.
7. French New Wave cafe
Black and white French New Wave cinematic still. A young woman in a striped Breton shirt and dark bob haircut smokes at a Paris cafe table in 1962. She looks off-camera with soft intensity. Natural window light, high contrast, slightly overexposed highlights. Film grain visible. Godard aesthetic. 35mm monochrome, 50mm lens. Ultra-detailed.
8. Giallo horror hallway
Cinematic horror frame in the style of a 1970s Italian giallo. A woman in a white nightgown stands at the end of a long Victorian hallway lit only by flickering red lamplight. Her back is turned. Shadow stretches toward the camera. Wallpaper is blood-red damask. Shallow depth of field, 28mm lens slightly distorted. Grainy film look. Deep red and black color story. Ultra-detailed 4K.
9. Miami Vice neon drive
1980s Miami Vice cinematic shot. A woman in a white linen blazer drives a red convertible at night through downtown Miami. Palm trees and neon motel signs blur past. She looks at the camera with sunglasses reflecting the pink and turquoise glow of the city. Lens flare, soft film grain. Teal and magenta color grade. Ultra-realistic 4K.
10. Studio Ghibli live-action
Cinematic still styled as a live-action Studio Ghibli adaptation. A young woman in a simple blue linen dress stands in a vast green hillside field, wind blowing her hair and skirt. Fluffy white clouds race overhead. Soft golden hour light. Warm, painterly color grading with gentle film grain. Wide lens, low-angle composition making her heroic against the sky. Ultra-detailed 4K.
Prompt Library: Portraits & Beauty
Portrait prompts live or die on three specifics: lens, light direction, and skin texture. Say "85mm at f/1.4" or "ring light" or "diffused softbox from camera left" and you skip three rounds of iteration.

11. Fenty Beauty macro
Extreme close-up beauty portrait. A stunning model with wet dewy skin and tousled damp hair, bare shoulders glistening. Water droplets on her face and neck catch the light of a ring light. Flawless skin texture in macro detail: every pore, every water droplet razor sharp. Smoky eye makeup with subtle gold shimmer. Lips slightly parted, intense gaze at camera. Dark background. Fenty Beauty campaign aesthetic. 85mm macro lens, f/1.4, ultra-shallow depth of field. Ultra-realistic 4K.
12. Baroque editorial reclining
Luxury editorial portrait. A gorgeous model wearing an elegant black velvet off-shoulder gown reclines on a dark velvet chaise longue in a dimly lit Baroque-style room. One arm draped elegantly above her head. Rich warm Rembrandt lighting from a single window highlights the fabric draping against her glowing skin. Oil painting-like quality with deep shadows and warm highlights. High-end fashion editorial photography. 85mm lens, creamy bokeh. Ultra-realistic 4K.
13. Clean studio headshot
Professional corporate headshot. A confident woman in her early 30s wearing a tailored navy blazer over a crisp white shirt. Neutral gray seamless studio background. Three-point lighting: soft key from camera left, subtle fill from right, rim light from behind. Genuine warm smile, direct eye contact. 85mm lens, f/2.8. Skin tone natural and healthy. LinkedIn executive headshot quality. Ultra-realistic 4K.
14. Street portrait Tokyo
Environmental street portrait. A 20-something Tokyo local with bleached blonde hair and oversized vintage streetwear stands in Shibuya on a weekday afternoon. Shallow depth of field with crowd of pedestrians soft-blurred behind her. Natural overcast daylight. She looks slightly off-camera, lost in thought. Shot on Fujifilm X100 aesthetic, 35mm lens, f/2. Ultra-realistic 4K.
15. High fashion Vogue cover
High-end fashion portrait in the style of a Vogue Italia cover. A striking model with razor-sharp cheekbones wears an oversized metallic silver couture gown with architectural shoulders. She stares directly into camera with a cold, commanding expression. Hair pulled back tight. Studio lighting is a single hard light from 45 degrees creating sculptural shadows. Gray backdrop. 85mm portrait lens, f/5.6 for crisp detail. Ultra-detailed 4K.
16. Natural light window portrait
Soft natural light portrait. A woman with wavy chestnut hair sits by a large north-facing window in a quiet morning kitchen. She holds a ceramic mug of coffee in both hands, looking out the window thoughtfully. Warm cream sweater, no makeup, freckles visible. Shot in Rembrandt light with window as the only source. 50mm lens, f/1.8, shallow depth of field. Soft, honest, lived-in feel. Ultra-realistic 4K.
17. Moody monochrome
Dramatic black and white portrait. A man with a short salt-and-pepper beard and intense dark eyes stares into the lens. Only half his face is lit: hard side light from camera right, pure black shadow on the other side. Textured gray background fades to black. Shot on medium format film aesthetic, 80mm lens. Film grain. Peter Lindbergh style monochrome. Ultra-detailed.
18. Editorial beauty pastel
Dreamy pastel beauty portrait. A model with soft pink lips, dewy skin, and flushed cheeks against a blush pink seamless backdrop. She wears a sheer white off-shoulder top. Soft diffused lighting from a large softbox creates flattering even illumination. Hair in loose tousled waves. 85mm lens, f/2. Cotton candy color palette: pink, peach, cream. Ultra-realistic 4K beauty editorial.
19. Golden hour romantic
Sun-drenched golden hour portrait. A woman in a flowing cream linen dress stands in a wheat field at 7pm on a summer evening. The sun is low behind her, creating a halo of golden backlight through her hair and the sheer fabric. Lens flare across the frame. Her eyes are closed, face tilted up to the warmth. 135mm telephoto lens, f/2, compressed background. Warm honey color grade. Ultra-realistic 4K.
20. Dark academia library
Dark academia editorial portrait. A young woman with auburn hair in a loose braid wears a wool cardigan over a white collared shirt in an old university library. She holds an open leather-bound book, reading by the light of a green banker's lamp. Towering bookshelves around her fade into shadow. Warm tungsten light, deep navy and olive color palette. 50mm lens, f/2.8. Ultra-realistic 4K.
Prompt Library: Action & Dynamic Motion
Action prompts need frozen motion cues ("frozen mid-air", "high-speed capture", "motion blur on…") and rim light to separate subject from chaotic backgrounds.

21. Nike Training freeze-frame
Dynamic action freeze-frame. An athletic woman in a fitted sports bra and high-waisted compression shorts executes a powerful spinning roundhouse kick. Water splashes frozen in mid-air around her legs and feet in a dramatic spray pattern. Her toned abs and defined muscles visible. Dramatic single-source rim lighting from behind creates a glowing silhouette edge. Dark studio background. Nike Training campaign energy. High-speed photography feel: ultra-sharp subject, motion blur on water droplets. Ultra-realistic 4K.
22. Big-wave surfer barrel
Epic wide-angle shot of a female surfer riding inside a massive crystal-clear barrel wave at golden hour. Her silhouette and athletic body visible through the translucent turquoise water of the wave tube. Golden sunlight creates an explosion of light and water mist behind her. Dramatic backlit composition. The wave is enormous and perfectly formed. GoPro-style immersive perspective. Ultra-realistic 4K cinematic quality.
23. Parkour rooftop leap
High-speed action shot of a parkour athlete mid-leap between two Brooklyn rooftops at sunset. Frozen at the apex of the jump, arms and legs extended, silhouetted against a burning orange sky. The gap below him is dizzying: city streets far below. Motion blur on the trailing edge of his hoodie. Shot from a drone at his height, 35mm lens. Ultra-realistic 4K cinematic action.
24. MMA ring spotlight
Dramatic fight night action. A female MMA fighter mid-spinning back elbow, sweat flying from her hair in a visible arc of droplets. Single harsh overhead ring spotlight isolates her from pure black background, a classic boxing photography look. Her opponent is a blurred silhouette out of focus. 70-200mm lens at 200mm, f/2.8, 1/2000 shutter frozen motion. High contrast, desaturated. Ultra-detailed 4K.
25. Motocross dust storm
Low-angle action shot of a motocross rider airborne over a dirt jump, red desert dust exploding behind the rear tire. Late afternoon sun casts long shadows. The bike is tilted aggressively mid-trick. Camera is just above ground level looking up, making the jump look monumental. Anamorphic lens flare from the sun. Orange and teal color grade. Ultra-realistic 4K action.
26. Ballet jump studio
Contemporary ballet dancer mid-grand jete frozen in the air, arms extended, body perfectly horizontal. She wears a simple nude leotard. Plain gray cyclorama studio background. Strong side-light from camera left creates a sculptural chiaroscuro on her musculature. Powder disturbed from the floor traces her leap in a soft cloud. 1/4000 shutter speed feel. Ultra-detailed 4K.
27. Basketball slam dunk
Low-angle hero shot of a male basketball player mid-slam dunk, one hand gripping the rim, body extended diagonally across the frame. Arena lights streak as lens flares. Crowd is a soft blurred wall of phone flashes behind him. Frozen sweat and net motion. Shot on 24mm wide from directly below the hoop. NBA official photography energy. Ultra-realistic 4K.
28. Horse gallop splash
A rider on a powerful black horse gallops through knee-deep shallow ocean water at sunrise. Water explodes from each hoofstrike, frozen in a dramatic spray. The rider is leaned low, hair streaming behind. Warm golden backlight from the rising sun. Mist rising off the water. Shot at 1/4000 shutter, 200mm telephoto compression. Ultra-realistic 4K equine photography.
Prompt Library: Nature & Landscapes
For landscapes, the power words are time of day, atmospheric condition, and vertical scale. The model has a strong prior for generic "pretty nature", so you have to push it past that.

29. Waterfall mist ethereal
Ethereal fantasy nature scene. A graceful young woman in a flowing sheer gossamer dress stands at the edge of a towering waterfall cliff. Dense tropical mist swirls around her legs and the translucent fabric. She extends one arm toward the cascade, water droplets catching golden light. Aerial perspective slightly from above showing the dramatic cliff drop. Lush green ferns frame the composition. Golden hour light filtering through the mist. Ultra-realistic 4K cinematic quality.
30. Maldives aerial float
Overhead drone shot of a beautiful woman in a minimal white bikini floating on her back in crystal-clear turquoise shallow water over white sand in the Maldives. Her long dark hair fans out in the water like a halo. The water is so clear her full body is visible through the translucent surface. Tiny fish swim nearby. Travel photography editorial style. Ultra-realistic 4K aerial quality.
31. Iceland black sand coast
Dramatic wide landscape of Iceland's Reynisfjara black sand beach at dawn. Massive basalt sea stacks rise from the churning North Atlantic. Low fog drifts across the black sand. A single figure in a red rain jacket walks along the shoreline for scale. Moody desaturated color grade, almost monochrome with just the red jacket as accent. 24mm wide lens, f/11 for deep focus. Ultra-detailed 4K.
32. Redwood cathedral light
Vertical composition looking up through towering California redwood trees. Shafts of golden morning sunlight cut through the fog between the trunks like cathedral light rays. Ferns carpet the forest floor. A tiny hiker in the distance gives scale. Ultra-wide 14mm lens distorting the trunks into a radial pattern toward the sky. Warm green and gold palette. Ultra-realistic 4K nature photography.
33. Patagonia mountain reflection
Perfect mirror reflection of the jagged Torres del Paine peaks in a glass-still Patagonian alpine lake at blue hour. Pink and purple alpenglow on the snow-capped summits. A single orange tent on the near shore as human scale. Complete symmetry: upper and lower half of frame are near-mirror images. 35mm lens, f/11. Ultra-realistic 4K landscape.
34. Sahara dune storm
Vast Sahara desert at the start of a sandstorm. Rolling orange dunes extend to the horizon, with a towering wall of sand approaching from the left. A lone nomadic figure on camelback is silhouetted against the dust cloud. Sun struggles through the haze as a dim orange disc. Cinematic wide-angle, heavy atmospheric haze. Monochromatic warm orange palette. Ultra-detailed 4K.
35. Northern lights cabin
Wide landscape of a tiny warm-lit wooden cabin in a Norwegian fjord valley at 1am. A spectacular green and purple aurora borealis dances overhead, reflecting in the still black fjord water. Snow-dusted pine trees and mountains frame the scene. The cabin glow is the only warm color in an otherwise cold composition. 20-second long exposure feel. Ultra-realistic 4K astrophotography.
36. African savanna sunset
Cinematic wide shot of a family of elephants crossing a golden savanna at sunset in Kenya. The sun is a huge orange disc on the horizon, silhouetting the herd. Long grass ripples in the warm wind. Dust kicked up by the herd diffuses the backlight into warm beams. 200mm telephoto compression. National Geographic editorial style. Ultra-realistic 4K wildlife photography.
37. Cherry blossom river Kyoto
Serene wide landscape of the Philosopher's Path in Kyoto at peak cherry blossom season. Pink petals float on the narrow canal, with more drifting down from the trees above. Traditional wooden bridges arch over the water. Early morning mist softens the light into diffused pink. A solo figure in a dark kimono walks along the stone path for scale. 50mm lens, f/4, gentle pastel color grade. Ultra-realistic 4K.
38. Storm-lit Scottish highlands
Dramatic landscape of the Scottish Highlands during a clearing thunderstorm. Dark churning clouds above a lone glen, with a single shaft of golden sunlight breaking through and lighting one patch of heather-covered hillside. Rainbow arc barely visible at the edge. Ancient standing stones in the foreground. Moody cinematic color grade: steel blue shadows, warm sunlit highlight. 24mm wide, f/11. Ultra-realistic 4K landscape photography.
Prompt Library: Fantasy & Stylized
Fantasy prompts get sharper the moment you name a specific art reference (Ufotable, Arcane, Studio Trigger, Magic: The Gathering illustration). Vague "fantasy art" gets you vague fantasy art.

39. Ufotable anime warrior
Epic anime-inspired fantasy warrior princess with flowing silver-white hair that reaches her waist, wearing ornate golden battle armor that hugs her figure with intricate engravings. She holds a glowing magical sword aloft, emitting bright blue energy. Cherry blossom petals and magical sparkles swirl in a violent storm around her. Her expression is fierce and determined. Dynamic action pose mid-battle leap. Ultra-detailed anime with CGI-quality lighting, Ufotable production quality. Rich colors, dramatic volumetric lighting. 4K quality.
40. Dark elf sorceress
Dark fantasy dark elf sorceress with long flowing midnight-purple hair, pointed ears, and luminous violet eyes. She wears an elegant off-shoulder dark robe with intricate silver embroidery that reveals her collarbones and shoulders. Purple arcane energy spirals from her outstretched hands, illuminating her face from below. A vast star field and nebula visible in the background through a shattered stone archway. Semi-realistic fantasy illustration style with cinematic lighting. Ultra-detailed 4K.
41. Ghibli forest spirit
Studio Ghibli style painterly scene. A small forest spirit that looks like a glowing white fox with three tails walks through a mossy enchanted forest at dusk. Fireflies dance around it. Soft painterly brushstrokes, warm honey-gold light filtering through massive ancient trees. Hayao Miyazaki watercolor aesthetic. Ultra-detailed animation cel quality.
42. Arcane League style
Arcane Netflix animated series style illustration. A young woman with blue-tipped braided hair and steampunk goggles leans against a graffitied alley wall in the undercity of Piltover. Neon magical rune-signs glow behind her. Textured painterly brushstrokes visible, 2D illustration with 3D depth, saturated purple and teal color story. Fortiche animation studio aesthetic. Ultra-detailed 4K.
43. Magic: The Gathering dragon
Fantasy illustration in the style of a Magic: The Gathering card. A colossal red dragon emerges from molten lava in an underground cavern, wings half-spread, mouth roaring with fire breath forming. A tiny knight in silver armor stands at the cavern's edge for scale, raising a shield. Dramatic low-angle hero composition. Rich oil-painting texture, Greg Rutkowski influence. Ultra-detailed 4K fantasy art.
44. Cyberpunk samurai
Cyberpunk fantasy fusion. A female samurai with a chrome katana stands on the rain-slicked rooftop of a neo-Tokyo megacorp tower at night. She wears a fusion of traditional kimono and carbon-fiber combat armor. Holographic cherry blossoms drift around her. Neon reflections on the wet rooftop, flying ad-drones in the background. Illustrated in the style of Katsuhiro Otomo meets modern 3D concept art. Ultra-detailed 4K.
45. Underwater mermaid
Ethereal underwater fantasy. A graceful mermaid with iridescent teal and violet scales swims through a coral reef illuminated by shafts of sunlight piercing the water surface above. Her long turquoise hair flows weightlessly. Bubbles trail from her fingertips. School of small silver fish swim past. Dreamlike painterly quality, Lisa Frank meets National Geographic. Ultra-detailed 4K fantasy art.
46. Steampunk airship captain
Illustrated steampunk fantasy portrait. A young female airship captain in a brass-buttoned red military coat, goggles pushed up on her forehead, stands at the wheel of a wooden airship. Visible brass gears and copper pipes. Behind her, clouds and other distant airships. Warm golden hour lighting. Illustration style inspired by Nausicaa and Howl's Moving Castle. Ultra-detailed 4K.
Multi-Style Iteration: Same Subject, Different Worlds
One underrated GPT Image 2 workflow: lock the subject and only change the style slot. You learn what each style does to the same face, outfit, and pose, which teaches you which style to reach for in the future.

Here is the base prompt. The subject stays identical across all four renders:
A beautiful young woman with shoulder-length brown hair stands in a sunlit garden, wearing a simple white sundress, one hand lightly touching a rose bush. Soft golden afternoon light. Three-quarter body framing, slightly tilted head, warm smile.
Now cycle the style slot, one run per variant:
47. Photoreal variant
[Base] + Hyperreal fashion photography aesthetic. 85mm lens at f/1.8, soft natural light, editorial sharpness. Ultra-realistic 4K.
48. Anime variant
[Base] + Japanese anime style with cel shading, bold line art, vibrant saturated colors, large expressive eyes. Kyoto Animation production quality. Ultra-detailed.
49. Oil painting variant
[Base] + Classical oil painting style with visible thick brushstrokes, warm Renaissance lighting, chiaroscuro shadow, Vermeer-like color palette. Museum-quality.
50. Cyberpunk variant
[Base] + Neon-drenched cyberpunk futurism. Holographic overlays, circuit-pattern light tattoos on skin, magenta and cyan rim lighting. Ghost in the Shell art direction. Ultra-detailed.
When I ran this sequence on our test account, the first render took about 18 seconds, each style swap about the same. Total: under two minutes and 48 credits for a full mood board. For a client deck that would normally be a half-day of stock hunting, that is the unlock.
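The lock-the-subject, swap-the-style loop is easy to script. The sketch below only builds the four prompt strings and totals the credits at the 12-credit price quoted in this guide; it does not call any API. `style_variants` is a hypothetical helper name, and joining base and style with a single space is our assumption about how "[Base] + style" is assembled.

```python
CREDITS_PER_IMAGE = 12  # price quoted in this guide

BASE = (
    "A beautiful young woman with shoulder-length brown hair stands in a "
    "sunlit garden, wearing a simple white sundress, one hand lightly "
    "touching a rose bush. Soft golden afternoon light. Three-quarter body "
    "framing, slightly tilted head, warm smile."
)

STYLES = {
    "photoreal": "Hyperreal fashion photography aesthetic. 85mm lens at f/1.8, "
                 "soft natural light, editorial sharpness. Ultra-realistic 4K.",
    "anime": "Japanese anime style with cel shading, bold line art, vibrant "
             "saturated colors, large expressive eyes. Kyoto Animation "
             "production quality. Ultra-detailed.",
    "oil_painting": "Classical oil painting style with visible thick "
                    "brushstrokes, warm Renaissance lighting, chiaroscuro "
                    "shadow, Vermeer-like color palette. Museum-quality.",
    "cyberpunk": "Neon-drenched cyberpunk futurism. Holographic overlays, "
                 "circuit-pattern light tattoos on skin, magenta and cyan rim "
                 "lighting. Ghost in the Shell art direction. Ultra-detailed.",
}


def style_variants(base: str, styles: dict[str, str]) -> dict[str, str]:
    """Append each style suffix to the same locked base prompt."""
    return {name: f"{base} {suffix}" for name, suffix in styles.items()}


variants = style_variants(BASE, STYLES)
total_credits = len(variants) * CREDITS_PER_IMAGE  # one render per variant
print(f"{len(variants)} variants, {total_credits} credits")
```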
Common Prompt Failures and How to Fix Them
Honest section: GPT Image 2 is very good, but it is not magic. Here are the failures we have logged most often and the prompt patterns that fix them.
Failure 1: Generic, lifeless output
Before:
A beautiful woman in a city.
After:
A 28-year-old woman with auburn hair pulled into a low ponytail, wearing a camel trench coat, crossing a Manhattan crosswalk at 6pm on a rainy Thursday. Yellow taxis blur past in motion-blurred streaks. 50mm lens, f/2, cinematic grain. Ultra-realistic 4K.
The first prompt gives the model nothing to ground on. The fix is always concrete nouns and named places.
Failure 2: Wrong number of fingers
GPT Image 2 is much better than first-generation diffusion models on hands, but macro close-ups of hands can still go sideways. Two reliable mitigations:
- Don't ask for fingers to be the subject. Crop them out: "framing is shoulders up only".
- Give the hand something to hold: "hands gently holding a ceramic coffee cup" fixes count because the object constrains pose.
Failure 3: "Text in image" garbled
The model is not a reliable typesetter. For logos, signs, and posters with readable text, either keep text extremely short ("a sign reads OPEN") or add to the prompt: "no text, no letters, no words anywhere in the image" and add the typography later in Figma or Photoshop.
Failure 4: Lighting direction ignored
Before:
A portrait of a woman with dramatic lighting.
After:
A portrait of a woman lit by a single hard spotlight from 45 degrees camera-left, with deep black shadow filling the right side of her face. Rembrandt lighting with a small triangle of light on the shadowed cheek.
"Dramatic lighting" means nothing. Naming the direction, the hardness, and the shadow coverage means everything.
Failure 5: Subject appears in wrong setting
If the model keeps putting your character in a generic studio when you asked for a library, move the setting to the front of the prompt and be explicit:
In a candle-lit 17th-century English library with floor-to-ceiling oak shelves, leather-bound books, and a stone fireplace, a woman in…
Starting with the setting frames the entire composition before the subject is even introduced.
Failure 6: Overcrowded prompt
Past about 1,200 words, individual adjectives start to dilute. If the prompt is a laundry list of 40 stylistic tags, the model averages them. Keep one dominant style anchor (e.g., "film noir") and treat everything else as support.
Working at the 20K Character Limit: Advanced Structured Prompts
One of the quiet advantages of GPT Image 2 is the 20,000-character prompt ceiling. Most competitors cap around 1,000 to 2,000 characters. You will not need the full budget for a portrait, but for complex multi-character scenes, concept art briefs, or brand-consistent campaign imagery, a structured long prompt earns its keep.
The pattern we use on production briefs:
# SCENE
[Setting: location, time, weather, historical period in 2-3 sentences]
# CHARACTERS
- Character A: [full physical description, wardrobe, current pose, facial expression]
- Character B: [same]
- Background extras: [brief description]
# COMPOSITION
[Framing: wide / medium / close. Camera angle. Lens. Depth of field. Where each character sits in the frame (rule of thirds, golden ratio, center).]
# LIGHTING
[Source, direction, hardness, color temperature, shadow behavior]
# COLOR
[Palette in 3-4 color terms. Grading direction (warm / cool / split-tone).]
# STYLE
[One dominant reference. E.g., "Roger Deakins cinematography in Blade Runner 2049."]
# TECHNICAL
[Resolution modifier, film grain, aspect ratio, quality markers. Keep short.]
# EXCLUSIONS
[What to avoid: "No text, no logos, no watermarks, no extra limbs."]
Example: a fully structured prompt (~500 words) for a campaign hero image:
# SCENE
A restored 1930s Art Deco ballroom on a rainy Tuesday evening in Paris, set during a private jazz performance. Tall arched windows on the left show wet boulevards and soft yellow streetlamp glow. Interior is lit warm and amber.
# CHARACTERS
- Lead: A striking 32-year-old woman with dark auburn hair in a low chignon, wearing a deep emerald-green silk bias-cut gown with a low back. She stands near a grand piano, one hand resting on its polished black lid, gazing thoughtfully toward the windows. Faint melancholy in her expression.
- Pianist: A middle-aged man in a black tuxedo, seated at the piano mid-performance, profile view, fingers on keys. He is a secondary figure and should not pull focus from the lead.
- Background: Three or four well-dressed patrons at candlelit round tables in soft bokeh, unidentifiable faces.
# COMPOSITION
Medium-wide shot. Lead character is on the right third of the frame, piano extending diagonally across the center toward the left. Rule of thirds. 50mm lens, f/2.2, shallow depth of field: lead and piano sharp, background patrons and windows softly blurred. Eye-level camera height.
# LIGHTING
Warm tungsten chandelier overhead providing ambient glow on the room. Key light on the lead is a single practical wall sconce camera-right at 45 degrees, modeling her face in gentle Rembrandt pattern. Rim from the windows behind her (cool blue rainy light) separates her hair and shoulder edge from the warm interior. Overall contrast: high but soft.
# COLOR
Deep emerald green (dress) and warm amber (interior) as hero colors, with cool blue window light as counter-accent. Warm gold dominant, with selective teal shadow detail. Film-look color grade reminiscent of early Wong Kar-wai.
# STYLE
Cinematic still in the visual language of In the Mood for Love meets a modern luxury cognac commercial. Anamorphic lens quality (slight horizontal flare on the candles). Painterly softness, 35mm film grain.
# TECHNICAL
Ultra-realistic 4K, 16:9 aspect, cinematic frame.
# EXCLUSIONS
No text, no signage, no logos, no watermarks, no visible phones or modern electronics, no extra limbs, no warped fingers on the pianist.
The sectioned format does two things: it keeps you from forgetting a slot, and it gives the model a structural parse rather than a 500-word run-on. You can reuse this template across a whole campaign by editing only the CHARACTERS and SCENE sections.
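One way to reuse the sectioned template across a campaign is to keep the sections in a dictionary and swap only the ones that change. The sketch below is a minimal illustration, not part of any GPT Image 2 tooling: `build_prompt`, `variant`, and the shortened section texts are hypothetical names and placeholders chosen for this example.

```python
# Section order mirrors the structured example prompt in this guide.
SECTION_ORDER = [
    "SCENE", "CHARACTERS", "COMPOSITION", "LIGHTING",
    "COLOR", "STYLE", "TECHNICAL", "EXCLUSIONS",
]


def build_prompt(sections: dict[str, str]) -> str:
    """Join the sections into one prompt string with '# NAME' headers."""
    missing = [n for n in SECTION_ORDER if not sections.get(n, "").strip()]
    if missing:
        raise ValueError(f"empty sections: {', '.join(missing)}")
    return "\n".join(
        f"# {name}\n{sections[name].strip()}" for name in SECTION_ORDER
    )


def variant(base: dict[str, str], **overrides: str) -> dict[str, str]:
    """Copy the base sections and replace only the named ones."""
    unknown = set(overrides) - set(SECTION_ORDER)
    if unknown:
        raise KeyError(f"unknown sections: {sorted(unknown)}")
    return {**base, **overrides}


# Shortened placeholder text; a real brief would carry the full paragraphs.
base = {
    "SCENE": "A restored 1930s Art Deco ballroom on a rainy evening in Paris.",
    "CHARACTERS": "Lead: a woman in an emerald-green silk gown beside a grand piano.",
    "COMPOSITION": "Medium-wide shot, lead on the right third, 50mm lens, f/2.2.",
    "LIGHTING": "Warm tungsten chandelier, wall sconce key light, cool window rim.",
    "COLOR": "Deep emerald and warm amber, cool blue counter-accent.",
    "STYLE": "Cinematic still, anamorphic lens quality, 35mm film grain.",
    "TECHNICAL": "Ultra-realistic 4K, 16:9 aspect, cinematic frame.",
    "EXCLUSIONS": "No text, no logos, no watermarks, no extra limbs.",
}

# Second campaign frame: only SCENE changes, every other section is reused.
second_frame = variant(
    base, SCENE="The same ballroom at dawn, empty, chairs stacked on tables."
)

print(build_prompt(second_frame))
```

Keeping the untouched sections byte-for-byte identical makes it easier to attribute a difference between two renders to the section you edited rather than to incidental rewording.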
One working tip: when a render gets 80% of the way but one element is wrong (say, the lead is wearing the wrong color), don't rewrite the whole prompt. Copy the successful prompt, edit just that slot, and regenerate. Our internal iteration log shows 2.8 regenerations on average to land a hero-quality frame, down from 6+ when we used unstructured prompts. At 12 credits per image, that is the difference between a $2 and a $5 asset.
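The credit side of that comparison is simple arithmetic. The sketch below assumes each regeneration is billed as one image at the flat 12 credits quoted in this guide; the dollar value of a credit is not stated here, so the result is left in credits. `credits_per_asset` is a hypothetical helper name.

```python
from decimal import Decimal

CREDITS_PER_IMAGE = 12  # flat rate quoted in this guide


def credits_per_asset(avg_generations: Decimal) -> Decimal:
    """Average credits spent to land one usable frame.

    Assumes every generation counts as one billed image.
    """
    return avg_generations * CREDITS_PER_IMAGE


structured = credits_per_asset(Decimal("2.8"))   # structured prompts
unstructured = credits_per_asset(Decimal("6"))   # lower bound for "6+"

print(f"structured:   {structured} credits per asset")
print(f"unstructured: {unstructured} credits per asset (at least)")
print(f"saved:        {unstructured - structured} credits per asset (at least)")
```

`Decimal` is used instead of `float` so that 2.8 × 12 comes out as exactly 33.6 rather than a binary approximation.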
If you want to hand off the structured prompt workflow to a teammate, point them at the how-to article first and then to this guide.
Frequently Asked Questions
What is the most important part of a GPT Image 2 prompt?
Lighting and camera, in that order. You can get away with a vague subject and a vague setting if you nail the light direction and the lens choice. The opposite is not true: a perfectly described subject in "normal lighting" will look like a stock photo every time. If you only have time to refine two slots in your GPT Image 2 prompt, refine those two.
How long should a GPT Image 2 prompt be?
For portraits and simple scenes, 80-150 words is the sweet spot. For cinematic wide shots with specific era and style anchors, 150-250 words. For multi-character scenes and campaign briefs, use the structured 400-800 word pattern above. The 20,000-character ceiling is there for extreme cases; in daily use you will rarely exceed 500 words.
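If you draft prompts in an editor, a quick length check against these ranges can be scripted. This is a rough sketch: the word ranges and the 20,000-character ceiling come from this guide, while `check_length` and the scene-type labels are hypothetical names, and words are counted by simple whitespace splitting.

```python
MAX_CHARS = 20_000  # prompt ceiling quoted in this guide

# Suggested word ranges per scene type, taken from the answer above.
WORD_RANGES = {
    "portrait": (80, 150),
    "cinematic": (150, 250),
    "campaign": (400, 800),
}


def check_length(prompt: str, kind: str) -> dict:
    """Report word and character counts against the suggested range."""
    low, high = WORD_RANGES[kind]
    words = len(prompt.split())
    return {
        "words": words,
        "chars": len(prompt),
        "in_range": low <= words <= high,
        "under_limit": len(prompt) <= MAX_CHARS,
    }


# Stand-in text: 100 repeated words instead of a real prompt.
report = check_length("light " * 100, "portrait")
print(report)
```

A prompt outside the suggested range is not an error. The ranges are a starting point, and only the character ceiling is a hard limit.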
Can I use real artist names in my prompt?
You can reference a visual style or era ("film noir", "1970s giallo", "Studio Ghibli painterly") and the model will pick up the language. Using a living artist's name as a direct style tag is a gray area ethically and increasingly gated by model-side filters. Prefer naming the style, the medium, and the era over naming a specific person.
Why do my prompts give different results each time?
Diffusion models are stochastic by design: they start from random noise and denoise into the final image. Run the same prompt twice and you get two closely related but distinct results. This is a feature, not a bug: it is how you get variety. If you want reproducibility, most generation systems support a seed parameter. For more background, see OpenAI's image generation blog on how prompt-to-image inference works.
Does prompt length affect the cost?
No. GPT Image 2 uses flat pricing at 12 credits per image, regardless of whether your prompt is 20 words or 2,000 words. The only thing that scales cost is the number of images you generate.
How many prompts should I try before giving up on a concept?
Our rule of thumb: three iterations of the same prompt to see its natural variance, then if you're still off, change one slot. Don't rewrite everything. Most of the time, the fix is just specifying the lighting or swapping the camera angle. If you're at eight regenerations with no progress, the problem is structural. Go back to the eight-slot formula and check which slots you actually filled.
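The rule of thumb and the slot audit can be written down as a small checklist. The sketch below is only an illustration: the slot names follow the eight-slot formula from the start of this guide, while `missing_slots`, `next_step`, and the draft values are hypothetical.

```python
# The eight slots of the formula, in order.
SLOTS = [
    "subject", "action", "setting", "style",
    "camera", "lighting", "mood", "quality",
]


def missing_slots(draft: dict[str, str]) -> list[str]:
    """Return the slots that are absent or blank in a draft prompt."""
    return [s for s in SLOTS if not draft.get(s, "").strip()]


def next_step(attempts: int) -> str:
    """Map the number of generations so far to the suggested next move."""
    if attempts < 3:
        return "regenerate the same prompt"
    if attempts < 8:
        return "change one slot"
    return "audit the eight slots"


# Hypothetical draft with only four slots filled.
draft = {
    "subject": "auburn-haired librarian",
    "action": "reaching for a book on a high shelf",
    "setting": "university library at dusk",
    "quality": "editorial photography",
}

print(next_step(8))          # audit the eight slots
print(missing_slots(draft))  # ['style', 'camera', 'lighting', 'mood']
```

In this draft the unfilled slots include lighting and camera, which are the two that this guide suggests refining first.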
Can I use GPT Image 2 prompts commercially?
Yes. Images you generate through GPT Image 2 are yours to use commercially under the product's standard terms. Check the site footer for the current license language and consult a lawyer for anything high-stakes.
What's the difference between text-to-image and image-to-image prompting?
Text-to-image starts from noise; the prompt is your only guidance. Image-to-image starts from an uploaded reference and the prompt modifies it. Prompts for image-to-image should be shorter and focus on the change you want ("convert to oil painting style, keep subject's pose and outfit identical"), not the full scene description, because the reference image already provides most of the slots.
Ready to Start?
You have 50+ prompts. You have the eight-slot formula. The next step is actually opening the tool and running one. Pick any prompt above, paste it, and see how close the output lands to what's in your head, then fix the one slot that was off.
Start generating with GPT Image 2 free →
Keep reading:
- What Is GPT Image 2? Full Overview and First Look
- How to Use GPT Image 2: Step-by-Step Tutorial
- GPT Image 2 vs Sora: Honest Comparison
- GPT Image 2 vs Kling: Which One Should You Pick?
Questions about a specific prompt pattern? Ping us on the site. We read every message, and the prompts that get asked about most often tend to end up in the next version of this guide. For the theoretical side, Wikipedia's entry on text-to-image models is a solid 10-minute read.

