HiDream O1 Image 1.5

Natively unified. The latest flagship.

Benchmarked across the board, at a fraction of the size.

GenEval (compositional generation)
O1-Image: 0.90
GPT Image 2: 0.89
Qwen-Image: 0.87
FLUX.2: 0.87

DPG-Bench (dense prompt alignment)
O1-Image: 89.83
Seedream: 88.63
Qwen-Image: 88.32
FLUX.2: 87.57

HPSv3 (human preference)
O1-Image: 10.37
GPT Image 2: 10.21
Nano Banana: 10.01
Qwen-Image: 9.94

CVTG-2K (visual text rendering)
O1-Image: 0.91
Seedream: 0.90
GPT Image 2: 0.90
FLUX.2: 0.89

LongText-Bench (long in-image text, EN)
Nano Banana: 0.98
O1-Image: 0.98
FLUX.2: 0.96
Qwen-Image: 0.94

GEdit (instruction editing)
GPT Image 2: 7.67
O1-Image: 7.60
Seedream: 7.53
Qwen-Edit: 7.41

Reported in the technical report (arXiv:2605.11061). Bars show the open HiDream-O1-Image (8B); the scaled-up Pro (200B+) tops every benchmark in the report. Full tables on arXiv.

Made with HiDream-O1-Image.

Cinematic: A freediver glides through deep blue water under rays of sunlight.
Product: Warm still life with candles, jewelry, perfume, and natural materials.
Commerce: A playful e-commerce hero for hand-painted ceramic mood cups.
Game UI: Epic boss-fight gameplay frame with HUD, minimap, and action prompts.
Illustration: Manga-style fashion portrait with sharp linework and reflective glasses.
Fashion: Editorial couture portrait with ornate Chinese patternwork and gold accents.
Photoreal: Close-up portrait of a ginger cat and fluffy white dog.
Motion: A flamenco dancer in a red dress performs inside a circle of fire.

A single shared token space. The architecture is the contribution.

[Architecture diagram] Inputs: text prompt (language tokens), raw pixels (image patches), task conditions (edit / subject) → Shared token space (interleaved pixels + language + task tokens) → Unified Transformer (hybrid attention) → Image (2048²)

01 Unified inputs. Raw pixels, a text prompt (refined by the Reasoning-Driven Prompt Agent), and task conditions all enter as one stream, with no separate VAE and no frozen text encoder.
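To make the "one stream" idea concrete, here is a toy sketch in plain Python. It is not the model's actual tokenizer or code: the names `patchify` and `build_stream`, the 2×2 patch size, the whitespace word split, and the task-then-text-then-pixels ordering are all assumptions made for illustration. It shows only that text, image patches, and a task condition can sit in a single tagged sequence for one transformer to consume.

```python
from typing import List, Tuple

# A token is (modality tag, payload). Purely illustrative, not the real format.
Token = Tuple[str, object]


def patchify(image: List[List[int]], patch: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping patch x patch tiles.

    Each tile is flattened row-major; tiles are emitted left to right,
    top to bottom.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([
                image[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ])
    return tiles


def build_stream(prompt: str, image: List[List[int]], task: str,
                 patch: int = 2) -> List[Token]:
    """Place task, text, and pixel tokens in one sequence (assumed order)."""
    stream: List[Token] = [("task", task)]
    stream += [("text", word) for word in prompt.split()]
    stream += [("pixels", tile) for tile in patchify(image, patch)]
    return stream


# A 4x4 "image" whose pixel values are just their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
stream = build_stream("a red fox", image, task="edit")
```

In this toy run the stream holds one task token, three text tokens, and four pixel tokens. A real model would embed each payload into a shared vector space before attention; that step is omitted here.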

Run it where you already work.

# pip install -U diffusers transformers accelerate
import torch
from diffusers import HiDreamO1ImagePipeline

# Load the open checkpoint in bfloat16 and move the pipeline to the GPU.
pipe = HiDreamO1ImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-O1-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Generate a single 2048x2048 image; the prompt asks for a sign with legible text.
image = pipe(
    prompt="a fox reading by candlelight, a sign reads 'HiDream'",
    height=2048, width=2048,
    num_inference_steps=50, guidance_scale=5.0,
).images[0]
image.save("out.png")

Available across the ecosystem