GenEval (compositional generation)
O1-Image: 0.90
GPT Image 2: 0.89
Qwen-Image: 0.87
FLUX.2: 0.87
Reported in the technical report (arXiv:2605.11061). Bars show the open HiDream-O1-Image (8B); the scaled-up Pro (200B+) tops every benchmark in the report. Full tables on arXiv.

01 Unified inputs. Raw pixels, a text prompt (refined by the Reasoning-Driven Prompt Agent), and task conditions all enter as one stream, with no separate VAE and no frozen text encoder.
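To make the "one stream" idea concrete, here is a minimal, dependency-free sketch of how text tokens, task-condition tokens, and raw pixel patches could be placed in a single sequence. It is an illustration only, not the model's actual tokenization: the `Token` class, the `patchify` and `build_stream` helpers, the patch size, and the text-then-condition-then-pixel ordering are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    kind: str          # "text", "condition", or "pixel" (hypothetical labels)
    payload: Sequence  # a token id, or the flattened values of one pixel patch


def patchify(image: List[List[List[int]]], patch: int) -> List[List[int]]:
    """Split an H x W x C nested-list image into flattened patch x patch tiles (row-major)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [
                channel
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for channel in pixel
            ]
            patches.append(flat)
    return patches


def build_stream(text_ids, condition_ids, image, patch=2) -> List[Token]:
    """Concatenate all inputs into one sequence (ordering is an assumption)."""
    stream = [Token("text", [t]) for t in text_ids]
    stream += [Token("condition", [c]) for c in condition_ids]
    # Raw pixels go in directly as patches; no VAE latent is computed here.
    stream += [Token("pixel", p) for p in patchify(image, patch)]
    return stream


# Toy 4 x 4 RGB image whose pixel at (row, col) is [row, col, 0].
image = [[[r, c, 0] for c in range(4)] for r in range(4)]
stream = build_stream(text_ids=[101, 7, 42], condition_ids=[1], image=image, patch=2)
kinds = [t.kind for t in stream]
```

In this toy setup the stream holds three text tokens, one condition token, and four pixel-patch tokens. A single network would consume that sequence end to end, which is the property the blurb above describes.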
# pip install -U diffusers transformers accelerate
import torch
from diffusers import HiDreamO1ImagePipeline

# Load the open HiDream-O1-Image checkpoint in bfloat16 and move it to the GPU.
pipe = HiDreamO1ImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-O1-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Generate one 2048 x 2048 image; the prompt asks for legible text on a sign.
image = pipe(
    prompt="a fox reading by candlelight, a sign reads 'HiDream'",
    height=2048,
    width=2048,
    num_inference_steps=50,
    guidance_scale=5.0,
).images[0]

image.save("out.png")