Image AI Test: same prompt, different chats
- Niv Nissenson
- 6 days ago
- 2 min read

This isn’t a typical The Tester AI evaluation, and there’s no strict pass/fail verdict here. Visual art is inherently subjective, but that doesn’t mean quality doesn’t matter.
While working on a recent blog post, I wanted to compare how different AI tools interpret the exact same prompt. The goal wasn’t to optimize or refine the prompt; it was to observe each model’s default instincts.
For this test, I intentionally used a simple, almost naïve prompt, the kind most users would start with:
Prompt: I’ve done a post discussing at-birth unicorn startups. I want an image to go along with the post. I want the image to show a pasture with the San Francisco skyline behind it. In the pasture there are 2-3 AI Robot Unicorn Horses.
Claude (Sonnet 4.6):

Claude's default take was both childish and low-effort. You can't really tell that the skyline is San Francisco, and the unicorns look more like cats than horses. When I asked Claude for a photorealistic version, it told me it simply can't do that and suggested I try ChatGPT, among others.

ChatGPT:

ChatGPT delivered a photorealistic interpretation with strong-looking unicorns and solid overall composition. The main drawback is the skyline: it suggests a coastal city, but without iconic elements like the Golden Gate Bridge or the Transamerica Pyramid, it’s not distinctly San Francisco.
Still, this is a strong and usable result.
Grok:

Grok took a more stylized, slightly whimsical approach. The inclusion of the Transamerica Pyramid helps anchor the setting as San Francisco, although the absence of the bridge or a clearer bay limits that effect.
The unicorns feel bold and “formidable,” which adds personality, but the overall tone may not fit every use case.
Gemini (Nano Banana on Fast):

Gemini delivered the most complete interpretation. The skyline clearly reads as San Francisco, with recognizable elements like the bay, the bridge, and the pyramid building all present.
The composition feels balanced, and the output aligns closely with what I had in mind, making it my personal favorite of the group.
Takeaway
Even with a simple prompt, the differences between models are significant. Each tool brings its own assumptions, style, and limitations, and those defaults matter more than most people expect.
If you’re relying on AI for visual content, the first output already tells you a lot about how much effort you’ll need to invest to get to something usable.