
Can AI do this? We test it.



Can AI gather financial information?
Can AI gather financial data reliably? I tested Gemini, ChatGPT, and Claude on pulling key metrics for major tech companies. All delivered usable results with solid tables—but with some caveats around reporting periods and market cap accuracy. Verdict: AI works well here, but still needs human oversight.
Apr 27


Can AI write a press release?
Press releases are structured, predictable, and language-heavy—exactly where AI excels. Testing Gemini and Claude confirmed it: both produced high-quality drafts with minimal effort required. Human review still matters, but AI gets you most of the way there instantly.
Apr 21


Can AI do my Geometry homework?
I tested Gemini, ChatGPT, and Claude on simple 5th-grade geometry worksheets. Claude delivered perfect, consistent results across both tests, while Gemini and ChatGPT stumbled on triangles. ChatGPT recovered after a retry, but Gemini doubled down on errors. The takeaway: AI can solve geometry, but reliability is still a real issue.
Apr 16


Can AI Replace a Controller?
I tested ChatGPT on a simple parent-subsidiary scenario. It looked confident but failed key steps, double-counted equity, and broke the balance sheet. It eventually corrected its errors after multiple prompts. Verdict: AI still can’t replace a controller.
Apr 9


Can AI do my math homework?
Can AI handle basic math? I tested ChatGPT, Gemini, and Claude on 5th-grade word problems. All three delivered perfect results: accurate answers, clear explanations, and zero errors. It was a simple test, though, and results can still vary.
Apr 7


Image AI Test: same prompt, different chats
Simple AI image test: ChatGPT, Claude, Grok, and Gemini were all given the exact same prompt with no optimization. The differences were striking, ranging from cartoonish interpretations to near-photorealistic scenes, and revealed each model’s instincts, strengths, and blind spots right out of the box.
Apr 5


How We Score AI
The Tester AI scoring explained: At The Tester AI, every test is built around a simple principle: Can AI actually do the job—not just in theory, but in practice? Each test is evaluated across five core categories, scored on a scale of 1 to 5:
Output Delivered – Did the AI complete the task?
Accuracy – How correct was the result?
Quality – Is the output usable in a real-world setting?
Ease of Use – How much effort, prompting, or iteration was required?
Reliability – Was the be
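The five-category rubric above can be captured as a small data structure. The following is a minimal Python sketch, not the site's actual method: the class and method names are hypothetical, and it assumes the overall score is a plain average of the five categories, which the post does not state.

```python
from dataclasses import dataclass, fields


@dataclass
class TestScore:
    """Hypothetical record of one test: one field per category, each scored 1 to 5."""
    output_delivered: int
    accuracy: int
    quality: int
    ease_of_use: int
    reliability: int

    def __post_init__(self) -> None:
        # Reject any category score outside the 1-to-5 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def overall(self) -> float:
        # ASSUMPTION: overall score = simple average of the five categories,
        # rounded to one decimal place. The real weighting is not published.
        values = [getattr(self, f.name) for f in fields(self)]
        return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    score = TestScore(output_delivered=5, accuracy=4, quality=4, ease_of_use=3, reliability=4)
    print(score.overall())  # 4.0
```

A plain average keeps every category equally weighted; if some categories (say, Accuracy) mattered more, a weighted average would be the natural change.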
Apr 4


Can AI replace an accountant?
Can ChatGPT, Gemini, or Claude replace accountants? Can AI turn a simple trial balance into a P&L and Balance Sheet?
Verdict: ChatGPT put in little effort, Gemini didn't even try, and Claude tried hard but failed harder.
Apr 4


Can AI create a logo kit for my site?
Test: Can Gemini AI create a logo and full logo kit from an existing style? It generated a solid concept but failed on execution: no true transparent PNGs, inconsistent outputs, and repeated errors. ChatGPT partially fixed the problems but wasn’t reliable. Final score: 3/5.
Apr 3