LifePrompt ran GPT 5.2 Thinking, Google Gemini 3 Pro Preview, and Claude Opus 4.5 through the 2026 entrance exams for the University of Tokyo and Kyoto University. All three models beat every human applicant. Two years earlier, GPT-4 had failed to reach the minimum passing score on the same University of Tokyo exam. That's how fast things are moving.
Key Points
- ChatGPT 5.2 Thinking scored 503/550 on the University of Tokyo Natural Sciences exam — beating the top human score of 453 by 50 points and achieving a perfect score in mathematics
- Google Gemini 3 Pro Preview and Claude Opus 4.5 also exceeded passing thresholds — all three AI models outperformed human top scorers across both universities
- At Kyoto University, ChatGPT scored 771 in the Faculty of Law (top human: 734) and 1,176 in the Faculty of Medicine (top human: 1,098)
- AI's weakest area was essay-style questions — scoring just 25% on World History essays while hitting 90% in English
- In January 2026, ChatGPT already scored 97% across 15 unified entrance exam subjects with nine perfect scores — the Tokyo and Kyoto results represent a further step up
Two Years From Failure to First Place
The 2024 result is essential context. LifePrompt tested GPT-4 on the same University of Tokyo entrance exam in 2024 and it failed — not just narrowly, but below the minimum passing threshold. By January 2026, ChatGPT was already scoring 97% across 15 unified entrance exam subjects, achieving nine perfect scores. The April 2026 results confirm that trajectory has continued upward.
On the University of Tokyo's most competitive exam, the Natural Sciences III medical track, ChatGPT scored 50 points higher than the top human test-taker. That is not a marginal pass; it is a dominant performance that puts the AI in a different category from the best human applicants.
Where AI Still Struggles
The source article glossed over this, but the weaknesses are just as revealing as the strengths. ChatGPT scored 90% on the English exam but only 25% on essay-style questions in subjects like World History. The gap between structured problem-solving, where AI now dominates, and open-ended argumentation remains significant. Essay responses were graded by cram school teachers rather than automated systems, adding a layer of human judgment that AI clearly hasn't mastered yet.
Mathematics, chemistry, physics, informatics: in structured subjects with verifiably correct answers, AI is already superhuman. Synthesis, argument, and interpretation are where humans still hold their ground.
It Wasn't Just ChatGPT
Google's Gemini 3 Pro Preview and Anthropic's Claude Opus 4.5 were also tested, each representing its company's most advanced offering. Gemini earned top-level results across all University of Tokyo tracks, including three perfect mathematics scores. Claude exceeded the passing threshold in every track, including Natural Sciences III and Kyoto's Faculty of Medicine, with a perfect score in the Physics department.
This isn't one AI having a good day. It's a systematic shift in what AI can do against elite academic benchmarks.
What This Actually Means
Around 500,000 students sit Japan's unified university entrance exams each year. The stakes are enormous. AI now outperforms every one of them in measurable terms on structured subjects. That forces a genuine question about what these exams measure — and whether the answer remains meaningful.
LifePrompt's Satoshi Endo noted that AI still can't adequately process Japanese-language information for essay writing. The education implications are real, but they're not simple.