That ChatGPT "Achilles' heel" didn't age well. Now #GPT4 achieves 90% zero-shot on @MIT's math curriculum arxiv.org/abs/2306.08997
A bit of irony "It turns out #ChatGPT is quite bad at math" wsj.com/articles/ai-bot-chat… @JoshZumbrun

Jun 16, 2023 · 11:21 PM UTC

Replying to @EricTopol @MIT
Others including myself are highly skeptical of the results. Read the entire thread:
Replying to @sauhaarda
Well... it didn't. The authors' evaluation uses GPT-4 to score itself, and keeps prompting over and over until the correct answer is reached. This is analogous to someone with the answer sheet telling the student whether they've gotten the answer right until they do. (2/4)
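The looped, self-graded setup this reply criticizes can be sketched roughly like this (a hypothetical illustration, not the paper's actual code; `ask_model` and `grade_with_model` are stand-ins for calls to the same GPT-4 model):

```python
# Sketch of the criticized evaluation loop: the same model both answers and
# grades, and is re-prompted until its own grader accepts the answer.
# ask_model and grade_with_model are hypothetical stand-ins, not real API calls.

def ask_model(question: str, attempt: int) -> str:
    # Stand-in: pretend the model only produces the right answer on try 3.
    return "42" if attempt >= 3 else "wrong"

def grade_with_model(question: str, answer: str) -> bool:
    # Stand-in grader: in the criticized setup this is the SAME model, so a
    # systematic misunderstanding of the material would pass itself anyway.
    return answer == "42"

def evaluate(question: str, max_attempts: int = 5):
    for attempt in range(1, max_attempts + 1):
        answer = ask_model(question, attempt)
        if grade_with_model(question, answer):
            return True, attempt  # reported as "solved"
    return False, max_attempts

solved, tries = evaluate("What is 6 * 7?")
```

The objection in the thread is exactly this structure: success after repeated retries, judged by the model itself, is not a zero-shot result.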
Replying to @EricTopol @MIT
That is one thing you can count on - how quickly it learns.
Replying to @EricTopol @MIT
The paper has some issues with how they graded things. See discussion here twitter.com/yoavgo/status/16… .
oh this is super-sketchy if the goal is evaluating gpt4 [from the "it solves MIT EECS exams" paper]
Replying to @EricTopol @MIT
But it doesn't understand the math; it just understands what is demanded and then copies the answer from a human who answered it before. Or am I totally misunderstanding this? That's just a search engine.
Replying to @EricTopol @MIT
Just something to be cautious about here, GPT-4 is grading itself — so if it has an implicit misunderstanding of some advanced content, it's going to give itself an A when it's really wrong 🤖 human-expert verification of the outputs would fix this concern though
Replying to @EricTopol @MIT
Well, that seems like a good use for an AI. Humans still need to understand how the answer was achieved.
Replying to @EricTopol @MIT
Does ChatGPT4 show its work?
Replying to @EricTopol @MIT
A specific topic like math, trained on the school's teaching material, should give good results if the teaching material is comprehensive. The errors made point to omissions and errors in the material, modulo symbols that aren't part of the language. support.microsoft.com/en-us/…