Notes on Using AI Agents for Work

These are my notes on this article from Ethan Mollick: "Real AI Agents and Real Work"

OpenAI created a new test of AI ability: they asked experts with 14+ years of experience to design tasks that would take a human expert 4 to 7 hours to complete.
Human experts won (barely). Surprisingly, most AI failures came from formatting and instruction-following issues rather than hallucinations and errors.
AI can't replace a job unless it can handle the full complexity of the human interactions that job involves.
Really interesting use of Claude Sonnet 4.5 to replicate the results of papers given the relevant data.
Small improvements in a model’s accuracy on single steps can add up to huge gains when tasks require many steps in sequence.
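To make the compounding concrete, here is a minimal sketch. It assumes every step succeeds independently with the same probability, which is my simplification and not something the article states. The function name `chain_success` and the 50-step task length are made up for illustration.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumption (mine, not the article's): steps are independent and
    all share the same per-step accuracy.
    """
    return step_accuracy ** steps


# Compare a few per-step accuracies on a hypothetical 50-step task.
for acc in (0.95, 0.99, 0.999):
    print(f"{acc:.3f} per step -> {chain_success(acc, 50):.3f} over 50 steps")
```

Under these assumptions, going from 95% to 99% per step moves a 50-step task from roughly an 8% chance of finishing cleanly to roughly 60%, and 99.9% per step gets it to about 95%.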
An interesting benchmark I didn't know about: METR measures AI's ability to complete long tasks.
I like the point Ethan makes about the importance of reflecting on why we do work and what it should look like, so we avoid drowning in a sea of AI-generated content.
The simple workflow suggested by OpenAI is:
- Delegate to AI as a first pass
- Review the result
- If it's not good enough, try a few rounds of correcting and re-instructing
- If it's still not good enough, do the work yourself
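The workflow above can be sketched as a small loop. This is only my reading of the steps, not code from OpenAI or the article. The callables `ai_attempt`, `review`, and `do_it_yourself` are hypothetical placeholders for whatever tool and review process you actually use, and `max_attempts=3` is an arbitrary stand-in for "a few attempts".

```python
from typing import Callable, Optional, Tuple


def delegate_then_review(
    task: str,
    ai_attempt: Callable[[str, Optional[str]], str],  # hypothetical: (task, feedback) -> draft
    review: Callable[[str], Optional[str]],           # hypothetical: draft -> feedback, or None if good enough
    do_it_yourself: Callable[[str], str],             # hypothetical: manual fallback
    max_attempts: int = 3,                            # first pass plus corrections; arbitrary choice
) -> Tuple[str, str]:
    """Return (result, who_did_it), where who_did_it is "ai" or "human"."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # First pass has no feedback; later passes carry the reviewer's corrections.
        draft = ai_attempt(task, feedback)
        feedback = review(draft)
        if feedback is None:
            return draft, "ai"
    # Still not good enough after a few tries: do the work yourself.
    return do_it_yourself(task), "human"
```

The part that matters is the cap on attempts: without it, it is easy to keep re-prompting long past the point where doing the work by hand would have been faster.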
The estimate is that this simple strategy could yield a 40% increase in speed and make work 60% cheaper (although I'm not sure how they measured the latter).
Having good judgement about how we use AI and what we choose to use it for can lead to it making us more capable rather than just more productive.