Notes on Using AI Agents for Work
These are my notes on this article from Ethan Mollick: "Real AI Agents and Real Work"
- OpenAI created a new test of AI ability: they asked experts with 14+ years of experience to design tasks that would take a human expert 4 to 7 hours to complete
- Human experts won (barely). Surprisingly, most AI failures came from formatting and instruction-following issues rather than hallucinations and errors.
- AI can't replace a job if it can't replace the complexity involved with human interactions as a whole
- Really interesting use of Claude Sonnet 4.5 to replicate the results of papers given the relevant data
- Small improvements in a model’s accuracy on single steps can add up to huge gains when tasks require many steps in sequence.
- Interesting benchmark I didn't know about: METR measures AI's ability to complete long tasks
- I like Ethan's point about the importance of reflecting on why we do work and what it should look like, so we avoid drowning in a sea of AI-generated content
- Simple workflow suggested by OpenAI:
    - Delegate to AI as a first pass
    - Review the result
    - If not good enough, try a few attempts at correcting and re-instructing
    - If still not good enough, do the work yourself
- The estimate is that this simple strategy could yield a 40% increase in speed and make work 60% cheaper (although I'm not sure how they measured the latter)
- Having appropriate judgement about how we use AI and what we choose to use it for can lead to it making us more capable rather than just more productive
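
To make the note about per-step accuracy concrete, here is a minimal sketch of the arithmetic. It assumes every step succeeds independently with the same probability, which is a simplification of mine and not something the article states; the function name `chain_success` and the 50-step task are made up for illustration.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a sequence succeeds.

    Assumes steps are independent and equally reliable (a simplification).
    """
    return per_step_accuracy ** steps


# Compare two hypothetical models on a hypothetical 50-step task.
for accuracy in (0.95, 0.99):
    p = chain_success(accuracy, steps=50)
    print(f"per-step accuracy {accuracy:.2f} -> whole-task success {p:.1%}")
```

Under these assumptions, moving from 95% to 99% per-step accuracy takes the chance of finishing a 50-step task from roughly 8% to roughly 60%, which is the kind of outsized gain the note describes.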
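
The delegate, review, retry, fall-back workflow from the notes can be sketched as a small loop. This is only an illustration: `ai_attempt`, `review`, and `do_it_yourself` are hypothetical callables standing in for whatever tool and review process you actually use, and capping "a few attempts" at 3 corrections is my assumption.

```python
from typing import Callable, Optional, Tuple


def delegate_with_fallback(
    task: str,
    ai_attempt: Callable[[str, Optional[str]], str],  # hypothetical: (task, feedback) -> draft
    review: Callable[[str], Optional[str]],           # hypothetical: draft -> feedback, or None if good enough
    do_it_yourself: Callable[[str], str],             # hypothetical: task -> result done by hand
    max_corrections: int = 3,                         # assumed cap on "a few attempts"
) -> Tuple[str, str]:
    """Return (result, who_did_it), where who_did_it is "ai" or "human"."""
    feedback: Optional[str] = None
    # One first pass plus up to max_corrections corrected attempts.
    for _ in range(1 + max_corrections):
        draft = ai_attempt(task, feedback)
        feedback = review(draft)
        if feedback is None:  # reviewer accepted the draft
            return draft, "ai"
    # Still not good enough: do the work yourself.
    return do_it_yourself(task), "human"
```

The human review step stays in the loop on every attempt, so the fallback only costs you the time spent on the failed drafts.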