Willison: “No model has beaten GPT-4 on a range of widely used benchmarks like this.”