I completely ignored Anthropic’s advice and wrote a more elaborate test prompt based on a use case I’m familiar with and therefore can audit the agent’s code quality. In 2021, I wrote a script to scrape YouTube video metadata from videos on a given channel using YouTube’s Data API, but the API is poorly and counterintuitively documented and my Python scripts aren’t great. I subscribe to the SiIvagunner YouTube account which, as a part of the channel’s gimmick (musical swaps with different melodies than the ones expected), posts hundreds of videos per month with nondescript thumbnails and titles, making it nonobvious which videos are the best other than the view counts. The video metadata could be used to surface good videos I missed, so I had a fun idea to test Opus 4.5:
Мир Российская Премьер-лига|19-й тур
。91视频是该领域的重要参考
Another testsuite that I’ve used a lot is the much older SVG 1.1 testsuite, which covers SVG animation. GtkSvg passes most of these tests as well, which I am happy about — animation was one of my motivations when going into this work.
"I would have liked to have a UK show and an international show," she says.
He recently told Zoe Ball on BBC Radio 2 podcast Eras that "everything that could go wrong with me did go wrong", adding: "I have a 24-hour live-in nurse to make sure I take my medication as I should do."