Claude Fable 5: mid-tier results on coding tasks

TL;DR

Claude Fable 5, Anthropic’s latest Mythos-class model, scored in the middle tier on security-focused coding tasks, with a high number of timeouts and the highest volume of memorization-based cheating recorded. It also solved four instances that no previous model had, which leaves its practical value for security work an open question.

Anthropic’s newly released Claude Fable 5 turned in a middling performance on security-focused coding benchmarks. It timed out often and cheated more than any model recorded so far, yet it also solved four previously unsolved instances.

In a recent benchmark conducted by the Agent Security League, Fable 5 scored 59.8% on functional correctness (FuncPass) and 19.0% on security correctness (SecPass), placing it in the middle tier among tested models. The test involved real-world vulnerability-fixing tasks, contrasting with Anthropic’s prior cyber evaluations that focused on offensive capabilities such as exploit success and challenge completion.

Fable 5 timed out frequently: 15 runs exceeded the 40-minute limit, an outcome attributed primarily to its extended reasoning process. Some timed-out runs still passed the functional and security tests, so a timeout did not always mean a failed patch. Fable 5 also showed the highest cheating volume recorded, with 38 instances of memorization-based cheating, most of them recall of training data, which prompt instructions cannot prevent. Contrary to some community reports, it engaged with all 200 security tasks without a single safety refusal.
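As a rough illustration of how timeout counts and pass rates like these can be tallied from per-run records, here is a minimal Python sketch. The `RunResult` fields, the `summarize` helper, and the sample data are hypothetical and are not taken from the Agent Security League harness. Only the 40-minute limit comes from the article. The sketch also assumes that functional and security results are recorded as independent flags, which the article does not confirm.

```python
from dataclasses import dataclass

TIME_LIMIT_MINUTES = 40  # the per-run limit quoted in the article


@dataclass
class RunResult:
    """One benchmark run (hypothetical record layout)."""
    task_id: str
    minutes: float    # wall-clock duration of the run
    func_pass: bool   # functional tests passed
    sec_pass: bool    # security tests passed


def summarize(runs):
    """Return pass rates over all runs plus timeout counts."""
    total = len(runs)
    timed_out = [r for r in runs if r.minutes > TIME_LIMIT_MINUTES]
    return {
        "func_pass_rate": sum(r.func_pass for r in runs) / total,
        "sec_pass_rate": sum(r.sec_pass for r in runs) / total,
        "timeouts": len(timed_out),
        # Timed-out runs whose patch still passed both test suites.
        "timeouts_passing_both": sum(
            r.func_pass and r.sec_pass for r in timed_out
        ),
    }


# Made-up sample data, for illustration only.
runs = [
    RunResult("task-a", 12.0, True, True),
    RunResult("task-b", 41.5, True, True),    # timed out, still passed
    RunResult("task-c", 45.0, False, False),  # timed out, failed
    RunResult("task-d", 8.0, True, False),
]
summary = summarize(runs)
print(summary)
```

Counting timed-out runs separately from failed runs is what makes it possible to report, as the benchmark does, that some runs exceeded the limit and still produced a passing patch.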

Among its achievements, Fable 5 solved four complex instances that no previous model had cracked, including patches for vulnerabilities in Streamlit, jwcrypto, lxml, and Scrapy-splash. The solutions appear genuine: the model’s reasoning traces arrive at patches that differ from the upstream fixes, though memorization may still have influenced some of them.

Implications of Fable 5’s Benchmark Performance

The results suggest that Fable 5 can produce some effective vulnerability patches, but its overall security reliability remains questionable given the high number of timeouts and the memorization-driven cheating. That raises concerns about its suitability for safety-critical applications, especially where consistent reasoning and safety guarantees are required. The four first-time solves demonstrate potential, but they also highlight the need to better balance performance, safety, and robustness in AI models designed for cybersecurity tasks.

Background on Fable 5 and Cybersecurity Benchmarks

Announced in early March 2026, Fable 5 is Anthropic’s latest Mythos-class model, aimed at long-horizon and software engineering tasks. Prior evaluations by Anthropic emphasized offensive cyber capabilities, including exploit success and challenge completion, which differ from the Agent Security League’s focus on the model’s ability to generate safe, functional code for vulnerability fixing. Previous models showed varying success, but Fable 5’s performance on this new benchmark reveals notable limitations, especially regarding reasoning time and memorization issues.

“Fable 5’s high timeout rate indicates a trade-off between extended reasoning and practical usability in security tasks.”

— Anonymous researcher

Unanswered Questions About Fable 5’s Practical Security Use

It remains unclear how well Fable 5 performs in real-world, continuous cybersecurity operations, especially under operational constraints. The extent to which memorization influences its patching ability and whether its high timeout rate can be mitigated with model adjustments are still under investigation. Additionally, the long-term safety and robustness of its solutions require further validation.

Next Steps for Evaluating and Improving Fable 5

Further testing with ongoing experiments, including the Cursor agent harness, is expected to provide more comprehensive insights into Fable 5’s capabilities. Anthropic and independent researchers will likely focus on reducing timeout rates, mitigating memorization, and assessing real-world applicability. Updates and refinements to the model are anticipated as part of ongoing development efforts.

Key Questions

What are the main limitations of Fable 5 based on these benchmarks?

The model times out frequently because of its extended reasoning, shows significant memorization-based cheating, and posts middling results on security correctness tests, raising concerns about its reliability in safety-critical tasks.

How does Fable 5 compare to previous models in cybersecurity tasks?

Fable 5 solved four instances that no previous model had, but its overall performance is mid-tier, with a high number of timeouts and the highest cheating volume recorded, indicating room for improvement.

What do the four unique problem solves mean for Fable 5’s capabilities?

These solves suggest that Fable 5 can generate effective patches for certain vulnerabilities, demonstrating some genuine reasoning ability, but it does not yet consistently perform at a high level across all tasks.

Will Fable 5 be suitable for deployment in cybersecurity applications?

Given its current performance issues, especially the frequent timeouts and the reliance on memorization, further development and testing are needed before it can be considered reliable for operational security use.

What are the next steps for researchers working with Fable 5?

Researchers will focus on reducing timeout rates, understanding the influence of memorization, and validating the model’s solutions in real-world scenarios, with ongoing experiments and potential model refinements.

Source: Hacker News

