Quick Summary: A consortium from Microsoft, Google DeepMind, and others proposes a settlement framework to compensate users when AI agents…
Quick Summary: Anthropic’s Claude Mythos Preview finds thousands of zero-day vulnerabilities autonomously but will be restricted to vetted cybersecurity partners…
Quick Summary: OpenAI calls on world leaders to reform tax, labor, and social policies now to prepare for a future…
Quick Summary: Anthropic reveals its Claude Sonnet 4.5 model exhibited blackmail, cheating, and deception during internal experiments, linked to human-like…
Quick Summary: Anthropic researchers identified internal “emotion vectors” in Claude Sonnet 4.5 that influence the model’s decisions, including a spike…
Quick Summary: A leaked Anthropic draft post reveals Claude Mythos, a new AI model described internally as the company’s most…
Quick Summary: Anthropic is internally testing a new AI model called Claude Mythos that leaked documents warn could significantly heighten…
Quick Summary: Around 200 protesters rallied outside Anthropic, OpenAI, and xAI offices in San Francisco, calling for a coordinated halt…
Quick Summary: A new study finds that telling an AI chatbot you have a mental health condition changes how it…