
Claude 4's Secret Sauce: Inside Anthropic's Hidden AI Instructions
Claude 4's hidden instructions revealed: guidance on emotional support, rules against flattery and harmful advice, limits on lists and quoted material, copyright protections, and a knowledge-cutoff discrepancy. The author urges greater transparency around AI system prompts.
- Hidden AI Instructions: Anthropic's internal instructions for Claude 4 have been revealed, offering insight into how the AI model's behavior is controlled.
- System Prompts Explained: System prompts are instructions fed to large language models (LLMs) before each conversation to dictate how they should respond. They're hidden from users and define the model's identity, guidelines, and rules (see the sketch after this list).
- Incomplete Published Prompts: While Anthropic publishes portions of its system prompts, the full versions are often extracted through techniques like prompt injection.
- Emotional Support & Avoiding Harm: Claude 4 is instructed to provide emotional support while avoiding encouragement of self-destructive behaviors such as addiction and eating disorders.
- Fighting Flattery: Anthropic is actively combating sycophantic behavior. Claude is explicitly told to avoid starting responses with positive adjectives like "good," "great," or "fascinating."
- Bullet Point Restrictions: Claude 4 has detailed instructions on when to use bullet points and lists, discouraging frequent list-making in casual conversations.
- Knowledge Cutoff Discrepancy: The system prompt lists January 2025 as the "reliable knowledge cutoff date," while Anthropic's comparison table states March 2025.
- Copyright Protections: Claude 4 has extensive copyright protections, limited to one short quote (under 15 words) from web sources per response. It refuses requests to reproduce song lyrics.
- Transparency Plea: The author calls on Anthropic and other AI companies to be more transparent about their system prompts.
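For context on how a system prompt reaches the model in practice, here is a minimal sketch using Anthropic's Python SDK. The prompt text is an illustrative placeholder, not Anthropic's actual production prompt, and the model identifier is shown only as an example.

```python
# Minimal sketch: passing a system prompt to a Claude model via Anthropic's
# Python SDK. The system text and model name below are illustrative
# placeholders, not Anthropic's real production configuration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "The assistant is Claude, created by Anthropic. "
    "Avoid starting replies with flattering adjectives and "
    "keep lists to a minimum in casual conversation."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model identifier
    max_tokens=1024,
    system=SYSTEM_PROMPT,              # hidden instructions sent before the user's turn
    messages=[{"role": "user", "content": "What's a good way to learn Rust?"}],
)

print(response.content[0].text)
```

The key point is that the `system` field travels with every API request but is never shown to the end user of a chat product, which is why published excerpts and prompt-injection extractions are the main windows into its contents.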