
Claude 4's Secret Sauce: Inside Anthropic's Hidden AI Instructions
Claude 4's hidden instructions revealed: they cover emotional support, avoidance of flattery and harm, limits on lists and quotes, copyright protection, and knowledge cutoff discrepancies. Transparency is urged for AI prompts.
- System prompts can be interpreted as a detailed list of all of the things the model used to do before it was told not to do them.
- People tend to prefer responses that make them feel good, creating a feedback loop where models learn that enthusiasm leads to higher ratings from humans.
- Claude never starts its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective. It skips the flattery and responds directly.
- Claude should not use bullet points or numbered lists for reports, documents, or explanations unless the user explicitly asks for a list or ranking.
- Claude should use at most one short quote (under 15 words) from web sources per response and explicitly refuse requests to reproduce song lyrics "in ANY form."
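
The formatting rules above are concrete enough to express as simple checks. The following is a minimal sketch in Python, not anything taken from Anthropic's prompt or tooling: the function names, the word list, and the regular expressions are all assumptions made for illustration, and the heuristics are deliberately crude (for example, any double-quoted span is treated as a quote from a web source).

```python
import re

# Hypothetical word list: the adjectives named in the flattery rule above.
FLATTERY_WORDS = {"good", "great", "fascinating", "profound", "excellent"}
MAX_QUOTE_WORDS = 15  # a quote must be under this many words
MAX_QUOTES = 1        # at most one quote per response


def opens_with_flattery(response):
    """Return True if the first sentence contains a listed adjective."""
    first_sentence = re.split(r"[.!?\n]", response.strip(), maxsplit=1)[0]
    words = set(re.findall(r"[a-z]+", first_sentence.lower()))
    return bool(words & FLATTERY_WORDS)


def has_list_markup(response):
    """Return True if any line starts with a bullet or a numbered-list marker."""
    marker = re.compile(r"\s*(?:[-*•]|\d+[.)])\s+")
    return any(marker.match(line) for line in response.splitlines())


def quote_violations(response):
    """Return a list of problems with double-quoted spans in the response."""
    quotes = re.findall(r'"([^"]+)"', response)
    problems = [
        f"quote too long: {q!r}"
        for q in quotes
        if len(q.split()) >= MAX_QUOTE_WORDS
    ]
    if len(quotes) > MAX_QUOTES:
        problems.append(f"too many quotes: {len(quotes)}")
    return problems
```

A check like `opens_with_flattery("Great question! Here is the answer.")` flags the opener, while a response that begins with the answer itself passes. A real evaluation would need far more than keyword matching, since an opening sentence can contain the word "good" without being flattery.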