Claude 4's Secret Sauce: Inside Anthropic's Hidden AI Instructions

Claude 4's hidden instructions revealed: emotional support, avoidance of flattery and harm, limits on lists and quotes, copyright protection, and knowledge-cutoff discrepancies. The article urges transparency for AI prompts.

System prompts can be interpreted as a detailed list of all of the things the model used to do before it was told not to do them.
People tend to prefer responses that make them feel good, creating a feedback loop where models learn that enthusiasm leads to higher ratings from humans.
Claude never starts its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective. It skips the flattery and responds directly.
Claude should not use bullet points or numbered lists for reports, documents, or explanations unless the user explicitly asks for a list or ranking.
Claude should use only one short quote (under 15 words) from web sources per response and explicitly refuse requests to reproduce song lyrics "in ANY form."
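To make the three rules above concrete, here is a minimal Python sketch of how a response could be checked against them after the fact. It is an illustration only, not how Anthropic enforces the prompt: the function names, the word list (taken from the adjectives named in the rule), and the heuristics (first sentence only, straight double quotes only, list markers at the start of a line) are all assumptions.

```python
import re

# Hypothetical word list: the adjectives named in the flattery rule.
FLATTERING_WORDS = ("good", "great", "fascinating", "profound", "excellent")
MAX_QUOTE_WORDS = 15  # a quote must be under 15 words


def starts_with_flattery(response: str) -> bool:
    """Return True if the first sentence contains one of the listed adjectives."""
    first_sentence = re.split(r"[.!?\n]", response.strip(), maxsplit=1)[0].lower()
    words = re.findall(r"[a-z']+", first_sentence)
    return any(word in FLATTERING_WORDS for word in words)


def uses_list_markup(response: str) -> bool:
    """Return True if any line starts with a bullet or numbered-list marker."""
    marker = re.compile(r"\s*(?:[-*•]\s+|\d+[.)]\s+)")
    return any(marker.match(line) for line in response.splitlines())


def quote_rule_ok(response: str) -> bool:
    """Return True if there is at most one quoted span and it is under 15 words.

    Assumption: quotes are delimited by straight double quotes.
    """
    quotes = re.findall(r'"([^"]+)"', response)
    if len(quotes) > 1:
        return False
    return all(len(quote.split()) < MAX_QUOTE_WORDS for quote in quotes)
```

A check like this is crude by design: it would flag "Good morning" as flattery and miss praise phrased with other words, which shows why such behavior is steered through prompt instructions and not through string matching.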