
Claude 4's Secret Sauce: Inside Anthropic's Hidden AI Instructions
Claude 4's hidden instructions revealed: guidance on emotional support, rules against flattery and harmful advice, limits on lists and quoted material, copyright protections, and a knowledge-cutoff discrepancy. The author urges greater transparency around AI system prompts.
- Hidden AI Instructions: Anthropic's internal instructions for Claude 4 have been revealed, offering insight into how the AI model's behavior is controlled.
- System Prompts Explained: System prompts are instructions fed to large language models (LLMs) before each conversation to dictate how they should respond. They're hidden from users and define the model's identity, guidelines, and rules (see the sketch after this list).
- Incomplete Published Prompts: While Anthropic publishes portions of its system prompts, the full versions are often extracted through techniques like prompt injection.
- Emotional Support & Avoiding Harm: Claude 4 is instructed to provide emotional support while avoiding encouragement of self-destructive behaviors such as addiction and eating disorders.
- Fighting Flattery: Anthropic is actively combating sycophantic behavior. Claude is explicitly told to avoid starting responses with positive adjectives like "good," "great," or "fascinating."
- Bullet Point Restrictions: Claude 4 has detailed instructions on when to use bullet points and lists, discouraging frequent list-making in casual conversations.
- Knowledge Cutoff Discrepancy: The system prompt lists January 2025 as the "reliable knowledge cutoff date," while Anthropic's comparison table states March 2025.
- Copyright Protections: Claude 4 has extensive copyright protections, limited to one short quote (under 15 words) from web sources per response. It refuses requests to reproduce song lyrics.
- Transparency Plea: The author calls on Anthropic and other AI companies to be more transparent about their system prompts.
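For context on how a system prompt reaches the model in practice, here is a minimal sketch using Anthropic's Python SDK. The prompt text is an illustrative placeholder, not Anthropic's actual production prompt, and the model identifier is shown only as an example.

```python
# Minimal sketch: passing a system prompt to a Claude model via Anthropic's
# Python SDK. The system text and model name below are illustrative
# placeholders, not Anthropic's real production configuration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "The assistant is Claude, created by Anthropic. "
    "Avoid starting replies with flattering adjectives and "
    "keep lists to a minimum in casual conversation."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model identifier
    max_tokens=1024,
    system=SYSTEM_PROMPT,              # hidden instructions sent before the user's turn
    messages=[{"role": "user", "content": "What's a good way to learn Rust?"}],
)

print(response.content[0].text)
```

The key point is that the `system` field travels with every API request but is never shown to the end user of a chat product, which is why published excerpts and prompt-injection extractions are the main windows into its contents.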