… A new study from Anthropic suggests models have digital representations of human emotions like happiness, sadness, joy, and fear, within clusters of artificial neurons—and these representations activate in response to different cues.
Researchers at the company probed the inner workings of Claude Sonnet 4.5 and found that so-called “functional emotions” seem to affect Claude’s behavior, altering the model’s outputs and actions.
…
“What was surprising to us was the degree to which Claude’s behavior is routing through the model’s representations of these emotions,” says Jack Lindsey, a researcher at Anthropic who studies Claude’s artificial neurons.
Anthropic was founded by ex-OpenAI employees who believe that AI could become hard to control as it becomes more powerful. In addition to building a successful competitor to ChatGPT, the company has pioneered efforts to understand how AI models misbehave, partly by probing the workings of neural networks using what’s known as mechanistic interpretability. …
Previous research has shown that the neural networks used to build large language models contain representations of human concepts. But the fact that “functional emotions” appear to affect a model’s behavior is new.





















