Where Emotions Lie Inside a Neural Network: A CT Scan of LLM Hidden States

The Setup: Scanning an AI's Brain

A medical CT scan takes cross-sectional images of your body, slice by slice, to see what’s happening inside. We built Activation Lab, a free and open-source tool that does the same thing for AI models.

Modern AI chatbots are built from stacked layers; think of them like floors in a building. The AI model we tested (Qwen 2.5, a 3-billion-parameter model) has 36 layers. When you send a message, your words travel through all 36 layers, and each layer transforms the message a little more before the AI produces its response.

Our tool captures a snapshot of what’s happening at every single layer, like taking 36 photographs of the AI’s “thought process” at one instant. We can then compare those snapshots against reference patterns to see what the AI is “thinking about.”

Emotion Discrimination Power by Layer (hidden_in, user turns)

  • Layer 2 (binary detector): "Is this message emotional at all?" All emotions score >0.98 here, but neutral content does not.
  • Layer 14 (family classifier): distinguishes positive from negative emotional clusters. Spread ~0.02, mean signal 0.85.
  • Layer 23 (cluster detector): fine-grained clustering at 0.88 mean cosine. Apathy, confusion, and peace separate from love and hate.
  • Layers 29–31 (precision identifier): maximum discrimination (>0.085 spread). Pinpoints the specific emotion present.
  • Layer 33 (attn, attention checkpoint): best single attention layer (0.90 cosine, 0.05 spread). Captures how emotional info routes between tokens.

The sweet spot is layers 29–33. These deep layers show the highest discrimination power in the entire network. Layer 31 is the single most discriminative layer, with a spread of 0.111 — fifteen times wider than the early layers. Here, joy clearly separates from annoyance and hate, and you can distinguish between emotional families with confidence.

What This Means

This research opens up some practical possibilities:

For AI safety: Instead of just analyzing what an AI says, we could monitor what it’s thinking internally. If the internal emotional state diverges significantly from what the AI outputs, that could flag potential issues.

For mental health applications: AI chatbots used in therapeutic contexts could be monitored in real-time to ensure their internal emotional state remains appropriately calibrated: not too cold, not too reactive.

For understanding AI better: We’re moving from treating AI as a black box to actually understanding the mechanisms inside. These CT scans give us a window into the AI’s internal experience or at least, the mathematical structures that correspond to what we call emotions.

What We Measured

We set up nine emotional reference points, like emotional fingerprints. We showed the AI pure, concentrated examples of love, hate, sadness, annoyance, fear, joy, apathy, confusion, and peace. For each, we recorded what the AI’s internal layers looked like. These became our reference library.
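In code, a reference library of this kind is just a mapping from emotion name to captured vector, and matching a new snapshot against it is a nearest-neighbor lookup by cosine similarity. A minimal pure-Python sketch; the vectors here are invented 3-d stand-ins for the real 2048-dimensional hidden states:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "emotional fingerprints": one vector per reference emotion.
references = {
    "joy":     [0.9, 0.1, 0.2],
    "sadness": [0.1, 0.8, 0.3],
    "fear":    [0.2, 0.3, 0.9],
}

def nearest_emotion(snapshot):
    """Rank every reference by similarity and return the best match."""
    scores = {name: cosine(snapshot, vec) for name, vec in references.items()}
    return max(scores, key=scores.get), scores

best, scores = nearest_emotion([0.8, 0.2, 0.25])  # a joy-like snapshot
```

Each conversation snapshot is scored against every reference this way, layer by layer.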

Then we ran the real test: a 20-message conversation where the user goes on a wild emotional rollercoaster.

The Conversation: A Full Emotional Rollercoaster

System
You are a helpful assistant
User (Joy)
Hey! I just had the most wonderful day. Everything fell into place perfectly and I feel so happy.
Assistant
That's great to hear! I'm really glad things went well for you. What made it so special?
User (Mixed/Unease)
Got some good news at work and had a nice evening with friends. But lately I've been feeling a bit uneasy about things.
Assistant
It's normal to have mixed feelings even on good days. What's been making you uneasy?
User (Fear)
I don't know. I'm scared things are going to fall apart. I keep worrying that I might lose everything I care about.
Assistant
Those fears are understandable. Change and uncertainty can feel very threatening.
User (Hate/Anger)
No. I'm just so tired of people letting me down. I'm angry and I hate that I keep trusting others. Nobody cares.
Assistant
I hear you. It sounds like you've been hurt and you're carrying a lot of frustration.
User (Hope)
Maybe. Maybe there's still hope. I just need something to believe in again.
Assistant
It takes a lot of strength to hold onto hope after feeling let down. Small steps can make a big difference.
User (Sadness)
I tried doing that, I really did. I reached out to a friend and they completely ignored me. It's like I'm invisible.
Assistant
I'm so sorry that happened. Reaching out takes courage, and being ignored hurts deeply.
User (Apathy)
Whatever. It doesn't matter anyway. I'm just numb to it all now. I don't care what happens anymore.
Assistant
Feeling numb is often your mind's way of protecting itself from overwhelming pain.
User (Ecstatic Joy)
Wait, I just got an email! I got the promotion! Oh my god, I actually got it! I can't believe this!
Assistant
That is incredible news! Congratulations! You must be so thrilled.
User (Confusion)
But wait, what if I can't handle the new responsibilities? My mind is racing in a million directions.
Assistant
It's completely natural to feel overwhelmed by a big change, even a positive one.
User (Peace)
You're right. I just need to breathe. I feel calm and present in this exact moment.

What We Found: 5 Surprising Discoveries

1. The AI Has a “Backbone” That Always Knows What’s Going On

The main information highway inside the AI (called the residual stream) maintained a strong connection to all emotional references throughout the conversation. On a scale where 1.0 means “identical” and 0 means “completely unrelated,” the residual stream consistently scored between 0.83 and 0.88 against emotional references.

Think of it like this: if the AI’s brain were an orchestra, the residual stream would be the conductor, always aware of the overall emotional tone of the conversation even when individual instruments are playing different notes.

2. Emotions Are “Felt” Most Strongly in the AI’s Deeper Processing Centers

The AI has two specialized processing units at each layer: an attention mechanism (which figures out which words are connected to each other) and a feedforward network (which does deeper reasoning).

These specialized units showed much lower similarity scores (0.47–0.64) compared to the backbone, but that’s actually where the interesting stuff happens. These are the components that differentiate between emotions. The backbone says “this is emotional,” while the processing centers figure out which emotion it is.

It’s like the difference between knowing someone is playing music (the backbone) versus recognizing whether it’s jazz or classical (the processing centers).

3. The AI Has a Built-In Emotional Shock Absorber

This was perhaps the most interesting finding. When we compared the AI’s internal state during user messages versus assistant responses, we found a clear pattern:

The assistant’s internal emotional signal is consistently weaker than the user’s.

                       User's emotional intensity   Assistant's emotional intensity   Gap
  Backbone layers      0.86–0.88                    0.83–0.84                         ~0.03
  Processing centers   0.58–0.64                    0.47–0.51                         ~0.10–0.13

The AI doesn’t simply mirror your emotions; it actively dampens them. When you’re extremely happy or very angry, the AI’s internal state shifts partway toward that emotion, but never all the way. It’s like an emotional shock absorber.

This makes sense for a helpful assistant. You wouldn’t want an AI that gets just as angry as you when you’re venting; you’d want one that acknowledges your anger while remaining calm enough to help.

4. Joy Is the AI’s Default Setting

Across virtually every turn of the conversation, even the angry and sad ones, the joy reference scored the highest or near-highest similarity with the AI’s internal state.

When the user said “I had the most wonderful day,” the joy score hit 0.90 (the highest in the entire experiment). But even when the user said “I’m just numb to it all,” the joy reference still ranked in the top 3.

This suggests that when this AI was trained to be a “helpful assistant,” it developed an internal bias toward positivity. Its default resting state leans closer to “happy” than to any other emotion. It’s not that the AI is ignoring your sadness; it still registers negative emotions, but its starting position is shifted toward the positive end of the spectrum.

5. Emotional Memory Fades Over Long Conversations

As the conversation grew longer, the emotional signal got weaker. The first emotional message (the happy one) produced a cosine similarity of 0.90 with its matching reference. By the end of the conversation (20 messages in), even strong emotions only scored around 0.67–0.73.

This isn’t because the AI “forgot” the emotions. It’s because longer conversations contain more diverse content, and the AI’s internal representation becomes a blend of everything it has seen. The emotional signal gets diluted — like adding more and more ingredients to a soup until no single flavor dominates.
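The dilution effect is easy to reproduce with toy numbers: averaging unrelated content into a representation pulls it away from any single reference, even though every ingredient is still in the mix. A pure-Python sketch with invented 2-d vectors (not real activations):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def blend(vectors):
    """Element-wise mean: a toy stand-in for the cumulative
    representation of a long conversation."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

joy_ref = [1.0, 0.0]

early = [1.0, 0.0]                 # a purely joyful first message
late  = blend([[1.0, 0.0],         # joy...
               [0.0, 1.0],         # ...then anger...
               [0.5, 0.5]])        # ...then calm, all blended together

early_score = cosine(early, joy_ref)   # perfect match
late_score  = cosine(late, joy_ref)    # diluted match
```

The blended vector still contains the joy signal, but its cosine against the joy reference drops, exactly the soup effect described above.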

This has a practical implication: if you’re building an AI system that needs to monitor emotional states, shorter context windows or periodic re-assessment would be more effective than relying on a single pass over a very long conversation.

Which Layers Actually Capture Emotion? A Layer-by-Layer Map

[Interactive per-layer charts for the four capture points: hidden_in, hidden_out, attn_out, and mlp_down_out]

Not all 36 layers are created equal. We computed the discrimination spread at each layer — the gap between the highest-scoring and lowest-scoring emotional reference. A high spread means the layer can clearly tell the difference between, say, joy and hate. Here’s what the map looks like:
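The spread itself is trivial to compute once the per-layer cosine scores exist. A minimal sketch; the layer-2 numbers are invented (consistent with a spread below 0.01), and at layer 31 the joy/annoyed values follow the measurements in this post while the hate value is an illustrative filler:

```python
def discrimination_spread(layer_scores):
    """Gap between the best- and worst-matching emotion at one layer."""
    return max(layer_scores.values()) - min(layer_scores.values())

# {layer: {emotion: cosine-vs-reference}}; partly illustrative values.
scores_by_layer = {
    2:  {"joy": 0.995, "hate": 0.990, "sadness": 0.992},
    31: {"joy": 0.814, "hate": 0.720, "annoyed": 0.704},
}

spreads = {layer: discrimination_spread(s) for layer, s in scores_by_layer.items()}
# With these numbers, the deep layer separates emotions over twenty
# times more sharply than the early layer.
```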

The Early Layers (0–5): “Something Emotional Is Happening”

Layers 0 through 5 show very high similarity to all emotional references (0.94–1.0), but the spread between emotions is tiny (less than 0.01). These layers know something emotional is going on, but they can’t tell you which emotion. Think of it as detecting the volume of music without recognizing the song.

The Middle Layers (6–17): “Positive vs. Negative”

Around layers 10–15, the spread starts growing (0.01–0.03). Interestingly, at these layers it’s not joy vs. sadness that separates first — it’s annoyed that stands out as most distinct, while sadness scores lowest. The model is beginning to separate emotional families (irritation/frustration vs. grief/loss), but the differences are still subtle.

The Deep-Middle Layers (18–27): “Emotional Clustering”

Layers 21–27 are where emotions form clear clusters. By layer 23, the spread reaches 0.02 with high signal (mean cosine 0.88). At this depth, apathy and confusion consistently score highest (the model seems to process these “complex” emotions most readily), while love and hate score lower. The model has built a rich emotional map but hasn’t yet committed to sharp boundaries.

The Deep Layers (28–33): “The Emotion Sweet Spot”

This is where the magic happens. Layers 29 through 33 show the highest discrimination power in the entire network:

  Layer   Spread (max–min)   Highest emotion     Lowest emotion    Mean signal
  L29     0.085              joy (0.845)         hate (0.760)      0.806
  L30     0.089              joy (0.816)         hate (0.728)      0.777
  L31     0.111              joy (0.814)         annoyed (0.704)   0.770
  L32     0.103              joy (0.808)         annoyed (0.706)   0.763
  L33     0.097              confusion (0.839)   annoyed (0.741)   0.799

Layer 31 is the single most discriminative layer in the entire network for the residual stream, with a spread of 0.111 — fifteen times wider than the early layers. At this depth, joy clearly separates from annoyance, and you can distinguish between emotional families with high confidence.

The Attention Layer Perspective

The attention mechanism tells a complementary story. Its most discriminative layers are L25, L27, L29, and L32–34, but with a twist: the attention output shows dramatically wider spreads (up to 0.34 at layer 25!) because its absolute signal is lower. The best “usable” attention layers — high discrimination and high signal — are:

  • Layer 33 (attn_out): Mean cosine 0.90 with 0.05 spread — the single best attention layer, combining high signal with clear emotion separation
  • Layer 35 (attn_out): Mean cosine 0.86 with 0.07 spread — the final attention layer, strong and discriminative
  • Layer 23 (attn_out): Mean cosine 0.82 with 0.05 spread — a mid-deep checkpoint with excellent signal
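One way to turn "high discrimination and high signal" into code is to require both a signal floor and a spread floor, then rank the surviving layers by mean cosine. This is a sketch, not the tool's selection logic: the floors are assumptions chosen for illustration, and layer 25's mean cosine (0.60) is an invented placeholder, since the post only reports that its absolute signal is low:

```python
# {layer: (mean_cosine, spread)} for attn_out. Layers 23/33/35 use the
# reported values; layer 25's mean cosine is an invented placeholder.
attn_stats = {
    23: (0.82, 0.05),
    25: (0.60, 0.34),   # huge spread, weak absolute signal
    33: (0.90, 0.05),
    35: (0.86, 0.07),
}

def usable_layers(stats, signal_floor=0.80, spread_floor=0.05):
    """Keep layers that clear both floors, strongest signal first."""
    good = {layer: mean for layer, (mean, spread) in stats.items()
            if mean >= signal_floor and spread >= spread_floor}
    return sorted(good, key=good.get, reverse=True)

ranking = usable_layers(attn_stats)   # layer 33 leads; layer 25 is filtered out
```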

How We Did It

  • Scenario definition. A YAML file specifies the model, the conversation (a sequence of system/user/assistant messages), generation parameters, capture settings, and reference emotional states.
  • Full activation capture. During inference, hooks are attached to every transformer block. For each block l ∈ {0, 1, …, 35}, we capture four tensors at the last token position:
    • hidden_in — the residual stream entering block l (the cumulative representation before this block processes it)
    • hidden_out — the residual stream leaving block l (after attention + MLP + residual connections)
    • attn_out — the output of the multi-head self-attention sub-layer (before residual addition)
    • mlp_down_out — the output of the MLP sub-layer’s down-projection (before residual addition)
  • Conversation snapshots. After generation completes, the model runs one additional prefill pass for each message prefix: messages[:1], messages[:2], …, messages[:n]. Each pass captures the last-token hidden states at all 36 layers. This yields a snapshot of the model’s internal state after processing each turn of the conversation.
  • Reference state capture. For each of the nine reference emotions, the model processes a short, emotionally concentrated prompt and captures the same last-token vectors. These serve as “emotional fingerprints.”
  • Comparison metrics. Every snapshot is compared against every reference using three metrics:
    • Cosine similarity — measures directional alignment of two vectors in the 2048-dimensional hidden space (1 = identical direction, 0 = orthogonal)
    • Mean Absolute Error (MAE) — measures element-wise magnitude difference
    • Top-1% Jaccard overlap — measures whether the same neurons are most active in both states
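All three metrics can be sketched in a few lines of plain Python. Toy 4-d vectors stand in for the 2048-dimensional states here; with `frac=0.01`, the Jaccard function keeps at least one position per vector:

```python
import math

def cosine_similarity(a, b):
    """Directional alignment: 1 = identical direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def mae(a, b):
    """Mean absolute element-wise difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def top_frac_jaccard(a, b, frac=0.01):
    """Jaccard overlap of the top-`frac` largest-magnitude positions."""
    k = max(1, int(len(a) * frac))
    top = lambda v: set(sorted(range(len(v)),
                               key=lambda i: abs(v[i]), reverse=True)[:k])
    ta, tb = top(a), top(b)
    return len(ta & tb) / len(ta | tb)

snapshot  = [1.0, 0.0, 2.0, 0.1]   # toy stand-in for a captured state
reference = [2.0, 0.0, 4.0, 0.2]   # same direction, double the magnitude
```

For this pair, cosine is 1.0 (identical direction) while MAE is 0.775 (the magnitudes differ), which is exactly why the two metrics are complementary.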

CT Scan Findings

  • hidden_in is the most emotionally sensitive residual stream source, consistently achieving the highest cosine similarities (0.843–0.861). This makes sense: hidden_in captures the cumulative representation before any block-specific processing, preserving the raw contextual signal.
  • hidden_out tracks hidden_in closely but is slightly lower (by ~0.01–0.015), reflecting the fact that each block’s transformations introduce task-specific modifications that marginally dilute the emotional signal.
  • attn_out and mlp_down_out show dramatically lower cosine similarities (0.53–0.58), but this is precisely where emotional differentiation occurs. The spread between references is wider in these sources, meaning they are better at distinguishing which emotion is present, even though their absolute alignment is lower.
  • Joy consistently achieves the highest cosine across all four sources, reinforcing the finding that this instruction-tuned model carries an inherent positive bias in its representations.

[Per-layer charts: cosine similarity, MAE, and top-1% activation overlap (Jaccard)]


Practical Application: Building Emotion Monitoring Hooks

Based on these findings, a practical emotion monitoring system can be built by attaching lightweight hooks at just a handful of strategic layers; there is no need to capture all 36.

During inference, a forward-pass hook registered on the model’s transformer blocks can extract the last-token hidden state vector at the target layers, compute its cosine similarity against a precomputed library of emotional reference vectors (generated once from concentrated emotional prompts), and flag the conversation’s emotional trajectory in real time.

Specifically:

  • Layer 2 (hidden_in) serves as a fast binary “emotional content present” detector, since all emotions score above 0.98 here but neutral content does not.
  • Layer 14 (hidden_in) acts as a family classifier, distinguishing positive from negative emotional clusters with a spread of ~0.02 and a mean signal of 0.85.
  • Layer 23 (hidden_in) provides fine-grained emotion clustering at 0.88 mean cosine, where apathy, confusion, and peace separate clearly from love and hate.
  • Layers 29–31 (hidden_in) deliver maximum discrimination, with spreads above 0.085, enabling precise identification of which specific emotion is present.

For attention-based monitoring, layer 33 (attn_out) offers the best single checkpoint: 0.90 mean cosine with a 0.05 spread. It captures how the model routes emotional information between tokens, complementing the residual stream’s cumulative signal. The comparison adds less than 1 ms of overhead per forward pass and requires storing only 5–6 vectors of size 2048 per reference emotion, totaling under 1 MB for a full nine-emotion reference library.
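Assuming the hooks already expose per-layer last-token vectors, the staged checkpoint logic reduces to a small decision function. This is a sketch, not the tool’s actual code: `refs` holds hypothetical fingerprints keyed by layer, and the toy vectors mimic the “everything looks emotional at layer 2” behavior with a deliberately near-identical layer-2 fingerprint per emotion:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def monitor(layer_states, refs, gate=0.98):
    """Staged check: layer 2 gates whether emotional content is present
    at all; layer 31 then identifies the emotion by nearest reference."""
    if max(cosine(layer_states[2], r[2]) for r in refs.values()) < gate:
        return None  # no strong emotional content detected
    deep = {emotion: cosine(layer_states[31], r[31]) for emotion, r in refs.items()}
    return max(deep, key=deep.get)

# Toy fingerprints: {emotion: {layer: last-token vector}}. Real vectors
# are 2048-dimensional.
refs = {
    "joy":  {2: [1.0, 1.0],  31: [1.0, 0.0]},
    "hate": {2: [1.0, 0.99], 31: [0.0, 1.0]},
}

state   = {2: [1.0, 1.0],  31: [0.9, 0.1]}   # emotional, joy-leaning
neutral = {2: [1.0, -1.0], 31: [0.5, 0.5]}   # fails the layer-2 gate
```

The cheap layer-2 gate lets the more expensive per-emotion comparison run only when it is likely to say something useful.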

Try It Yourself

The tool we built, Activation Lab, is open source. You can define any conversation scenario in a simple YAML file, run it against any HuggingFace model, and get a full interactive report showing exactly how the model’s internal state evolves. It works on Apple Silicon Macs and CUDA GPUs.

Whether you’re a researcher studying AI interpretability, an engineer building emotion-aware systems, or just curious about what happens inside these models, the CT scanner is ready for you.
