# Conditional Entropy Analysis

## Overview
ConditionalEntropyAnalyzer is the most semantically rigorous of the three quality gate checks and carries a 50% weight in the final retention score. It measures whether the CLM token string preserves the semantic content that the Thread Encoder already extracted.
Core question: Did the tokenization step silently drop any of the semantic slots that were already identified?
Method: Compare the Thread Encoder's structured extraction dict (the ground truth) against the compressed token string field by field, compute weighted coverage, and estimate the information cost of any lost fields.
## Theory

### H(Structured | Compressed)
Conditional entropy H(X|Y) measures how much uncertainty about X remains once Y is known. In this context:
- X = the structured extraction dict (Thread Encoder output)
- Y = the compressed token string (CLM output)
If H(X|Y) = 0, the compressed string fully determines the structured dict — perfect lossless tokenization. If H(X|Y) > 0, information was lost during the tokenization step.
Rather than computing true entropy (which requires probability distributions), the analyzer uses an approximation: for each field in the structured dict that is present but not found in the token string, it adds that field's domain_bits (the Shannon information content of the field's value domain) to the residual entropy estimate.
H(struct | compressed) ≈ Σ domain_bits[field] for each lost non-optional field
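The summation above can be sketched directly. The bit values below are copied from the field schema in the next section; the function name and the abridged tables are illustrative, not part of the analyzer's API.

```python
# Illustrative sketch of the residual-entropy approximation.
# DOMAIN_BITS is a subset of the field schema; values are in bits.
DOMAIN_BITS = {"customerIntent": 5.6, "resolution": 4.5,
               "lang": 2.0, "artifacts": 8.0}
OPTIONAL_FIELDS = {"secondaryIntent", "artifacts", "id", "createdAt"}

def residual_entropy(slots_lost):
    """Sum domain_bits over lost fields, skipping optional ones."""
    return sum(DOMAIN_BITS[f] for f in slots_lost
               if f not in OPTIONAL_FIELDS)

print(residual_entropy(["lang"]))            # 2.0
print(residual_entropy(["customerIntent"]))  # 5.6 — over the 3.0-bit gate
print(residual_entropy(["artifacts"]))       # 0 — optional, not counted
```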
## Field Schema
The analyzer compares against a fixed schema of 18 CLM fields. Each field has a weight (importance to compression quality) and domain bits (information content of the value space).
| Field | Token | Weight | Bits | Notes |
|---|---|---|---|---|
| customerIntent | CUSTOMER_INTENT | 1.0 | 5.6 | Critical |
| resolution | RESOLUTION | 1.0 | 4.5 | Critical |
| domain | DOMAIN | 0.9 | 4.0 | Critical |
| state | STATE | 0.9 | 4.5 | Critical |
| sentiment | SENTIMENT | 0.9 | 3.2 | |
| commitments | COMMITMENT | 0.9 | 5.0 | |
| channel | CHANNEL | 0.8 | 2.6 | |
| agentActions | AGENT_ACTIONS | 0.8 | 6.0 | |
| interactionTrigger | INTERACTION_TRIGGER | 0.8 | 4.5 | |
| service | SERVICE | 0.7 | 3.5 | |
| supportTrigger | SUPPORT_TRIGGER | 0.7 | 4.5 | |
| artifacts | ARTIFACTS | 0.7 | 8.0 | Optional |
| secondaryIntent | SECONDARY_INTENT | 0.6 | 5.6 | Optional |
| systemActions | SYSTEM_ACTIONS | 0.6 | 4.0 | |
| lang | LANG | 0.4 | 2.0 | |
| context | CONTEXT | 0.5 | 3.0 | |
| durationSeconds | DURATION | 0.3 | 4.0 | |
| id | ID | 0.5 | 16.0 | Optional |
Critical fields: customerIntent, resolution, domain, state — losing any of these forces high_risk regardless of other scores.
Optional fields: secondaryIntent, artifacts, id, createdAt — excluded from residual entropy calculation; their loss is acceptable.
## Pass Conditions
A result passes when all of these hold:
| Condition | Threshold | Description |
|---|---|---|
| No critical field lost | — | customerIntent, resolution, domain, state must all be present |
| weighted_coverage | ≥ 0.88 | Importance-weighted slot coverage |
| raw_coverage | ≥ 0.85 | Simple slot count coverage |
| residual_entropy | ≤ 3.0 bits | Total entropy of lost non-optional fields |
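Under those thresholds, the gate can be sketched as a single predicate. The function name is invented for illustration and is not the analyzer's actual method:

```python
CRITICAL_FIELDS = {"customerIntent", "resolution", "domain", "state"}

def gate_passes(slots_lost, weighted_coverage, raw_coverage, residual_entropy):
    # Any lost critical field fails immediately, regardless of scores.
    if CRITICAL_FIELDS & set(slots_lost):
        return False
    return (weighted_coverage >= 0.88
            and raw_coverage >= 0.85
            and residual_entropy <= 3.0)

print(gate_passes([], 0.95, 0.92, 1.2))         # True
print(gate_passes(["state"], 0.99, 0.99, 0.0))  # False — critical field lost
```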
## How Matching Works
The analyzer does not require exact string equality. It uses three layers of matching:
### 1. Token key presence
Scans the compressed string for the uppercase token key (e.g. CUSTOMER_INTENT). Token format variations like [CUSTOMER_INTENT=X], [CUSTOMER_INTENT:X], or bare CUSTOMER_INTENT all match.
### 2. Known aliases
Some fields have short-form aliases used by the encoder:
| Token | Aliases |
|---|---|
| CUSTOMER_INTENT | INTENT, CUST_INTENT |
| AGENT_ACTIONS | AGENT_ACTION, ACTIONS |
| SYSTEM_ACTIONS | SYS_ACTIONS, SYSTEM_ACTION |
| INTERACTION_TRIGGER | TRIGGER, INT_TRIGGER |
| SUPPORT_TRIGGER | TRIGGER |
| DURATION | DUR |
### 3. Value string fallback
If the token key is not found, the analyzer flattens the field's value (handling strings, lists, and nested dicts) and checks if any value string longer than 3 characters appears verbatim in the compressed output. This catches cases where the field value is embedded directly without its key label.
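Together, the three layers behave roughly like the sketch below. The alias table is abridged and the helper names are invented for illustration:

```python
import re

# Abridged alias table (see the full table above).
ALIASES = {
    "CUSTOMER_INTENT": ["INTENT", "CUST_INTENT"],
    "DURATION": ["DUR"],
}

def flatten_values(value):
    """Yield every string embedded in a str / list / dict value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten_values(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from flatten_values(item)
    else:
        yield str(value)

def field_found(compressed, token_key, value):
    # Layers 1–2: the token key or a known alias, matched as a whole
    # word, which covers [KEY=X], [KEY:X], and bare KEY alike.
    for key in [token_key] + ALIASES.get(token_key, []):
        if re.search(rf"\b{re.escape(key)}\b", compressed):
            return True
    # Layer 3: any value string longer than 3 characters, verbatim.
    return any(len(s) > 3 and s in compressed for s in flatten_values(value))

print(field_found("[INTENT:X]", "CUSTOMER_INTENT", "X"))        # True (alias)
print(field_found("saw REPORT_BILLING_ISSUE", "CUSTOMER_INTENT",
                  "REPORT_BILLING_ISSUE"))                      # True (value)
print(field_found("[DOMAIN:BILLING]", "CUSTOMER_INTENT", "X"))  # False
```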
## ConditionalEntropyResult

```python
class ConditionalEntropyResult(BaseModel):
    field_results: list[FieldResult]       # Per-field breakdown
    slots_total: int                       # Non-null fields in structured dict
    slots_matched: int                     # Found in compressed string
    slots_lost: list[str]                  # Field names not found
    slots_null_in_source: list[str]        # Fields that were None/empty — not a loss
    weighted_coverage: float               # Importance-weighted coverage 0–1
    raw_coverage: float                    # Simple count coverage 0–1
    residual_entropy: float                # Estimated H(struct|compressed) in bits
    bits_per_lost_field: dict[str, float]  # Entropy contribution per lost field
    passed: bool
```
## FieldResult

```python
class FieldResult(BaseModel):
    field: str             # e.g. "customerIntent"
    token_key: str         # e.g. "CUSTOMER_INTENT"
    expected_value: Any    # Value from structured dict
    found_in_compressed: bool
    weight: float          # Importance weight (critical = 1.0, lowest = 0.3)
    null_in_source: bool   # True if source was None/empty — not counted as a loss
```
## Standalone Usage

```python
from clm_core import ConditionalEntropyAnalyzer

analyzer = ConditionalEntropyAnalyzer()

structured = {
    'channel': 'VOICE',
    'domain': 'BILLING',
    'customerIntent': 'REPORT_BILLING_ISSUE',
    'resolution': 'REFUND_INITIATED',
    'state': 'PENDING_REFUND',
    'sentiment': ['NEUTRAL', 'SATISFIED', 'GRATEFUL'],
    'agentActions': ['ACCOUNT_LOOKUP', 'DUPLICATE_PAYMENT_CONFIRMED'],
    'commitments': [{'type': 'REFUND', 'etaDays': 4}],
    'artifacts': [{'key': 'REFUND_REF', 'value': 'RFD-908712'}],
}

compressed = (
    "[INTERACTION:SUPPORT:CHANNEL=VOICE] [DOMAIN:BILLING] "
    "[CUSTOMER_INTENT:REPORT_BILLING_ISSUE] "
    "[AGENT_ACTIONS:ACCOUNT_LOOKUP→DUPLICATE_PAYMENT_CONFIRMED] "
    "[RESOLUTION:REFUND_INITIATED] [STATE:PENDING_REFUND] "
    "[COMMITMENT:REFUND:ETA=4d] [ARTIFACTS:REFUND_REF=RFD-908712] "
    "[SENTIMENT:NEUTRAL→SATISFIED→GRATEFUL]"
)

result = analyzer.analyze(structured, compressed)

print(f"Slots total/matched: {result.slots_total}/{result.slots_matched}")
print(f"Weighted coverage: {result.weighted_coverage * 100:.1f}%")
print(f"Raw coverage: {result.raw_coverage * 100:.1f}%")
print(f"Residual entropy: {result.residual_entropy:.2f} bits")
print(f"Lost fields: {result.slots_lost}")
print(f"Null fields skipped: {result.slots_null_in_source}")
print(f"Passed: {result.passed}")
```
Per-field breakdown:

```python
print(f"\n{'Field':<22} {'Token':<22} {'Found':<8} {'Weight'}")
print("-" * 60)
for fr in result.field_results:
    if fr.null_in_source:
        status = "NULL"
    elif fr.found_in_compressed:
        status = "✓"
    else:
        status = "✗ LOST"
    print(f"{fr.field:<22} {fr.token_key:<22} {status:<8} {fr.weight}")
```
## Handling Null Fields
Fields with None, [], {}, or "" in the structured dict are counted as slots_null_in_source and excluded from all coverage and entropy calculations. A field that was never extracted cannot be "lost" during tokenization.
Example: if secondaryIntent is None, the analyzer skips it. This is correct — the Thread Encoder found no secondary intent, so the token string not containing SECONDARY_INTENT is expected.
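A minimal sketch of that rule, with an invented helper name:

```python
def partition_slots(structured):
    """Split a structured dict into non-null slots and null-in-source fields."""
    null_like = (None, "", [], {})
    slots, null_fields = [], []
    for field, value in structured.items():
        # Equality against the null-like sentinels catches None, "", [], {}.
        (null_fields if value in null_like else slots).append(field)
    return slots, null_fields

slots, nulls = partition_slots({"customerIntent": "REPORT_BILLING_ISSUE",
                                "secondaryIntent": None, "artifacts": []})
print(slots)  # ['customerIntent']
print(nulls)  # ['secondaryIntent', 'artifacts']
```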
## Residual Entropy Calculation
Residual entropy answers: "if these fields were lost, how much information would an LLM be missing?"
```
residual_entropy = Σ domain_bits[field]
                   for each field in slots_lost
                   where field not in OPTIONAL_FIELDS
```
Each field's domain_bits is the Shannon information content of its value space:
- A field with 6 possible values carries log₂(6) ≈ 2.6 bits
- A field with 50+ possible values carries log₂(50) ≈ 5.6 bits
A residual entropy above 3.0 bits means at least one non-trivial field was dropped — for example, losing customerIntent (5.6 bits) immediately exceeds the threshold.
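The bit figures quoted above follow directly from the domain sizes:

```python
import math

# domain_bits is simply log2 of the value-domain size.
print(round(math.log2(6), 1))   # 2.6
print(round(math.log2(50), 1))  # 5.6
```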
## Weighted Coverage
Raw coverage counts slots equally. Weighted coverage scales each slot by its importance:
```
weighted_coverage = Σ weight[matched_field] / Σ weight[all_non_null_fields]
```
This means losing customerIntent (weight = 1.0) hits the score two and a half times harder than losing lang (weight = 0.4), and more than three times harder than losing durationSeconds (weight = 0.3), the lowest-weighted field. The weighted threshold (≥ 0.88) is therefore much harder to satisfy when critical fields are lost.
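As a sketch with an abridged weight table (function name invented for illustration):

```python
WEIGHTS = {"customerIntent": 1.0, "resolution": 1.0, "lang": 0.4}  # schema subset

def weighted_coverage(matched, non_null):
    """Importance-weighted coverage over the non-null slots."""
    return (sum(WEIGHTS[f] for f in matched)
            / sum(WEIGHTS[f] for f in non_null))

non_null = ["customerIntent", "resolution", "lang"]
print(round(weighted_coverage(["customerIntent", "resolution"], non_null), 3))  # 0.833
print(round(weighted_coverage(["resolution", "lang"], non_null), 3))            # 0.583
```

Dropping low-weight lang costs ~0.17 of coverage here, while dropping critical customerIntent costs ~0.42: the same asymmetry the threshold is designed to exploit.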
## Next Steps
- Kolmogorov Complexity — Structural information equivalence (25% weight)
- Perplexity — LLM comprehension test (25% weight)
- Quality Gate Index — Unified report, verdict logic, and retention score