Quality Gate

Overview

The CLM Quality Gate validates that compressed token strings produced by the Thread Encoder preserve the semantic meaning of the original transcript. Compression that silently drops critical information is worse than no compression at all — the Quality Gate catches this before it reaches production.

Core question: Did the tokenization step lose any meaning that matters?

Method: Three independent entropy analyses, each targeting a different failure mode:

| Analyzer | Method | Catches |
|---|---|---|
| Kolmogorov | zlib compression ratio | Structural simplification gone too far |
| Conditional Entropy | Slot-level structured comparison | Silent semantic field loss |
| Perplexity | LLM comprehension test via API | LLM understanding degradation |
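The Kolmogorov analyzer's structural signal comes from zlib compressibility. A minimal sketch of the underlying ratio follows — the exact formula and thresholds inside KolmogorovAnalyzer are internal and may differ:

```python
import zlib

def zlib_ratio(text: str) -> float:
    """Compressed size over raw size: lower means more redundancy removed."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level=9)) / len(raw)

# Highly repetitive text compresses far below its raw size, while a dense
# CLM token string sits much closer to 1.0. The gap between the two ratios
# is the kind of structural signal the Kolmogorov check scores.
print(zlib_ratio("NEUTRAL SATISFIED GRATEFUL " * 40))   # repetitive: small ratio
print(zlib_ratio("[SENTIMENT:NEUTRAL->SATISFIED->GRATEFUL]"))
```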

Quick Start

Offline / CI mode (no API calls)

from clm_core import CompressionQualityGate

gate = CompressionQualityGate()

report = gate.analyze(
    original=transcript_text,
    compressed=clm_token_string,
    structured=thread_encoder_output_dict,  # optional but recommended
    run_perplexity=False,                   # skip LLM call — perplexity gets synthetic perfect score
)

print(report.verdict)          # "lossless", "acceptable", or "high_risk"
print(report.retention_score)  # 0–100
print(report.summary())        # full human-readable breakdown

With LLM perplexity analysis

from clm_core import CompressionQualityGate, PerplexityConfig

cfg = PerplexityConfig(
    llm_model="claude-haiku-4-5-20251001",
    api_key="sk-ant-...",
    host_url="https://api.anthropic.com",
)

gate = CompressionQualityGate(llm_client="anthropic", perplexity_cfg=cfg)

report = gate.analyze(
    original=transcript_text,
    compressed=clm_token_string,
    structured=thread_encoder_output_dict,
    run_perplexity=True,
)

print(report.verdict)
print(report.retention_score)
print(report.summary())

CompressionQualityGate

Single entry point that orchestrates all three analyzers.

from clm_core import CompressionQualityGate, PerplexityConfig

# Offline (Kolmogorov + Conditional Entropy only)
gate = CompressionQualityGate()

# With Anthropic perplexity
gate = CompressionQualityGate(
    llm_client="anthropic",
    perplexity_cfg=PerplexityConfig(
        llm_model="claude-haiku-4-5-20251001",
        api_key="sk-ant-...",
        host_url="https://api.anthropic.com",
    ),
)

# With OpenAI perplexity
gate = CompressionQualityGate(
    llm_client="openai",
    perplexity_cfg=PerplexityConfig(
        llm_model="gpt-4o-mini",
        api_key="sk-...",
        host_url="https://api.openai.com/v1",
    ),
)

Constructor parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_client | "anthropic" \| "openai" \| None | None | LLM backend for perplexity analysis. If None, perplexity falls back to heuristic scoring. |
| perplexity_cfg | PerplexityConfig \| None | None | LLM connection config (model, API key, host URL). Required when llm_client is set. |

PerplexityConfig

from clm_core import PerplexityConfig

cfg = PerplexityConfig(
    llm_model="claude-haiku-4-5-20251001",  # Model to use for evaluation
    api_key="sk-ant-...",                    # API key for the chosen provider
    host_url="https://api.anthropic.com",   # Base URL for the API
    temperature=0.0,                        # Sampling temperature (default: 0.0)
)

| Field | Type | Default | Description |
|---|---|---|---|
| llm_model | str | required | Model identifier for the LLM call |
| api_key | str | required | API key for the chosen provider |
| host_url | str | required | Base URL for the API endpoint |
| temperature | float | 0.0 | Sampling temperature |

analyze()

gate.analyze(
    original: str,
    compressed: str,
    structured: dict | None = None,
    run_perplexity: bool = False,
    verbose: bool = False,
    perplexity_task: str | None = None,
) -> CompressionQualityReport

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| original | str | required | Raw transcript or source text |
| compressed | str | required | CLM encoder output token string |
| structured | dict \| None | None | Thread Encoder structured extraction dict. If None, conditional entropy is skipped (assumed perfect). |
| run_perplexity | bool | False | Set True to run LLM perplexity analysis. Set False to skip API calls — useful for batch validation or CI/CD pipelines. |
| verbose | bool | False | Print progress to stdout as each stage runs. |
| perplexity_task | str \| None | None | Custom task prompt sent to the LLM for both original and compressed inputs. If None, uses the built-in structured JSON extraction task. |

CompressionQualityReport

The unified report combining results from all three analyzers.

class CompressionQualityReport(BaseModel):
    original: str
    compressed: str
    kolmogorov: KolmogorovModel
    conditional: ConditionalEntropyResult | None
    perplexity: PerplexityResult
    verdict: Literal["lossless", "acceptable", "high_risk"]
    retention_score: float  # 0–100

Fields

verdict

Three possible outcomes:

| Verdict | Condition | Meaning |
|---|---|---|
| lossless | All three analyzers passed | Compression is semantically safe |
| acceptable | Kolmogorov + Conditional passed, perplexity borderline | Usable with monitoring |
| high_risk | Conditional entropy failed, or two or more failed | Compression likely dropped meaning |

Conditional entropy has veto power. A borderline perplexity score alone will not push a result to high_risk, but a failed conditional entropy check always does — because it is the most direct measure of semantic loss.
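The veto rule can be sketched as a small decision function. This is an illustration of the rules described above, not the library's internal code:

```python
from typing import Literal

def decide_verdict(
    kolmogorov_ok: bool, conditional_ok: bool, perplexity_ok: bool
) -> Literal["lossless", "acceptable", "high_risk"]:
    """Apply the gate's verdict rules: conditional entropy has veto power."""
    if not conditional_ok:
        return "high_risk"  # veto: the most direct measure of semantic loss
    if [kolmogorov_ok, conditional_ok, perplexity_ok].count(False) >= 2:
        return "high_risk"  # two or more analyzers failed
    if kolmogorov_ok and perplexity_ok:
        return "lossless"   # all three passed
    return "acceptable"     # a single non-veto analyzer was borderline
```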

retention_score

Weighted composite score from 0 to 100:

retention_score = (kolmogorov × 0.25) + (conditional × 0.50) + (perplexity × 0.25)

| Component | Weight | Rationale |
|---|---|---|
| Kolmogorov information efficiency | 25% | Structural signal, low variance |
| Conditional entropy weighted coverage | 50% | Direct semantic slot comparison |
| Perplexity comprehension score | 25% | LLM understanding, API-dependent |
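Assuming each component sub-score is normalized to the 0–1 range before weighting (the library's internal scaling may differ), the composite works out as:

```python
def retention_score(kolmogorov: float, conditional: float, perplexity: float) -> float:
    """Weighted composite on a 0-100 scale from three 0-1 analyzer sub-scores."""
    return 100 * (0.25 * kolmogorov + 0.50 * conditional + 0.25 * perplexity)

# Perfect conditional entropy carries half the weight, so a weak perplexity
# score drags the composite down far less than a missed semantic slot would.
print(retention_score(0.8, 1.0, 0.9))  # 92.5
```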

summary()

Returns a human-readable multi-line breakdown of all metrics:

Verdict:              LOSSLESS
Retention Score:      96.8%

[Kolmogorov]
  Complexity ratio:   0.412
  Info efficiency:    1.18x
  Passed:             True

[Conditional Entropy]
  Slots (total/matched): 14/14
  Null in source:     4 fields skipped
  Weighted coverage:  100.0%
  Raw coverage:       100.0%
  Residual entropy:   0.00 bits
  Lost fields:        none
  Passed:             True

[Perplexity]
  Comprehension:      0.87
  Latency saved:      12.3%
  Response similarity: 0.85
  Passed:             True

Complete Example

from clm_core import CompressionQualityGate, PerplexityConfig

ORIGINAL = """
Customer called in to report a duplicate charge on their account.
Agent verified the billing issue, confirmed a duplicate payment had been
processed, and initiated a refund of $49.99 with reference RFD-908712.
A confirmation email was sent. Customer sentiment moved from neutral to
satisfied to grateful over the 7-minute voice call.
"""

STRUCTURED = {
    'channel': 'VOICE',
    'lang': 'EN',
    'domain': 'BILLING',
    'service': 'PAYMENT',
    'customerIntent': 'REPORT_BILLING_ISSUE',
    'state': 'PENDING_REFUND',
    'resolution': 'REFUND_INITIATED',
    'sentiment': ['NEUTRAL', 'SATISFIED', 'GRATEFUL'],
    'agentActions': ['ACCOUNT_LOOKUP', 'DUPLICATE_PAYMENT_CONFIRMED', 'CUSTOMER_NOTIFIED'],
    'commitments': [{'type': 'REFUND', 'etaDays': 4}],
    'artifacts': [{'key': 'REFUND_REF', 'value': 'RFD-908712'}],
    'durationSeconds': 420,
}

COMPRESSED = (
    "[INTERACTION:SUPPORT:CHANNEL=VOICE] [DURATION=7m] [LANG=EN] "
    "[DOMAIN:BILLING] [SERVICE:PAYMENT] "
    "[CUSTOMER_INTENT:REPORT_BILLING_ISSUE] "
    "[AGENT_ACTIONS:ACCOUNT_LOOKUP→DUPLICATE_PAYMENT_CONFIRMED→CUSTOMER_NOTIFIED] "
    "[RESOLUTION:REFUND_INITIATED] [STATE:PENDING_REFUND] "
    "[COMMITMENT:REFUND:ETA=4d] [ARTIFACTS:REFUND_REF=RFD-908712] "
    "[SENTIMENT:NEUTRAL→SATISFIED→GRATEFUL]"
)

# Offline validation (Kolmogorov + Conditional Entropy)
gate = CompressionQualityGate()
report = gate.analyze(
    original=ORIGINAL,
    compressed=COMPRESSED,
    structured=STRUCTURED,
    verbose=True,
)

print(report.summary())

# Per-field breakdown
print("\n[Field-by-field Conditional Entropy]")
for fr in report.conditional.field_results:
    status = "NULL" if fr.null_in_source else ("✓" if fr.found_in_compressed else "✗ LOST")
    print(f"{fr.field:<22} {fr.token_key:<22} {status:<8} weight={fr.weight}")

With LLM perplexity enabled:

cfg = PerplexityConfig(
    llm_model="claude-haiku-4-5-20251001",
    api_key="sk-ant-...",
    host_url="https://api.anthropic.com",
)
gate = CompressionQualityGate(llm_client="anthropic", perplexity_cfg=cfg)
report = gate.analyze(
    original=ORIGINAL,
    compressed=COMPRESSED,
    structured=STRUCTURED,
    run_perplexity=True,
    verbose=True,
)
print(report.summary())

Architecture

The three analyzers run sequentially and independently — a failure in one does not prevent the others from running. The gate always returns a complete report.

gate.analyze(original, compressed, structured)
    │
    ├─ KolmogorovAnalyzer.analyze(original, compressed)
    │       └─ KolmogorovModel
    │
    ├─ ConditionalEntropyAnalyzer.analyze(structured, compressed)
    │       └─ ConditionalEntropyResult
    │
    └─ PerplexityAnalyzer.analyze(original, compressed)
            └─ PerplexityResult
                    │
                    ├─ via API (anthropic or openai) if llm_client set + run_perplexity=True
                    └─ via heuristic token overlap if not

When structured is None, the conditional entropy result is a synthetic perfect score (coverage=1.0, entropy=0.0, passed=True), so it does not influence the verdict.

When run_perplexity=False, the perplexity result is similarly a synthetic perfect score, enabling fast offline validation.
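When run_perplexity=True but no LLM client is configured, the perplexity stage falls back to heuristic token overlap, as shown in the diagram above. A heuristic in that spirit might look like the following — purely illustrative, since PerplexityAnalyzer's actual heuristic is internal:

```python
import re

def token_overlap(original: str, compressed: str) -> float:
    """Share of distinct alphanumeric tokens from the original that also
    appear (case-insensitively) somewhere in the compressed string."""
    tokens = set(re.findall(r"[a-z0-9]+", original.lower()))
    if not tokens:
        return 1.0  # nothing to preserve
    text = compressed.lower()
    return sum(1 for t in tokens if t in text) / len(tokens)

# Key identifiers like the refund reference survive compression verbatim,
# so they count as hits; filler words from the transcript do not.
print(token_overlap("refund RFD-908712 issued", "[ARTIFACTS:REFUND_REF=RFD-908712]"))
```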


Interpreting Results

Lossless compression — expected for well-formed CLM output

All three analyzers pass. The token string is structurally simpler (as expected and healthy), retains all semantic slots, and an LLM produces equivalent answers from both inputs.

Acceptable — usable with monitoring

Kolmogorov and conditional entropy passed. Perplexity fell short of its threshold, which can happen due to LLM response variability or domain-specific token phrasing. Monitor in production.

High risk — review the encoder output

At minimum, a critical semantic field was dropped (customerIntent, resolution, domain, or state) or weighted coverage fell below 88%. Do not use this compressed output without investigation.
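The 88% weighted-coverage threshold can be checked by hand from the per-field results. A sketch follows — the weights and the dict shape are illustrative; the real field_results entries are objects with attribute access, as in the Complete Example:

```python
def weighted_coverage(field_results: list[dict]) -> float:
    """Weight of matched fields over total weight of non-null source fields."""
    scored = [(fr["weight"], fr["found"]) for fr in field_results
              if not fr["null_in_source"]]
    total = sum(w for w, _ in scored)
    if total == 0:
        return 1.0  # nothing to preserve: vacuously perfect
    return sum(w for w, found in scored if found) / total

fields = [
    {"weight": 3.0, "found": True,  "null_in_source": False},  # e.g. customerIntent
    {"weight": 3.0, "found": True,  "null_in_source": False},  # e.g. resolution
    {"weight": 1.0, "found": False, "null_in_source": False},  # a minor slot was lost
    {"weight": 1.0, "found": True,  "null_in_source": True},   # null in source: skipped
]
print(weighted_coverage(fields))  # 6/7 ~ 0.857, below 0.88, so high_risk
```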


Next Steps