# CLM

**Compressed Language Models via Semantic Token Encoding**

## Overview

CLM is a patent-pending compression technology that reduces LLM token consumption through semantic encoding. Unlike simple abbreviation or character-level compression, CLM preserves the meaning of your content using structured token vocabularies.
## Three Core Compression Targets

- **Thread Encoder** - Conversation threads in any format: labeled transcripts (customer/agent turns), emails, Slack threads, SMS, and raw notes
- **Structured Data Encoder (SDE)** - Product catalogs, knowledge bases, business rules, configurations
- **System Prompt Encoder** - Task instructions, role definitions, operational guidelines
**Key Benefits:**

- 60-95% token reduction
- Equal or better LLM responses
- Up to 73% faster processing
- No model training required
## Installation

Install CLM using pip:

```bash
pip install clm-core
```
### Install a spaCy Language Model

CLM uses spaCy for natural language processing tasks such as NER and rule-based information extraction. Four languages are supported: English, Portuguese, French, and Spanish.

Download the model for your language:

```bash
# English
python -m spacy download en_core_web_sm

# Portuguese
python -m spacy download pt_core_news_sm

# Spanish
python -m spacy download es_core_news_sm

# French
python -m spacy download fr_core_news_sm
```
## Quick Start

### Import CLM

```python
from clm_core import CLMConfig, CLMEncoder
```

### Create a Configuration

```python
config = CLMConfig(lang="en")
```

When you choose English, CLM automatically loads the `en_core_web_sm` spaCy model and uses its internal English dictionary and vocabulary for compression.
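Each supported language code maps one-to-one to a spaCy model. A minimal sketch of that mapping, using the model names from the installation section (the dictionary name `SPACY_MODELS` and helper `model_for` are illustrative, not part of the clm_core API):

```python
# Illustrative mapping from CLM language codes to the spaCy models
# listed in the installation section (names are assumptions, not
# clm_core internals).
SPACY_MODELS = {
    "en": "en_core_web_sm",   # English
    "pt": "pt_core_news_sm",  # Portuguese
    "es": "es_core_news_sm",  # Spanish
    "fr": "fr_core_news_sm",  # French
}

def model_for(lang: str) -> str:
    """Return the spaCy model a given CLMConfig(lang=...) would load."""
    try:
        return SPACY_MODELS[lang]
    except KeyError:
        raise ValueError(f"Unsupported language: {lang!r}") from None
```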
## Compression Examples

### 1. System Prompt Compression

Perfect for compressing task instructions, role definitions, and specifications:
```python
encoder = CLMEncoder(cfg=config)

sys_prompt = """
You are a Call QA & Compliance Scoring System for customer service operations.

TASK:
Analyze the thread_encoder and score the agent's compliance across required QA categories.

ANALYSIS CRITERIA:
- Mandatory disclosures and verification steps
- Policy adherence
- Soft-skill behaviors (empathy, clarity, ownership)
- Process accuracy
- Compliance violations or risks
- Customer sentiment trajectory

OUTPUT FORMAT:
{
  "summary": "short_summary",
  "qa_scores": {
    "verification": 0.0,
    "policy_adherence": 0.0,
    "soft_skills": 0.0,
    "accuracy": 0.0,
    "compliance": 0.0
  },
  "violations": ["list_any_detected"],
  "recommendations": ["improvement_suggestions"]
}

SCORING:
0.00–0.49: Fail
0.50–0.74: Needs Improvement
0.75–0.89: Good
0.90–1.00: Excellent
"""

result = encoder.encode(sys_prompt)
print(result.compressed)
```
Output:

```text
[REQ:ANALYZE] [TARGET:TRANSCRIPT:DOMAIN=QA]
[EXTRACT:COMPLIANCE,DISCLOSURES,VERIFICATION,POLICY,SOFT_SKILLS,ACCURACY,SENTIMENT:TYPE=LIST,DOMAIN=LEGAL]
[OUT_JSON:{summary,qa_scores:{verification,policy_adherence,soft_skills,accuracy,compliance},violations,recommendations}:ENUMS={"ranges": [{"min": 0.0, "max": 0.49, "label": "FAIL"}, {"min": 0.5, "max": 0.74, "label": "NEEDS_IMPROVEMENT"}, {"min": 0.75, "max": 0.89, "label": "GOOD"}, {"min": 0.9, "max": 1.0, "label": "EXCELLENT"}]}]
```

📊 Compression: 62.7% token reduction
The compressed result preserves the semantic meaning while dramatically reducing token count through Hierarchical Token representation.
### 2. Structured Data Compression

Compress knowledge bases, product catalogs, and structured datasets:
```python
from clm_core import SDCompressionConfig

kb_catalog = [
    {
        "article_id": "KB-001",
        "title": "How to Reset Password",
        "content": "To reset your password, go to the login page and click...",
        "category": "Account",
        "tags": ["password", "security", "account"],
        "views": 1523,
        "last_updated": "2024-10-15",
    }
]

config = CLMConfig(
    lang="en",
    ds_config=SDCompressionConfig(
        auto_detect=True,
        required_fields=["article_id", "title"],
        field_importance={"tags": 0.8, "content": 0.9},
        max_truncation_length=100,
    )
)

encoder = CLMEncoder(cfg=config)
result = encoder.encode(kb_catalog)
print(result.compressed)
```
Output:

```text
{article_id,title,content,category,tags,views}[KB-001,How to Reset Password,To reset your password; go to the login page and click...,Account,password+security+account,1523]
```

**Note:** Commas in text values are escaped with semicolons. Arrays use `+` as a separator.
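The escaping rules can be illustrated with a small standalone sketch (not the SDE itself) that renders a list of flat objects into the `{fields}[values]` layout shown above:

```python
def compress_records(records):
    """Render flat dicts as {field,...}[value,...] rows, mimicking the
    output layout above: commas inside text become semicolons, and
    list values are joined with '+'. Standalone sketch, not the SDE."""
    fields = list(records[0].keys())
    header = "{" + ",".join(fields) + "}"
    rows = []
    for rec in records:
        values = []
        for field in fields:
            value = rec[field]
            if isinstance(value, list):
                values.append("+".join(str(x) for x in value))
            else:
                values.append(str(value).replace(",", ";"))
        rows.append("[" + ",".join(values) + "]")
    return header + "".join(rows)
```

Applied to the KB record above, this produces the same shape as the SDE output, minus importance-based field selection and truncation.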
**Key Features:**

- Supports both single objects and arrays of objects
- Configurable field importance and thresholds
- Required/excluded field specification
- Auto-detection of important fields based on patterns
- Nested structures preserved with inline formatting
- Per-field truncation via `max_truncation_mapping`, applied recursively to nested objects
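The recursive per-field truncation described above can be sketched as follows; `truncate_fields` and its argument shape are illustrative, not the actual `max_truncation_mapping` implementation:

```python
def truncate_fields(obj, limits, default=None):
    """Truncate string fields to per-field character limits, recursing
    into nested dicts and lists. Illustrative sketch of the behaviour
    described above, not clm_core's implementation."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            if isinstance(value, str):
                limit = limits.get(key, default)
                out[key] = value[:limit] if limit is not None else value
            else:
                out[key] = truncate_fields(value, limits, default)
        return out
    if isinstance(obj, list):
        return [truncate_fields(item, limits, default) for item in obj]
    return obj
```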
Learn more about Structured Data Compression.
### 3. Thread Compression

Compress customer service conversations and unstructured threads while preserving context and sentiment, using CLM Transcript Schema v2. The Thread Encoder handles both labeled transcripts (`Agent:` / `Customer:` turns) and free-form text such as emails, Slack threads, or raw case notes; the format is detected automatically:
```python
# Billing Issue - Customer Support Transcript
transcript = """Customer: Hi Raj, I noticed an extra charge on my card for my plan this month. It looks like I was billed twice for the same subscription.
Agent: I'm sorry to hear that, let's take a look together. Can I have your account email or billing ID to verify your record?
Customer: Sure, it's melissa.jordan@example.com.
Agent: Thanks, Melissa. Give me just a moment... alright, I can see two transactions on your file — one processed on the 2nd and another on the 3rd. It seems the system retried payment even after the first one succeeded.
Customer: Oh wow, that explains it. So I'm not crazy then.
Agent: Not at all. It's a known issue we had earlier this week with duplicate processing. The good news is, you're eligible for a full refund on the second charge.
Customer: Great. How long will it take to show up?
Agent: Once I file the refund, it usually reflects within 3–5 business days depending on your bank. I'll also send you a confirmation email with the reference number.
Customer: That works. Thank you for sorting it out so quickly.
Agent: My pleasure. I've just submitted the refund request now — your reference number is RFD-908712. You should see that update later today.
Customer: Perfect. I appreciate your help, Raj.
Agent: Anytime! Is there anything else I can check for you today?
Customer: No, that's all. Thanks again!
Agent: Thank you for calling us, Melissa. Have a great day ahead!"""

cfg = CLMConfig(lang="en")
encoder = CLMEncoder(cfg=cfg)

result = encoder.encode(
    input_=transcript,
    metadata={
        'call_id': 'CALL-0001',
        'representative': 'Raj',
        'duration': '9m',
        'channel': 'voice',
        'issue_type': 'Billing Dispute'
    }
)
print(result.compressed)
```
Output:

```text
[INTERACTION:SUPPORT:CHANNEL=VOICE]
[DURATION=9m]
[LANG=EN]
[DOMAIN:BILLING]
[SERVICE:SUBSCRIPTION]
[CUSTOMER_INTENT:REPORT_DUPLICATE_CHARGE]
[CONTEXT:EMAIL_PROVIDED]
[AGENT_ACTIONS:ACCOUNT_VERIFIED→DIAGNOSTIC_PERFORMED→REFUND_INITIATED]
[SYSTEM_ACTIONS:PAYMENT_RETRY_DETECTED]
[RESOLUTION:ISSUE_RESOLVED]
[STATE:RESOLVED]
[COMMITMENT:REFUND_3-5_DAYS]
[ARTIFACT:REFUND_REF=RFD-908712]
[SENTIMENT:NEUTRAL→GRATEFUL]
```
**What's Preserved:**

- ✅ Interaction metadata (channel, duration, language)
- ✅ Domain and service context (BILLING, SUBSCRIPTION)
- ✅ Customer intent derived from customer utterances
- ✅ Context provided without PII leakage (EMAIL_PROVIDED, not the actual email)
- ✅ Agent actions as an ordered chain
- ✅ System-detected events (PAYMENT_RETRY_DETECTED)
- ✅ Resolution outcome and authoritative state
- ✅ Agent commitments (refund timeline)
- ✅ Structured artifacts (refund reference)
- ✅ Sentiment trajectory throughout the conversation

**Typical Compression:** 85-92% for customer service transcripts
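Figures like these can be reproduced against your own data with any tokenizer; a rough sketch using whitespace token counts (accurate measurements would use the target model's own tokenizer):

```python
def token_reduction(original: str, compressed: str) -> float:
    """Percent token reduction using a naive whitespace tokenizer.
    Accurate measurements should use the target LLM's tokenizer."""
    orig = len(original.split())
    comp = len(compressed.split())
    return 100.0 * (orig - comp) / orig if orig else 0.0
```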
Thread Encoder behaviour is configurable via `ThreadConfig`: enable `include_ctx_values` to surface extracted entity values, `include_summary` to generate a human-readable summary from the compressed output, `estimate_thread_duration` to infer duration from content, and `redaction_pattern` to control PII placeholder detection:
```python
from clm_core.types import ThreadConfig

cfg = CLMConfig(
    lang="en",
    thread_config=ThreadConfig(
        include_ctx_values=True,
        include_summary=True,
    )
)
```
Learn more about the Thread Encoder, Transcript Compression, Free-Form Compression, and token hierarchy.
## Hierarchical Token Vocabulary

CLM uses six semantic token categories:

| Token | Purpose | Example |
|---|---|---|
| `REQ` | Actions/operations | `[REQ:ANALYZE]`, `[REQ:EXTRACT]` |
| `TARGET` | Objects/data sources | `[TARGET:TRANSCRIPT]`, `[TARGET:DOCUMENT]` |
| `EXTRACT` | Fields to extract | `[EXTRACT:SENTIMENT,INTENT]` |
| `CTX` | Contextual information | `[CTX:CUSTOMER_SERVICE]` |
| `OUT` | Output formats | `[OUT:JSON]`, `[OUT:TABLE]` |
| `REF` | References/IDs | `[REF:CASE=12345]` |

This structured approach preserves semantic relationships while achieving massive token reduction.
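The simple bracketed tokens are regular enough to parse with a short sketch. The grammar assumed here (an upper-case category, then an optional payload after `:` or `=`) is inferred from the examples in this document, not from a published spec, and nested payloads such as `OUT_JSON:{...}` would need a fuller parser:

```python
import re

# Category, optional ':' or '=' delimiter, then payload up to the
# closing bracket. Handles simple tokens like [DOMAIN:BILLING] or
# [DURATION=9m]; nested payloads (e.g. OUT_JSON) are out of scope.
TOKEN_RE = re.compile(r"\[([A-Z_]+)([:=])?([^\]]*)\]")

def parse_tokens(compressed: str):
    """Extract (category, payload) pairs from bracketed CLM-style tokens."""
    return [(m.group(1), m.group(3) or None)
            for m in TOKEN_RE.finditer(compressed)]
```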
## Performance Metrics

Based on production testing with 5,000+ samples:

| Target Type | Average Compression | Use Case |
|---|---|---|
| System Prompts | 75-90% | Task instructions, role definitions |
| Transcripts | 85-92% | Customer service calls |
| Structured Data | 40-85% | Catalogs, configurations |

- **Validation Accuracy:** 91.5%
- **Test Pass Rate:** 88.2%
- **Processing Speed Improvement:** Up to 73%
- **Multilingual Coverage:** 4 languages
## Next Steps
- System Prompt Encoding - Overview of system prompt compression
- Task Prompts - Action-oriented instruction compression
- Configuration Prompts - Template-based agent configuration
- Structured Data Encoding - Configuration options and best practices
- Thread Encoder - Conversation-based compression (calls, chats, emails, free-form threads)
- Transcript Encoding - Customer service transcript compression
- Free-Form Encoding - Emails, Slack threads, and unstructured prose
- Advanced: CLM Dictionary - Understanding the vocabulary
- Advanced: Tokenization - Token hierarchy and structure
- Advanced: Quality Gate - Validating that compression preserves meaning
## License

CLM is dual-licensed:

- **AGPL-3.0** for open source projects (details)
- **Commercial License** for proprietary use (contact us)

See our Licensing Guide for details.
## Support
- 📖 Documentation: docs.cllm.io
- 💬 Discussions: GitHub Discussions
- 🐛 Issues: GitHub Issues
- 📧 Email: yanick.jair.ta@gmail.com
Made with ❤️ for the LLM community