Continuity Engine

Auto-Compaction

Understand how the Continuity Engine automatically compresses conversations while preserving the context that matters most.

What is Compaction?

Compaction is the process of summarizing older messages to reduce token usage while preserving essential context. Think of it as creating a CliffsNotes version of your conversation history.

Without compaction, long conversations would run into one or more of these problems:

Exceed token limits, causing errors
Require dropping older messages, losing context
Become prohibitively expensive due to large context windows

Compaction solves this by extracting the meaning from older messages into a structured summary, then pruning the raw messages.

Compaction Levels

The engine supports three compaction levels, each with different preservation targets:

Level	Trigger	Target Preservation Ratio	Behavior
light	8,000+ tokens	~85%	Gentle summarization of oldest messages
deep	16,000+ tokens	~70%	More aggressive, consolidates summary
aggressive	24,000+ tokens	~60%	Maximum compression, essential info only
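
The table above can be read as a lookup from token count to level. The following is a minimal sketch of that lookup, not the engine's actual code: the names `CompactionLevel`, `LevelSpec`, `LEVELS`, and `selectLevel` are hypothetical, and only the thresholds and target ratios come from the table.

```typescript
// Hypothetical names; thresholds and target ratios are copied from the table.
type CompactionLevel = "light" | "deep" | "aggressive";

interface LevelSpec {
  level: CompactionLevel;
  triggerTokens: number; // level applies at or above this token count
  targetRatio: number; // approximate preservation target, in percent
}

// Ordered from highest to lowest trigger so the first match wins.
const LEVELS: LevelSpec[] = [
  { level: "aggressive", triggerTokens: 24000, targetRatio: 60 },
  { level: "deep", triggerTokens: 16000, targetRatio: 70 },
  { level: "light", triggerTokens: 8000, targetRatio: 85 },
];

// Returns the matching level, or null when the count is below every trigger.
function selectLevel(tokenCount: number): LevelSpec | null {
  return LEVELS.find((spec) => tokenCount >= spec.triggerTokens) ?? null;
}
```

Keeping the levels in a single ordered array means each boundary is defined in exactly one place.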

Automatic vs Manual Compaction

Automatic Compaction

By default, compaction runs automatically after each assistant response when token thresholds are exceeded. This happens in the background without interrupting your conversation.

Auto-Compaction Flow

User sends message
    ↓
AI generates response
    ↓
Response saved to database
    ↓
Token count checked:
  - < 8,000: No action
  - 8,000-15,999: Light compaction
  - 16,000-23,999: Deep compaction
  - 24,000+: Aggressive compaction
    ↓
Summary updated (if compaction triggered)
    ↓
Metadata refreshed

Background Processing

Compaction runs asynchronously after the response is delivered. You won't notice any delay: the AI responds immediately and compaction happens behind the scenes.
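
One common way to get this behavior is to start compaction without awaiting it. The sketch below illustrates that pattern under stated assumptions: the `Hooks` interface and both function names are hypothetical stand-ins, and the engine's real internals may differ.

```typescript
// Hypothetical hooks standing in for the engine's storage and compaction calls.
interface Hooks {
  saveResponse(text: string): Promise<void>;
  countTokens(): Promise<number>;
  compact(tokenCount: number): Promise<void>;
}

// Saves the response, starts compaction in the background, and returns at once.
async function handleAssistantResponse(text: string, hooks: Hooks): Promise<string> {
  await hooks.saveResponse(text);
  // Deliberately not awaited: the caller gets the response immediately.
  void runCompactionInBackground(hooks);
  return text;
}

// Errors are caught here so a failed compaction never surfaces to the user.
async function runCompactionInBackground(hooks: Hooks): Promise<void> {
  try {
    const tokens = await hooks.countTokens();
    if (tokens >= 8000) {
      await hooks.compact(tokens);
    }
  } catch (err) {
    console.error("Background compaction failed:", err);
  }
}
```

The `try`/`catch` matters because an unawaited promise that rejects would otherwise become an unhandled rejection.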

Manual Compaction

You can also trigger compaction manually from the Summary Drawer:

Open Summary Drawer

Click the context indicator in the chat header to open the drawer.

Click Compact Button

At the bottom of the drawer, click the "Compact Context" button. A loading indicator shows while processing.

Review Updated Summary

Once complete, the summary sections will update with newly extracted information. The preservation ratio will also update.

Manual compaction uses the light level by default, which is sufficient for most use cases.

Configuration Thresholds

The Continuity Engine uses these configurable thresholds:

constants.ts

export const CONTINUITY_CONFIG = {
  // Token thresholds for triggering compaction
  LIGHT_COMPACTION_THRESHOLD: 8000,
  DEEP_COMPACTION_THRESHOLD: 16000,
  AGGRESSIVE_COMPACTION_THRESHOLD: 24000,

  // Recent message window
  RECENT_MESSAGE_COUNT: 20,
  MIN_MESSAGES_FOR_COMPACTION: 10,

  // Summary limits
  MAX_GOALS: 5,
  MAX_DECISIONS: 10,
  MAX_REQUIREMENTS: 10,
  MAX_OPEN_QUESTIONS: 5,
  MAX_REFERENCES: 20,

  // Preservation targets
  TARGET_PRESERVATION_RATIO: 85, // percent
  MIN_PRESERVATION_RATIO: 60,

  // Timing
  COMPACTION_DEBOUNCE_MS: 5000,
  MAX_COMPACTION_TIME_MS: 30000,
};
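
The configuration does not say how the recent message window and the minimum message count are applied. One plausible reading, shown here purely as an assumption, is that the newest `RECENT_MESSAGE_COUNT` messages stay raw and only older ones are summarized, and that nothing happens below `MIN_MESSAGES_FOR_COMPACTION`. The function `splitForCompaction` is hypothetical.

```typescript
// Values copied from CONTINUITY_CONFIG; how the engine applies them is assumed.
const RECENT_MESSAGE_COUNT = 20;
const MIN_MESSAGES_FOR_COMPACTION = 10;

interface CompactionSplit<T> {
  toSummarize: T[]; // older messages that would feed the summary
  keepRaw: T[]; // recent messages kept verbatim
}

// Assumes `messages` is ordered oldest first.
function splitForCompaction<T>(messages: T[]): CompactionSplit<T> {
  if (messages.length < MIN_MESSAGES_FOR_COMPACTION) {
    return { toSummarize: [], keepRaw: messages };
  }
  const cut = Math.max(0, messages.length - RECENT_MESSAGE_COUNT);
  return { toSummarize: messages.slice(0, cut), keepRaw: messages.slice(cut) };
}
```

Under this reading, a 30-message conversation would summarize the oldest 10 messages and keep the newest 20 untouched.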

What Gets Extracted

During compaction, the AI analyzes messages and extracts:

🎯

Goals

What the user is trying to achieve. "Build a REST API", "Debug the authentication flow", "Optimize database queries".

✓

Decisions

Choices made during the conversation with their reasoning. "Chose PostgreSQL over MongoDB because we need strong consistency."

📋

Requirements

Specifications and constraints. "Must support 10,000 concurrent users", "Response time under 200ms".

❓

Open Questions

Unresolved issues or pending decisions. "Need to decide on caching strategy", "Waiting for API credentials".

📚

Definitions

Domain-specific terms defined in the conversation. "A 'workspace' is a shared collaboration space with its own billing."

🔗

References

Important files, URLs, IDs, or code snippets. Categorized by type (file, url, id, code, other).
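
The six categories above suggest a structured summary object. The sketch below is one possible shape, not the engine's actual schema: every type and field name is hypothetical, the caps are the `MAX_*` values from the configuration, and keeping the newest entries when a cap is exceeded is an assumption.

```typescript
// Reference types are taken from the list above; all other names are hypothetical.
type ReferenceType = "file" | "url" | "id" | "code" | "other";

interface Reference { type: ReferenceType; value: string; }
interface Decision { choice: string; reasoning: string; }
interface Definition { term: string; meaning: string; }

interface ConversationSummary {
  goals: string[];
  decisions: Decision[];
  requirements: string[];
  openQuestions: string[];
  definitions: Definition[];
  references: Reference[];
}

// Assumed policy: when a list exceeds its cap, keep only the newest entries.
function capSummary(summary: ConversationSummary): ConversationSummary {
  const newest = <T>(items: T[], max: number): T[] => items.slice(-max);
  return {
    goals: newest(summary.goals, 5), // MAX_GOALS
    decisions: newest(summary.decisions, 10), // MAX_DECISIONS
    requirements: newest(summary.requirements, 10), // MAX_REQUIREMENTS
    openQuestions: newest(summary.openQuestions, 5), // MAX_OPEN_QUESTIONS
    definitions: summary.definitions, // no cap is listed in the config
    references: newest(summary.references, 20), // MAX_REFERENCES
  };
}
```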

Feature Flags

The Continuity Engine can be controlled via environment variables:

.env

# Enable/disable the entire Continuity Engine
CONTINUITY_ENGINE_ENABLED=true

# Enable/disable automatic compaction
CONTINUITY_ENGINE_AUTO_COMPACT=true

# Show/hide Continuity Engine UI elements
CONTINUITY_ENGINE_UI=true

All flags default to true if not specified. Set to false to disable.
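
The default-to-true rule can be sketched as follows. This is an illustration only: the helper names are hypothetical, the environment is passed in as a plain object rather than read from a specific runtime, and two behaviors are assumptions not stated above: any value other than `false` counts as enabled, and disabling the engine also disables the other two flags.

```typescript
type Env = Record<string, string | undefined>;

// Unset means enabled; only the literal string "false" disables (assumed parsing).
function flagEnabled(env: Env, name: string): boolean {
  const raw = env[name];
  if (raw === undefined) return true;
  return raw.trim().toLowerCase() !== "false";
}

interface ContinuityFlags {
  engine: boolean;
  autoCompact: boolean;
  ui: boolean;
}

// Assumption: the engine flag acts as a master switch for the other two.
function readContinuityFlags(env: Env): ContinuityFlags {
  const engine = flagEnabled(env, "CONTINUITY_ENGINE_ENABLED");
  return {
    engine,
    autoCompact: engine && flagEnabled(env, "CONTINUITY_ENGINE_AUTO_COMPACT"),
    ui: engine && flagEnabled(env, "CONTINUITY_ENGINE_UI"),
  };
}
```

In a Node.js process, this could be called as `readContinuityFlags(process.env)`.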

Performance Considerations

Async Processing

Compaction never blocks the user. It runs after the response is delivered.

Debouncing

Rapid messages don't trigger multiple compactions. A 5-second debounce prevents unnecessary processing.
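
A debounce of this kind is typically a timer that restarts on every call. The sketch below shows a generic trailing-edge debounce as an illustration; the `debounce` helper is hypothetical, and whether the engine fires on the trailing edge is an assumption.

```typescript
// Trailing-edge debounce: `fn` runs once, `waitMs` after the last call.
function debounce(fn: () => void, waitMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    // A new call cancels the pending run and restarts the wait.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, waitMs);
  };
}
```

With `waitMs` set to `COMPACTION_DEBOUNCE_MS` (5000), a burst of messages sent within a few seconds would result in a single compaction run after the burst ends.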

Timeout Protection

Compaction has a 30-second timeout. If the AI takes too long, the operation is cancelled gracefully.
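
One way to bound an operation like this is to race it against a timer. The following is a minimal sketch under stated assumptions: `withTimeout` is a hypothetical helper, and resolving to `null` on timeout is just one way to model a graceful cancellation.

```typescript
// Resolves with the work's result, or null if `ms` elapses first.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Note that `Promise.race` only stops waiting; it does not abort the underlying request. With `ms` set to `MAX_COMPACTION_TIME_MS` (30000), a caller would treat a `null` result as "skip this compaction and keep the existing summary".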

API Usage

Each compaction makes an API call to generate the summary. This uses tokens but typically far fewer than sending the full history each time.

Troubleshooting

Compaction not triggering?

Check that CONTINUITY_ENGINE_AUTO_COMPACT is not set to false
Verify the conversation has at least 10 messages
Confirm the token count is at least 8,000 (check the Summary Drawer)

Summary seems incomplete?

Try manual compaction to trigger a fresh extraction
Important info may be in excluded messages, so check your exclusions
Pin critical information to ensure it's preserved

Preservation ratio too low?

Pin the most important messages
Start a new conversation for a different topic
Use user memory for persistent preferences
