Understand how the Continuity Engine automatically compresses conversations while preserving the context that matters most.
Compaction is the process of summarizing older messages to reduce token usage while preserving essential context. Think of it as creating a "CliffsNotes" version of your conversation history.
Without compaction, long conversations would either:
Compaction solves this by extracting the meaning from older messages into a structured summary, then pruning the raw messages.
The engine supports three compaction levels, each with different preservation targets:
| Level | Trigger | Target Ratio | Behavior |
|---|---|---|---|
| light | 8,000+ tokens | ~85% | Gentle summarization of oldest messages |
| deep | 16,000+ tokens | ~70% | More aggressive, consolidates summary |
| aggressive | 24,000+ tokens | ~60% | Maximum compression, essential info only |
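The thresholds in the table can be read as a simple selection function. The following is a minimal sketch, not the engine's actual code: `selectCompactionLevel` and the `CompactionLevel` type are hypothetical names, and only the three threshold values come from this page.

```typescript
// Hypothetical type and function names; threshold values are taken from the table above.
type CompactionLevel = "light" | "deep" | "aggressive";

const LIGHT_COMPACTION_THRESHOLD = 8000;
const DEEP_COMPACTION_THRESHOLD = 16000;
const AGGRESSIVE_COMPACTION_THRESHOLD = 24000;

// Returns the level whose trigger the token count has reached,
// or null when the conversation is still below the light threshold.
function selectCompactionLevel(tokenCount: number): CompactionLevel | null {
  if (tokenCount >= AGGRESSIVE_COMPACTION_THRESHOLD) return "aggressive";
  if (tokenCount >= DEEP_COMPACTION_THRESHOLD) return "deep";
  if (tokenCount >= LIGHT_COMPACTION_THRESHOLD) return "light";
  return null;
}
```

Checking the highest threshold first keeps the ranges mutually exclusive without spelling out upper bounds.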
By default, compaction runs automatically after each assistant response when token thresholds are exceeded. This happens in the background without interrupting your conversation.
User sends message
↓
AI generates response
↓
Response saved to database
↓
Token count checked:
- < 8,000: No action
- 8,000-15,999: Light compaction
- 16,000-23,999: Deep compaction
- 24,000+: Aggressive compaction
↓
Summary updated (if compaction triggered)
↓
Metadata refreshed
Background Processing
Compaction runs asynchronously after the response is delivered. You won't notice any delay: the AI responds immediately, and compaction happens behind the scenes.
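One common way to get this behavior is a fire-and-forget call: the handler awaits the response and the save, then starts compaction without awaiting it. The sketch below illustrates that pattern only. `handleTurn` and its three callback parameters are hypothetical and do not reflect the engine's real interfaces.

```typescript
// Hypothetical handler: the callbacks stand in for the real model call,
// database write, and compaction routine.
async function handleTurn(
  generateResponse: () => Promise<string>,
  saveResponse: (response: string) => Promise<void>,
  compact: () => Promise<void>,
): Promise<string> {
  const response = await generateResponse();
  await saveResponse(response);

  // Start compaction but do not await it, so the caller gets the
  // response right away. Failures are logged rather than propagated.
  void compact().catch((err: unknown) => {
    console.error("Background compaction failed:", err);
  });

  return response;
}
```

Catching the error inside the background branch matters here: an unhandled rejection in an unawaited promise would otherwise surface outside the request that caused it.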
You can also trigger compaction manually from the Summary Drawer:
Click the context indicator in the chat header to open the drawer.
At the bottom of the drawer, click the "Compact Context" button. A loading indicator shows while processing.
Once complete, the summary sections will update with newly extracted information. The preservation ratio will also update.
Manual compaction uses the light level by default, which is sufficient for most use cases.
The Continuity Engine uses these configurable thresholds:
export const CONTINUITY_CONFIG = {
  // Token thresholds for triggering compaction
  LIGHT_COMPACTION_THRESHOLD: 8000,
  DEEP_COMPACTION_THRESHOLD: 16000,
  AGGRESSIVE_COMPACTION_THRESHOLD: 24000,
  // Recent message window
  RECENT_MESSAGE_COUNT: 20,
  MIN_MESSAGES_FOR_COMPACTION: 10,
  // Summary limits
  MAX_GOALS: 5,
  MAX_DECISIONS: 10,
  MAX_REQUIREMENTS: 10,
  MAX_OPEN_QUESTIONS: 5,
  MAX_REFERENCES: 20,
  // Preservation targets
  TARGET_PRESERVATION_RATIO: 85, // percent
  MIN_PRESERVATION_RATIO: 60, // percent
  // Timing
  COMPACTION_DEBOUNCE_MS: 5000,
  MAX_COMPACTION_TIME_MS: 30000,
};
During compaction, the AI analyzes messages and extracts:
What the user is trying to achieve. "Build a REST API", "Debug the authentication flow", "Optimize database queries".
Choices made during the conversation with their reasoning. "Chose PostgreSQL over MongoDB because we need strong consistency."
Specifications and constraints. "Must support 10,000 concurrent users", "Response time under 200ms".
Unresolved issues or pending decisions. "Need to decide on caching strategy", "Waiting for API credentials".
Domain-specific terms defined in the conversation. "A 'workspace' is a shared collaboration space with its own billing."
Important files, URLs, IDs, or code snippets. Categorized by type (file, url, id, code, other).
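The six kinds of extracted information could be modeled as a single typed summary object. This is a sketch under stated assumptions: every type and field name below is hypothetical, and trimming by keeping the newest entries is an assumption, since this page does not say which entries are dropped when a limit is hit. Only the limit values and the reference categories come from this page.

```typescript
// Hypothetical shapes; field names are illustrative, not the engine's schema.
type ReferenceType = "file" | "url" | "id" | "code" | "other";

interface Reference { type: ReferenceType; value: string; }
interface Decision { choice: string; reasoning: string; }

interface ConversationSummary {
  goals: string[];
  decisions: Decision[];
  requirements: string[];
  openQuestions: string[];
  terms: Record<string, string>; // term -> definition
  references: Reference[];
}

// Limit values copied from CONTINUITY_CONFIG on this page.
const LIMITS = {
  MAX_GOALS: 5,
  MAX_DECISIONS: 10,
  MAX_REQUIREMENTS: 10,
  MAX_OPEN_QUESTIONS: 5,
  MAX_REFERENCES: 20,
};

// Keeps at most `max` entries. Assumption: newer entries win when trimming.
function keepLast<T>(items: T[], max: number): T[] {
  return items.length > max ? items.slice(-max) : items.slice();
}

// Returns a copy of the summary with every capped section trimmed to its limit.
function applyLimits(summary: ConversationSummary): ConversationSummary {
  return {
    ...summary,
    goals: keepLast(summary.goals, LIMITS.MAX_GOALS),
    decisions: keepLast(summary.decisions, LIMITS.MAX_DECISIONS),
    requirements: keepLast(summary.requirements, LIMITS.MAX_REQUIREMENTS),
    openQuestions: keepLast(summary.openQuestions, LIMITS.MAX_OPEN_QUESTIONS),
    references: keepLast(summary.references, LIMITS.MAX_REFERENCES),
  };
}
```

The configuration lists no limit for defined terms, so the sketch leaves `terms` uncapped.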
The Continuity Engine can be controlled via environment variables:
# Enable/disable the entire Continuity Engine
CONTINUITY_ENGINE_ENABLED=true
# Enable/disable automatic compaction
CONTINUITY_ENGINE_AUTO_COMPACT=true
# Show/hide Continuity Engine UI elements
CONTINUITY_ENGINE_UI=true
All flags default to true if not specified. Set to false to disable.
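A default-to-true flag can be read with a small helper like the one below. This is a sketch, not the engine's parser: `readFlag` and `readContinuityFlags` are hypothetical names, and treating only the literal value `false` (case-insensitive) as "off" is an assumption, since this page does not say how other values such as `0` are handled.

```typescript
// Hypothetical helper: an unset variable counts as enabled, matching the
// documented default. Assumption: only the literal "false" disables a flag.
function readFlag(env: Record<string, string | undefined>, name: string): boolean {
  const raw = env[name];
  if (raw === undefined) return true;
  return raw.trim().toLowerCase() !== "false";
}

interface ContinuityFlags {
  enabled: boolean;
  autoCompact: boolean;
  ui: boolean;
}

// Reads the three flags documented above from an environment-like object.
function readContinuityFlags(env: Record<string, string | undefined>): ContinuityFlags {
  return {
    enabled: readFlag(env, "CONTINUITY_ENGINE_ENABLED"),
    autoCompact: readFlag(env, "CONTINUITY_ENGINE_AUTO_COMPACT"),
    ui: readFlag(env, "CONTINUITY_ENGINE_UI"),
  };
}
```

In a Node.js process, the helper would typically be called as `readContinuityFlags(process.env)`.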
Compaction never blocks the user. It runs after the response is delivered.
Rapid messages don't trigger multiple compactions. A 5-second debounce prevents unnecessary processing.
Compaction has a 30-second timeout. If the AI takes too long, the operation is cancelled gracefully.
Each compaction makes an API call to generate the summary. This uses tokens but typically far fewer than sending the full history each time.
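The debounce and timeout described above can be sketched with standard timers and an `AbortController`. The helper names `runWithTimeout` and `createDebouncedTrigger` are hypothetical, and the trailing-edge debounce (run once after the burst goes quiet) is an assumption about how the engine behaves. Only the two timing values come from the configuration on this page.

```typescript
// Timing values mirror COMPACTION_DEBOUNCE_MS and MAX_COMPACTION_TIME_MS.
const COMPACTION_DEBOUNCE_MS = 5000;
const MAX_COMPACTION_TIME_MS = 30000;

// Runs `task` and resolves to null if it has not settled within `timeoutMs`.
// The AbortSignal lets a cooperative task stop its own work early.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = MAX_COMPACTION_TIME_MS,
): Promise<T | null> {
  const controller = new AbortController();
  const timedOut = new Promise<null>((resolve) => {
    controller.signal.addEventListener("abort", () => resolve(null), { once: true });
  });
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await Promise.race([task(controller.signal), timedOut]);
  } finally {
    clearTimeout(timer); // avoid a stray timer when the task finishes first
  }
}

// Trailing-edge debounce: each request restarts the timer, so a burst of
// messages leads to a single run once the burst has gone quiet.
function createDebouncedTrigger(
  run: () => void,
  delayMs: number = COMPACTION_DEBOUNCE_MS,
): { request: () => void; cancel: () => void } {
  let pending: ReturnType<typeof setTimeout> | undefined;
  return {
    request(): void {
      if (pending !== undefined) clearTimeout(pending);
      pending = setTimeout(() => {
        pending = undefined;
        run();
      }, delayMs);
    },
    cancel(): void {
      if (pending !== undefined) clearTimeout(pending);
      pending = undefined;
    },
  };
}
```

Resolving to `null` on timeout, rather than throwing, is one way to make cancellation "graceful": the caller can simply keep the previous summary.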
Compaction not triggering?
Check that CONTINUITY_ENGINE_AUTO_COMPACT is not set to false.
Summary seems incomplete?
Preservation ratio too low?