A deep dive into the technical architecture and algorithms that power the Continuity Engine's intelligent context preservation.
The Continuity Engine operates as a middleware layer between your chat interface and the AI model. It intercepts messages, maintains context state, and ensures the AI always has access to the most relevant information within token limits.
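At a high level, it can be pictured as an interface sitting in the request path. This is an illustrative sketch only; the method names here are assumptions, not the engine's actual API:

```typescript
// Illustrative shape only - these method names are assumptions,
// not the engine's actual public API.
interface ContinuityEngine {
  // Intercept an outgoing user message: update context state and
  // return the Context Pack (described below) to send to the model.
  onUserMessage(conversationId: string, message: Message): Promise<ContextPack>;

  // Record the model's reply and trigger compaction if token
  // thresholds have been crossed.
  onAssistantResponse(conversationId: string, message: Message): Promise<void>;
}
```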
When preparing a request for the AI model, the Context Builder assembles a Context Pack - a structured bundle containing all relevant context within token limits:
interface ContextPack {
  // User-defined persistent memory
  userMemory: UserMemoryItem[];
  // Pinned messages (never compressed)
  pinnedMemory: PinnedMemoryItem[];
  // AI-extracted summary of older messages
  rollingSummary: RollingSummary;
  // Most recent messages (full detail)
  recentMessages: Message[];
  // Context health metadata
  metadata: ContextMetadata;
}

The rolling summary is the heart of context preservation. It's a structured document that captures the essence of your conversation:
| Field | Type | Description |
|---|---|---|
| goals | string[] | What the user is trying to achieve |
| decisions | Decision[] | Choices made with reasoning |
| requirements | string[] | Constraints and specifications |
| currentPlan | string[] | Current approach or strategy |
| openQuestions | string[] | Unresolved questions or blockers |
| definitions | Record<string, string> | Domain-specific terms explained |
| importantReferences | Reference[] | Files, URLs, IDs, code snippets |
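Expressed in TypeScript, the table above maps onto a shape like the following. The `Decision` and `Reference` types are only named in this article, so their fields here are assumptions:

```typescript
// Field layouts for Decision and Reference are assumed for illustration.
interface Decision {
  choice: string;    // the choice that was made
  reasoning: string; // why it was made
}

interface Reference {
  kind: 'file' | 'url' | 'id' | 'code';
  value: string;
}

interface RollingSummary {
  goals: string[];                     // what the user is trying to achieve
  decisions: Decision[];               // choices made with reasoning
  requirements: string[];              // constraints and specifications
  currentPlan: string[];               // current approach or strategy
  openQuestions: string[];             // unresolved questions or blockers
  definitions: Record<string, string>; // domain-specific terms explained
  importantReferences: Reference[];    // files, URLs, IDs, code snippets
}
```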
The engine uses an approximate token counting algorithm optimized for speed. Rather than calling the actual tokenizer (which adds latency), it uses a character-based heuristic:
// Approximate token count (~4 characters per token)
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// For messages, include role overhead
function countMessageTokens(message: Message): number {
  const roleTokens = 4; // ~4 tokens for role prefix
  const contentTokens = countTokens(message.content);
  return roleTokens + contentTokens;
}

Accuracy vs Speed
This approximation is typically within 10-15% of actual token counts. The slight inaccuracy is acceptable because compaction thresholds have built-in buffers, and speed is critical for real-time chat.
After each assistant response, the engine checks if compaction is needed:
1. Calculate the total token count of the recent messages. If it is below the light threshold (8,000 tokens), no action is needed.
2. Based on the token count, select the appropriate level (see the sketch after this list):
   - light: 8,000+ tokens - gentle summarization
   - deep: 16,000+ tokens - more aggressive
   - aggressive: 24,000+ tokens - maximum compression
3. The compactor sends older messages to the AI with a specialized prompt that extracts key information into the rolling summary structure.
4. Messages that have been summarized are marked as compacted. Only the most recent messages (default: 20) are kept in full detail.
5. The preservation ratio and token counts are recalculated and stored.
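A minimal sketch of the threshold check, assuming the three levels and boundaries listed in step 2 (the function name and `CompactionLevel` type are illustrative):

```typescript
type CompactionLevel = 'none' | 'light' | 'deep' | 'aggressive';

// Thresholds from the compaction levels above.
const LIGHT_THRESHOLD = 8_000;
const DEEP_THRESHOLD = 16_000;
const AGGRESSIVE_THRESHOLD = 24_000;

function selectCompactionLevel(recentMessages: Message[]): CompactionLevel {
  // Sum the approximate per-message counts (countMessageTokens above).
  const total = recentMessages.reduce((sum, m) => sum + countMessageTokens(m), 0);

  if (total >= AGGRESSIVE_THRESHOLD) return 'aggressive';
  if (total >= DEEP_THRESHOLD) return 'deep';
  if (total >= LIGHT_THRESHOLD) return 'light';
  return 'none'; // below the light threshold: no compaction needed
}
```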
The preservation ratio indicates how much of the original context has been retained. It's calculated as:
preservationRatio = (compressedTokens / totalTokens) * 100

// Example:
// Total tokens ever seen: 50,000
// Tokens in current context: 35,000
// Ratio: (35000 / 50000) * 100 = 70%

A higher ratio means more original content is preserved. The engine aims to maintain at least 60% preservation while staying within token limits.
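The same calculation as a small helper, where `compressedTokens` is the token count of the context remaining after compaction and `totalTokens` is every token the conversation has produced:

```typescript
function preservationRatio(compressedTokens: number, totalTokens: number): number {
  if (totalTokens === 0) return 100; // nothing seen yet, nothing lost
  return (compressedTokens / totalTokens) * 100;
}

// Matches the example above: preservationRatio(35_000, 50_000) === 70
```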
Pinned items receive special treatment: they are never summarized or compressed, and they are always carried into the context pack ahead of the rolling summary, no matter how aggressive compaction becomes.
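As a sketch, a compactor honoring this rule would filter pinned messages out of the batch it summarizes. The `isPinned` flag below is an assumed field, used only for illustration:

```typescript
// Split older messages into a summarizable set and a pinned set.
// Pinned messages bypass summarization and land in the context
// pack's pinnedMemory section instead.
function partitionForCompaction(olderMessages: Message[]) {
  const toSummarize = olderMessages.filter((m) => !m.isPinned); // assumed flag
  const pinned = olderMessages.filter((m) => m.isPinned);
  return { toSummarize, pinned };
}
```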
When building the context pack for an AI request, elements are assembled in this specific order:
1. System Prompt (if applicable)
2. User Memory (persistent facts/preferences)
3. Pinned Memory (critical items)
4. Rolling Summary (compressed history)
5. Recent Messages (last 20, full detail)
6. Current User Message

Token Budget
The context builder respects token budgets. If adding the full rolling summary would exceed limits, it progressively trims older sections while preserving the most recent and pinned content.
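Putting the ordering and the budget rule together, the builder might look roughly like this sketch. The token budget constant, the JSON rendering of memory and summary sections, and the `role`/`content` fields on `Message` are assumptions; only the assembly order and the "trim older summary sections first" behavior come from the description above:

```typescript
const CONTEXT_TOKEN_BUDGET = 32_000; // assumed value, not documented

function buildPrompt(
  pack: ContextPack,
  systemPrompt: string | null,
  userMessage: Message,
): Message[] {
  const prompt: Message[] = [];

  // 1. System prompt (if applicable)
  if (systemPrompt) prompt.push({ role: 'system', content: systemPrompt });

  // 2. User memory and 3. pinned memory, rendered as system context
  prompt.push({ role: 'system', content: JSON.stringify(pack.userMemory) });
  prompt.push({ role: 'system', content: JSON.stringify(pack.pinnedMemory) });

  // 4. Rolling summary: drop sections until the pack fits the budget,
  // keeping recent messages and pinned content intact. Which sections
  // count as "older" is an assumption in this sketch.
  const sections = Object.entries(pack.rollingSummary);
  const otherTokens = () =>
    [...prompt, ...pack.recentMessages, userMessage].reduce(
      (sum, m) => sum + countMessageTokens(m),
      0,
    );
  let summary = JSON.stringify(Object.fromEntries(sections));
  while (sections.length > 0 && otherTokens() + countTokens(summary) > CONTEXT_TOKEN_BUDGET) {
    sections.shift(); // trim the next section
    summary = JSON.stringify(Object.fromEntries(sections));
  }
  prompt.push({ role: 'system', content: summary });

  // 5. Recent messages (full detail) and 6. the current user message
  prompt.push(...pack.recentMessages, userMessage);
  return prompt;
}
```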
The Continuity Engine stores its data in PostgreSQL using these tables:
-- Conversation context (rolling summary, metadata)
conversation_context
- id, conversation_id, rolling_summary (jsonb)
- total_token_count, compressed_token_count
- last_compaction_at, compaction_count
-- Pinned memory items
pinned_memory
- id, conversation_id, message_id
- content, category, pinned_at
-- User-level memory (cross-conversation)
user_memory
- id, user_id, workspace_id
- type (preference/fact/constraint/definition)
- key, content, is_active