Continuity Engine

How It Works

A deep dive into the technical architecture and algorithms that power the Continuity Engine's intelligent context preservation.

Architecture Overview

The Continuity Engine operates as a middleware layer between your chat interface and the AI model. It intercepts messages, maintains context state, and ensures the AI always has access to the most relevant information within token limits.

User Message (new input from user) → Context Builder (assembles context pack from tiers) → AI Model (processes with full context awareness) → Compactor (updates summary if threshold reached)
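
In code, one turn of this pipeline looks roughly like the sketch below. It is a minimal illustration of the flow, not the engine's actual API; every name and signature here is an assumption, and the three collaborators are passed in as parameters to keep the sketch self-contained.

Request Pipeline Sketch (TypeScript)
type ContextPack = unknown; // full structure shown in "The Context Pack" below

async function handleUserMessage(
  conversationId: string,
  text: string,
  deps: {
    buildContextPack: (conversationId: string) => Promise<ContextPack>; // Context Builder
    callModel: (pack: ContextPack, text: string) => Promise<string>;    // AI Model
    maybeCompact: (conversationId: string) => Promise<void>;            // Compactor
  }
): Promise<string> {
  const pack = await deps.buildContextPack(conversationId); // assemble the context pack from tiers
  const reply = await deps.callModel(pack, text);           // model responds with full context awareness
  await deps.maybeCompact(conversationId);                  // update summary only if a threshold is reached
  return reply;
}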

The Context Pack

When preparing a request for the AI model, the Context Builder assembles a Context Pack - a structured bundle containing all relevant context within token limits:

ContextPack Structure (TypeScript)
interface ContextPack {
  // User-defined persistent memory
  userMemory: UserMemoryItem[];

  // Pinned messages (never compressed)
  pinnedMemory: PinnedMemoryItem[];

  // AI-extracted summary of older messages
  rollingSummary: RollingSummary;

  // Most recent messages (full detail)
  recentMessages: Message[];

  // Context health metadata
  metadata: ContextMetadata;
}
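
The metadata field is not expanded in this page's snippets. Based on the counters described under "Compaction Flow" and in the database schema below, one plausible shape is sketched here; the field names are assumptions rather than the engine's actual type.

ContextMetadata Sketch (TypeScript)
// Plausible shape for the context health metadata; fields mirror counters
// mentioned elsewhere on this page (assumed names, not the real type).
interface ContextMetadata {
  totalTokenCount: number;       // total tokens ever seen in the conversation
  compressedTokenCount: number;  // tokens represented in the current context
  preservationRatio: number;     // (compressedTokenCount / totalTokenCount) * 100
  compactionCount: number;       // how many times compaction has run
  lastCompactionAt: Date | null; // null until the first compaction
}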

Rolling Summary Structure

The rolling summary is the heart of context preservation. It's a structured document that captures the essence of your conversation:

Field                 Type                     Description
goals                 string[]                 What the user is trying to achieve
decisions             Decision[]               Choices made with reasoning
requirements          string[]                 Constraints and specifications
currentPlan           string[]                 Current approach or strategy
openQuestions         string[]                 Unresolved questions or blockers
definitions           Record<string, string>   Domain-specific terms explained
importantReferences   Reference[]              Files, URLs, IDs, code snippets
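
Expressed in TypeScript, the same structure might look like the sketch below. The field names and types come from the table above; the Decision and Reference shapes are illustrative assumptions, since this page does not define them.

RollingSummary Sketch (TypeScript)
// Sketch derived from the table above. Decision and Reference are
// hypothetical shapes; the engine's actual definitions may differ.
interface Decision {
  choice: string;    // what was decided
  reasoning: string; // why it was decided
}

interface Reference {
  kind: 'file' | 'url' | 'id' | 'snippet'; // assumed categories
  value: string;
}

interface RollingSummary {
  goals: string[];                     // what the user is trying to achieve
  decisions: Decision[];               // choices made with reasoning
  requirements: string[];              // constraints and specifications
  currentPlan: string[];               // current approach or strategy
  openQuestions: string[];             // unresolved questions or blockers
  definitions: Record<string, string>; // domain-specific terms explained
  importantReferences: Reference[];    // files, URLs, IDs, code snippets
}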

Token Counting

The engine uses an approximate token counting algorithm optimized for speed. Rather than calling the actual tokenizer (which adds latency), it uses a character-based heuristic:

Token Estimation (TypeScript)
// Approximate token count (~4 characters per token)
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// For messages, include role overhead
function countMessageTokens(message: Message): number {
  const roleTokens = 4; // ~4 tokens for role prefix
  const contentTokens = countTokens(message.content);
  return roleTokens + contentTokens;
}

Accuracy vs Speed

This approximation is typically within 10-15% of actual token counts. The slight inaccuracy is acceptable because compaction thresholds have built-in buffers, and speed is critical for real-time chat.

Compaction Flow

After each assistant response, the engine checks if compaction is needed:

1. Token Count Check
   Calculate total token count of recent messages. If below the light threshold (8,000 tokens), no action needed.

2. Determine Compaction Level
   Based on token count, select the appropriate level (a sketch of this check follows the steps):
   • light: 8,000+ tokens - gentle summarization
   • deep: 16,000+ tokens - more aggressive
   • aggressive: 24,000+ tokens - maximum compression

3. Generate Updated Summary
   The compactor sends older messages to the AI with a specialized prompt that extracts key information into the rolling summary structure.

4. Prune Messages
   Messages that have been summarized are marked as compacted. Only the most recent messages (default: 20) are kept in full detail.

5. Update Metadata
   Preservation ratio and token counts are recalculated and stored.
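
Putting steps 1 and 2 together, the threshold check can be sketched as a small pure function. The thresholds come from the list above; the function and type names are hypothetical.

Compaction Level Selection Sketch (TypeScript)
// Maps a token count to a compaction level using the documented thresholds.
// Names are illustrative, not the engine's actual API.
type CompactionLevel = 'none' | 'light' | 'deep' | 'aggressive';

function selectCompactionLevel(recentMessageTokens: number): CompactionLevel {
  if (recentMessageTokens >= 24_000) return 'aggressive'; // maximum compression
  if (recentMessageTokens >= 16_000) return 'deep';       // more aggressive
  if (recentMessageTokens >= 8_000) return 'light';       // gentle summarization
  return 'none';                                          // below the light threshold
}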

Preservation Ratio Calculation

The preservation ratio indicates how much of the original context has been retained. It's calculated as:

Preservation Ratio (TypeScript)
// preservationRatio = (compressedTokens / totalTokens) * 100

// Example: total tokens ever seen = 50,000; tokens in current context = 35,000
const totalTokens = 50_000;
const compressedTokens = 35_000;
const preservationRatio = (compressedTokens / totalTokens) * 100; // 70%

A higher ratio means more original content is preserved. The engine aims to maintain at least 60% preservation while staying within token limits.

Pinned Memory Handling

Pinned items receive special treatment:

1. Never compressed - Pinned content is always included verbatim in the context pack
2. Priority placement - Pinned items appear before the rolling summary in the context
3. Categorization - Pins can be categorized as decision, requirement, reference, or other
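
A pinned item can be modeled along the lines of the sketch below, combining the categories listed above with the pinned_memory columns shown in the database schema later on this page. The exact TypeScript shape is an assumption.

PinnedMemoryItem Sketch (TypeScript)
// Sketch based on the pinned_memory columns (id, conversation_id,
// message_id, content, category, pinned_at); actual types may differ.
type PinCategory = 'decision' | 'requirement' | 'reference' | 'other';

interface PinnedMemoryItem {
  id: string;
  conversationId: string;
  messageId: string; // the message the pin was created from
  content: string;   // always included verbatim, never compressed
  category: PinCategory;
  pinnedAt: Date;
}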

Context Building Order

When building the context pack for an AI request, elements are assembled in this specific order:

Context Assembly Order (text)
1. System Prompt (if applicable)
2. User Memory (persistent facts/preferences)
3. Pinned Memory (critical items)
4. Rolling Summary (compressed history)
5. Recent Messages (last 20, full detail)
6. Current User Message

Token Budget

The context builder respects token budgets. If adding the full rolling summary would exceed limits, it progressively trims older sections while preserving the most recent and pinned content.
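
One way to implement that trimming, reusing the character-based token heuristic from above, is sketched below. The sketch assumes the rolling summary can be treated as an ordered list of sections, oldest first; the function names are hypothetical.

Token Budget Trimming Sketch (TypeScript)
// Drop the oldest summary sections until the remainder fits the budget.
// Assumes sections are ordered oldest-first; names are illustrative.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4); // same heuristic as above
}

function trimSummaryToBudget(sections: string[], budgetTokens: number): string[] {
  const kept = [...sections];
  while (kept.length > 1 && countTokens(kept.join('\n')) > budgetTokens) {
    kept.shift(); // remove the oldest section first
  }
  return kept;
}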

Database Schema

The Continuity Engine stores its data in PostgreSQL using these tables:

Key Tables (SQL)
-- Conversation context (rolling summary, metadata)
conversation_context
  - id, conversation_id, rolling_summary (jsonb)
  - total_token_count, compressed_token_count
  - last_compaction_at, compaction_count

-- Pinned memory items
pinned_memory
  - id, conversation_id, message_id
  - content, category, pinned_at

-- User-level memory (cross-conversation)
user_memory
  - id, user_id, workspace_id
  - type (preference/fact/constraint/definition)
  - key, content, is_active