A deep dive into the technical architecture and algorithms that power the Continuity Engine's intelligent context preservation.
The Continuity Engine operates as a middleware layer between your chat interface and the AI model. It intercepts messages, maintains context state, and ensures the AI always has access to the most relevant information within token limits.
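At a high level, it can be pictured as an interface sitting in the request path. This is an illustrative sketch only; the method names here are assumptions, not the engine's actual API:

```typescript
// Illustrative shape only - these method names are assumptions,
// not the engine's actual public API.
interface ContinuityEngine {
  // Intercept an outgoing user message: update context state and
  // return the Context Pack (described below) to send to the model.
  onUserMessage(conversationId: string, message: Message): Promise<ContextPack>;

  // Record the model's reply and trigger compaction if token
  // thresholds have been crossed.
  onAssistantResponse(conversationId: string, message: Message): Promise<void>;
}
```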
When preparing a request for the AI model, the Context Builder assembles a Context Pack - a structured bundle containing all relevant context within token limits:
interface ContextPack {
  // User-defined persistent memory
  userMemory: UserMemoryItem[];
  // Pinned messages (never compressed)
  pinnedMemory: PinnedMemoryItem[];
  // AI-extracted summary of older messages
  rollingSummary: RollingSummary;
  // Most recent messages (full detail)
  recentMessages: Message[];
  // Context health metadata
  metadata: ContextMetadata;
}

The rolling summary is the heart of context preservation. It's a structured document that captures the essence of your conversation:
| Field | Type | Description |
|---|---|---|
| goals | string[] | What the user is trying to achieve |
| decisions | Decision[] | Choices made with reasoning |
| requirements | string[] | Constraints and specifications |
| currentPlan | string[] | Current approach or strategy |
| openQuestions | string[] | Unresolved questions or blockers |
| definitions | Record<string, string> | Domain-specific terms explained |
| importantReferences | Reference[] | Files, URLs, IDs, code snippets |
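Expressed in TypeScript, the table above maps onto a shape like the following. The `Decision` and `Reference` types are only named in this article, so their fields here are assumptions:

```typescript
// Field layouts for Decision and Reference are assumed for illustration.
interface Decision {
  choice: string;    // the choice that was made
  reasoning: string; // why it was made
}

interface Reference {
  kind: 'file' | 'url' | 'id' | 'code';
  value: string;
}

interface RollingSummary {
  goals: string[];                     // what the user is trying to achieve
  decisions: Decision[];               // choices made with reasoning
  requirements: string[];              // constraints and specifications
  currentPlan: string[];               // current approach or strategy
  openQuestions: string[];             // unresolved questions or blockers
  definitions: Record<string, string>; // domain-specific terms explained
  importantReferences: Reference[];    // files, URLs, IDs, code snippets
}
```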
The engine uses an approximate token counting algorithm optimized for speed. Rather than calling the actual tokenizer (which adds latency), it uses a character-based heuristic:
// Approximate token count (~4 characters per token)
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// For messages, include role overhead
function countMessageTokens(message: Message): number {
  const roleTokens = 4; // ~4 tokens for role prefix
  const contentTokens = countTokens(message.content);
  return roleTokens + contentTokens;
}

Accuracy vs Speed
This approximation is typically within 10-15% of actual token counts. The slight inaccuracy is acceptable because compaction thresholds have built-in buffers, and speed is critical for real-time chat.
After each assistant response, the engine checks if compaction is needed:
1. Calculate the total token count of the recent messages. If it is below the light threshold (8,000 tokens), no action is needed.
2. Based on the token count, select the appropriate level (see the sketch after this list):
   - light: 8,000+ tokens - gentle summarization
   - deep: 16,000+ tokens - more aggressive
   - aggressive: 24,000+ tokens - maximum compression
3. The compactor sends older messages to the AI with a specialized prompt that extracts key information into the rolling summary structure.
4. Messages that have been summarized are marked as compacted. Only the most recent messages (default: 20) are kept in full detail.
5. The preservation ratio and token counts are recalculated and stored.
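A minimal sketch of the threshold check, assuming the three levels and boundaries listed in step 2 (the function name and `CompactionLevel` type are illustrative):

```typescript
type CompactionLevel = 'none' | 'light' | 'deep' | 'aggressive';

// Thresholds from the compaction levels above.
const LIGHT_THRESHOLD = 8_000;
const DEEP_THRESHOLD = 16_000;
const AGGRESSIVE_THRESHOLD = 24_000;

function selectCompactionLevel(recentMessages: Message[]): CompactionLevel {
  // Sum the approximate per-message counts (countMessageTokens above).
  const total = recentMessages.reduce((sum, m) => sum + countMessageTokens(m), 0);

  if (total >= AGGRESSIVE_THRESHOLD) return 'aggressive';
  if (total >= DEEP_THRESHOLD) return 'deep';
  if (total >= LIGHT_THRESHOLD) return 'light';
  return 'none'; // below the light threshold: no compaction needed
}
```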
The preservation ratio indicates how much of the original context has been retained. It's calculated as:
preservationRatio = (compressedTokens / totalTokens) * 100

// Example:
// Total tokens ever seen: 50,000
// Tokens in current context: 35,000
// Ratio: (35000 / 50000) * 100 = 70%

A higher ratio means more original content is preserved. The engine aims to maintain at least 60% preservation while staying within token limits.
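The same calculation as a small helper, where `compressedTokens` is the token count of the context remaining after compaction and `totalTokens` is every token the conversation has produced:

```typescript
function preservationRatio(compressedTokens: number, totalTokens: number): number {
  if (totalTokens === 0) return 100; // nothing seen yet, nothing lost
  return (compressedTokens / totalTokens) * 100;
}

// Matches the example above: preservationRatio(35_000, 50_000) === 70
```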
Pinned items receive special treatment: they are never summarized or compressed, and they are always carried into the context pack ahead of the rolling summary, no matter how aggressive compaction becomes.
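As a sketch, a compactor honoring this rule would filter pinned messages out of the batch it summarizes. The `isPinned` flag below is an assumed field, used only for illustration:

```typescript
// Split older messages into a summarizable set and a pinned set.
// Pinned messages bypass summarization and land in the context
// pack's pinnedMemory section instead.
function partitionForCompaction(olderMessages: Message[]) {
  const toSummarize = olderMessages.filter((m) => !m.isPinned); // assumed flag
  const pinned = olderMessages.filter((m) => m.isPinned);
  return { toSummarize, pinned };
}
```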
When building the context pack for an AI request, elements are assembled in this specific order:
1. System Prompt (if applicable)
2. User Memory (persistent facts/preferences)
3. Pinned Memory (critical items)
4. Rolling Summary (compressed history)
5. Recent Messages (last 20, full detail)
6. Current User Message

Token Budget
The context builder respects token budgets. If adding the full rolling summary would exceed limits, it progressively trims older sections while preserving the most recent and pinned content.
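Putting the ordering and the budget rule together, the builder might look roughly like this sketch. The token budget constant, the JSON rendering of memory and summary sections, and the `role`/`content` fields on `Message` are assumptions; only the assembly order and the "trim older summary sections first" behavior come from the description above:

```typescript
const CONTEXT_TOKEN_BUDGET = 32_000; // assumed value, not documented

function buildPrompt(
  pack: ContextPack,
  systemPrompt: string | null,
  userMessage: Message,
): Message[] {
  const prompt: Message[] = [];

  // 1. System prompt (if applicable)
  if (systemPrompt) prompt.push({ role: 'system', content: systemPrompt });

  // 2. User memory and 3. pinned memory, rendered as system context
  prompt.push({ role: 'system', content: JSON.stringify(pack.userMemory) });
  prompt.push({ role: 'system', content: JSON.stringify(pack.pinnedMemory) });

  // 4. Rolling summary: drop sections until the pack fits the budget,
  // keeping recent messages and pinned content intact. Which sections
  // count as "older" is an assumption in this sketch.
  const sections = Object.entries(pack.rollingSummary);
  const otherTokens = () =>
    [...prompt, ...pack.recentMessages, userMessage].reduce(
      (sum, m) => sum + countMessageTokens(m),
      0,
    );
  let summary = JSON.stringify(Object.fromEntries(sections));
  while (sections.length > 0 && otherTokens() + countTokens(summary) > CONTEXT_TOKEN_BUDGET) {
    sections.shift(); // trim the next section
    summary = JSON.stringify(Object.fromEntries(sections));
  }
  prompt.push({ role: 'system', content: summary });

  // 5. Recent messages (full detail) and 6. the current user message
  prompt.push(...pack.recentMessages, userMessage);
  return prompt;
}
```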
The Continuity Engine stores its data in PostgreSQL using these tables:
-- Conversation context (rolling summary, metadata)
conversation_context
- id, conversation_id, rolling_summary (jsonb)
- total_token_count, compressed_token_count
- last_compaction_at, compaction_count
-- Pinned memory items
pinned_memory
- id, conversation_id, message_id
- content, category, pinned_at
-- User-level memory (cross-conversation)
user_memory
- id, user_id, workspace_id
- type (preference/fact/constraint/definition)
- key, content, is_active