
🧠 Mementos System Architecture

Table of Contents​

  1. Overview
  2. Architecture Overview
  3. Data Flow & Request Lifecycle
  4. Components Deep Dive
  5. Memory Management & Grooming
  6. Context Window Protection
  7. Database Schema & Performance
  8. API Endpoints
  9. Production Issues & Fixes
  10. Monitoring & Alerting
  11. Troubleshooting Guide
  12. Best Practices

Overview​

The Mementos System provides intelligent, persistent memory for AI conversations that automatically manages itself to prevent context window overflow and optimize retrieval performance. It implements a three-tier memory hierarchy (HOT/WARM/COLD) with automatic grooming, emergency memory enforcement, and context window protection.

Key Features​

  • 🔥 Three-Tier Memory System: HOT (immediate access), WARM (recent context), COLD (long-term storage)
  • 🛑️ Automatic Memory Management: Intelligent grooming prevents memory bloat
  • 🚨 Context Window Protection: Prevents 188K+ token overflow crashes
  • ⚡ Performance Optimized: Database queries optimized for real-time access
  • 📊 Production Monitoring: Comprehensive logging and alerting
  • 🔄 Self-Healing: Automatic recovery from memory limit violations

System Stats​

  • Memory Limit: 32,000 characters (configurable)
  • Context Token Limit: 50,000 tokens emergency cap
  • Grooming Triggers: 75% warning, 90% danger, 95% emergency
  • Database: MongoDB with optimized indexes
  • Performance: Sub-100ms context retrieval for 1000+ mementos

Architecture Overview​

graph TB
    subgraph "Client Layer"
        UI[ProfileModal/MementosTabContent.tsx]
        API_CALLS[mementosAPICalls.ts]
        HOOKS[useApi Hook]
    end

    subgraph "API Layer"
        CREATE["/api/mementos/create"]
        BATCH["/api/mementos/create-batch"]
        LIST["/api/mementos/list"]
        UPDATE["/api/mementos/update"]
        DELETE["/api/mementos/delete"]
    end

    subgraph "Core Services"
        GROOMING[MementoGroomingService]
        EVALUATION[MementoEvaluationService]
        CHAT_FEATURES[ChatCompletionFeatures]
    end

    subgraph "Memory Management"
        HOT[(HOT Tier<br/>Immediate Access)]
        WARM[(WARM Tier<br/>Recent Context)]
        COLD[(COLD Tier<br/>Long-term Storage)]
        GROOMER[Grooming Engine]
    end

    subgraph "Database"
        MONGO[(MongoDB<br/>Mementos Collection)]
        INDEXES[Optimized Indexes]
    end

    subgraph "Context Integration"
        CHAT[Chat Completion]
        CONTEXT[Context Window<br/>Protection]
        TOKENS[Token Limiting]
    end

    UI --> API_CALLS
    API_CALLS --> HOOKS
    HOOKS --> CREATE
    HOOKS --> BATCH
    HOOKS --> LIST
    HOOKS --> UPDATE
    HOOKS --> DELETE

    CREATE --> GROOMING
    BATCH --> GROOMING
    CREATE --> EVALUATION

    GROOMING --> HOT
    GROOMING --> WARM
    GROOMING --> COLD
    GROOMING --> GROOMER

    CHAT_FEATURES --> CONTEXT
    CONTEXT --> TOKENS
    CONTEXT --> HOT

    HOT --> MONGO
    WARM --> MONGO
    COLD --> MONGO
    MONGO --> INDEXES

Data Flow & Request Lifecycle​

1. Memento Creation Flow​

// Example: User creates a memento
User Input → MementosTabContent.tsx
  → createMemento() in mementosAPICalls.ts
  → useApi hook (adds auth)
  → POST /api/mementos/create
  → Memory limit check (95% threshold)
  → Emergency grooming (if needed)
  → MementoEvaluationService.evaluate()
  → Database insertion
  → Background grooming trigger
  → Response to client
  → UI update with toast

2. Chat Completion Integration Flow​

// Example: AI needs context for response
Chat Request → ChatCompletionFeatures.ts
  → MementoFeature.getContextMessages()
  → findHotMementosByUserId() (HOT tier only)
  → Token limit enforcement (50K cap)
  → Context message assembly
  → LLM API call with safe context
  → Response generation
  → Automatic memento creation
  → Memory limit check before creation
  → Grooming if needed

3. Memory Grooming Flow​

// Example: Memory limit exceeded
Memory Check → 90% threshold exceeded
  → MementoGroomingService.checkAndScheduleGrooming()
  → Synchronous grooming (if forceImmediate=true)
  → groomWarmToCold() (15% of WARM → COLD)
  → groomHotToWarm() (reduce HOT to 80% target)
  → Database batch updates
  → Memory recalculation
  → Logging and monitoring

Components Deep Dive​

Frontend Components​

MementosTabContent.tsx​

Location: packages/client/app/components/ProfileModal/MementosTabContent.tsx

Responsibilities:

  • Interactive memento management interface
  • Real-time memory usage visualization
  • Inline editing with optimistic UI updates
  • CSV export/import functionality
  • Search and filtering capabilities

Key Features:

  • Joy UI components (not Material UI)
  • React Query for server state
  • Optimistic updates with rollback
  • Memory usage progress bars
  • Tier-based color coding

Performance Optimizations:

  • Virtualized lists for 1000+ mementos
  • Debounced search input
  • Memoized sorting and filtering
  • Lazy loading of large content
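The debounced search input mentioned above can be sketched as a plain helper (a sketch only; the actual component may rely on a React hook or a utility library instead):

```typescript
// Minimal debounce helper (illustrative; the real component may use
// a library utility or a useDebouncedValue-style hook instead).
export function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    // Reset the timer on every call; fn fires only after a pause.
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Usage: run the memento search only after the user stops typing.
// const onSearchInput = debounce((q: string) => runSearch(q), 300);
```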

mementosAPICalls.ts​

Location: packages/client/app/utils/mementosAPICalls.ts

Purpose: Type-safe API abstraction layer

export interface MementoAPIResponse {
  createMemento: (data: CreateMementoRequest) => Promise<IMementoDocument>;
  createMementosBatch: (data: CreateMementoRequest[]) => Promise<IMementoDocument[]>;
  updateMemento: (id: string, data: UpdateMementoRequest) => Promise<IMementoDocument>;
  deleteMemento: (id: string) => Promise<void>;
  listMementos: (params: ListMementosParams) => Promise<PaginatedMementos>;
}

Backend Services​

MementoGroomingService​

Location: packages/client/services/MementoGroomingService.ts

Critical Features:

  • Synchronous Emergency Grooming: Prevents memory overflow in real-time
  • Asynchronous Background Grooming: Regular maintenance
  • Memory Threshold Management: 75%, 90%, 95% thresholds
  • Tier Migration Logic: HOT→WARM→COLD transitions
export class MementoGroomingService {
  // Emergency synchronous grooming for API endpoints
  async forceImmediateGrooming(userId: string): Promise<void>;

  // Background asynchronous grooming; pass forceImmediate=true to run synchronously
  async checkAndScheduleGrooming(userId: string, maxTotalChars?: number, forceImmediate?: boolean): Promise<void>;

  // Core grooming logic
  private async groomHotToWarm(userId: string, maxTotalChars: number): Promise<void>;
  private async groomWarmToCold(userId: string): Promise<void>;
}

MementoEvaluationService​

Location: b4m-core/packages/core/services/llm/MementoEvaluationService.ts

Purpose: AI-powered memento analysis and scoring

Features:

  • Content summarization
  • Importance scoring (0-1000)
  • Tag extraction
  • Semantic analysis
  • Quality assessment
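The evaluation result's exact shape is not shown in this document; a plausible sketch, together with the 0-1000 → 0-1 weight conversion implied by the create API, might look like this (field and function names are illustrative, not the actual service contract):

```typescript
// Hypothetical shape of an evaluation result (illustrative only).
export interface MementoEvaluation {
  summary: string;    // condensed content
  importance: number; // 0-1000 raw score
  tags: string[];     // extracted topic tags
}

// Convert the 0-1000 importance score into the 0-1 weight stored on
// the memento document, clamping out-of-range model output.
export function normalizeWeight(importance: number): number {
  return Math.min(1, Math.max(0, importance / 1000));
}
```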

ChatCompletionFeatures​

Location: b4m-core/packages/core/services/llm/ChatCompletionFeatures.ts

Critical Production Fixes:

  • HOT-Only Context Loading: Prevents loading ALL mementos
  • Token Limit Enforcement: 50K emergency cap
  • Memory Limit Checks: Before automatic memento creation
  • Graceful Degradation: Skips creation if memory full

Memory Management & Grooming​

Three-Tier Memory System​

| Tier | Purpose           | Size Limit      | Access Pattern        | Migration Rules      |
| ---- | ----------------- | --------------- | --------------------- | -------------------- |
| HOT  | Immediate context | 80% of total    | Every chat completion | Lowest weight → WARM |
| WARM | Recent memory     | No strict limit | Similarity search     | Lowest weight → COLD |
| COLD | Long-term storage | Unlimited       | Explicit retrieval    | Archive or delete    |

Memory Limits & Thresholds​

export const MEMORY_LIMITS = {
  DEFAULT_MAX_TOTAL_CHARS: 32000,
  WARNING_THRESHOLD: 0.75, // 75% - Background grooming
  DANGER_THRESHOLD: 0.9, // 90% - Aggressive grooming
  EMERGENCY_THRESHOLD: 0.95, // 95% - Synchronous grooming
};

export const TARGET_THRESHOLDS = {
  HOT_TARGET: 0.8, // Reduce HOT to 80% after grooming
  HOT_TRIGGER: 0.9, // Trigger when HOT reaches 90%
  WARM_TARGET: 0.8, // Reduce WARM to 80% after grooming
  WARM_TRIGGER: 0.9, // Trigger when WARM reaches 90%
};
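The thresholds above can be read as a small classifier. This sketch (function and type names are hypothetical) shows which grooming action each usage ratio implies:

```typescript
// Map a character-usage ratio to the grooming action implied by the
// documented thresholds (a sketch; the real service may also combine
// this with the per-tier HOT/WARM triggers).
export type MemoryPressure = 'ok' | 'warning' | 'danger' | 'emergency';

export function classifyMemoryPressure(
  usedChars: number,
  maxChars = 32000
): MemoryPressure {
  const ratio = usedChars / maxChars;
  if (ratio >= 0.95) return 'emergency'; // synchronous grooming
  if (ratio >= 0.9) return 'danger';     // aggressive grooming
  if (ratio >= 0.75) return 'warning';   // background grooming
  return 'ok';
}
```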

Grooming Algorithm​

Phase 1: WARM β†’ COLD Migration​

private async groomWarmToCold(userId: string): Promise<void> {
  const warmMementos = await Memento.find({
    userId,
    tier: MementoTier.WARM,
  }).sort({ weight: 1, lastAccessedAt: 1 }); // Lowest priority first

  // Target 15% of WARM mementos for demotion
  const targetCount = Math.ceil(warmMementos.length * 0.15);
  const memosToDowngrade = warmMementos.slice(0, targetCount);

  // Batch update for performance
  await Memento.updateMany(
    { _id: { $in: memosToDowngrade.map(m => m._id) } },
    { $set: { tier: MementoTier.COLD } }
  );
}

Phase 2: HOT β†’ WARM Migration​

private async groomHotToWarm(userId: string, maxTotalChars: number): Promise<void> {
  const hotMementos = await Memento.find({
    userId,
    tier: MementoTier.HOT,
  }).sort({ weight: 1, lastAccessedAt: 1 });

  const currentHotSize = calculateHotMementoSize(hotMementos);
  const targetSize = maxTotalChars * TARGET_THRESHOLDS.HOT_TARGET;

  // Calculate exact amount to remove
  const sizeToRecover = currentHotSize - targetSize;
  let recoveredSize = 0;
  const memosToDowngrade = [];

  for (const memo of hotMementos) {
    if (recoveredSize >= sizeToRecover) break;

    recoveredSize += calculateMementoSize(memo);
    memosToDowngrade.push(memo._id);
  }

  // Batch migration
  await Memento.updateMany(
    { _id: { $in: memosToDowngrade } },
    { $set: { tier: MementoTier.WARM } }
  );
}

Emergency Memory Enforcement​

Critical Feature: Prevents the 161% memory overflow issue

// In API endpoints (create.ts, create-batch.ts)
const projectedUsagePercent = projectedHotSize / maxTotalChars;

if (projectedUsagePercent > 0.95) {
  // Force immediate synchronous grooming
  await groomingService.checkAndScheduleGrooming(req.user.id, maxTotalChars, true);

  // Re-check after grooming
  const postGroomUsagePercent = calculateNewUsage();

  if (postGroomUsagePercent > 1.0) {
    return res.status(413).json({
      error: `Memory limit exceeded: ${(postGroomUsagePercent * 100).toFixed(1)}%`,
      currentUsage: Math.round(postGroomUsagePercent * 100),
      maxLimit: 100
    });
  }
}

Context Window Protection​

The Production Crisis​

Problem: Users with large memento collections experienced catastrophic failures:

Error: input length and max_tokens exceed context limit: 188544 + 17000 > 200000

Multi-Layer Protection System​

Layer 1: HOT-Only Context Loading​

// Before (DANGEROUS - loads ALL mementos)
const allMementos = await this.db.mementos.findByUserId(this.user.id);

// After (SAFE - loads only HOT tier)
const hotMementos = await this.db.mementos.findHotMementosByUserId(this.user.id);

Impact: Reduced context size by 60-90% for typical users

Layer 2: Emergency Token Limiting​

async getContextMessages(): Promise<IMessage[]> {
  const MAX_MEMENTO_TOKENS = 50000; // Emergency cap
  const hotMementos = await this.db.mementos.findHotMementosByUserId(this.user.id);

  let totalTokens = 0;
  const safeMessages: IMessage[] = [];

  for (const memento of hotMementos) {
    const content = memento.fullContent || memento.summary;
    const estimatedTokens = Math.ceil(content.length / 3.5); // ~3.5 chars per token

    if (totalTokens + estimatedTokens > MAX_MEMENTO_TOKENS) {
      this.logger.warn(`Token limit reached, skipping remaining mementos`);
      break;
    }

    safeMessages.push(this.toContextMessage(memento)); // helper name illustrative
    totalTokens += estimatedTokens;
  }

  return safeMessages;
}

Layer 3: Context Window Validation​

// In ChatCompletion.ts
const contextLimit = modelInfo.contextWindow ?? 200000;
const safetyBuffer = 1000;
const maxSafeInputTokens = contextLimit - maxTokens - safetyBuffer;

if (inputTokens > maxSafeInputTokens) {
  logger.error(`🚨 CRITICAL: Context overflow detected!`, {
    inputTokens,
    maxTokens,
    contextLimit,
    userId: this.user.id
  });

  throw new Error(`Context window exceeded: ${inputTokens} + ${maxTokens} > ${contextLimit}`);
}

Layer 4: Model-Specific max_tokens Protection​

const modelMaxOutputTokens = modelInfo.max_tokens ?? 16384;
let safeMaxTokens = maxTokens;

if (maxTokens > modelMaxOutputTokens) {
  safeMaxTokens = modelMaxOutputTokens;
  logger.warn(`Adjusting max_tokens from ${maxTokens} to ${safeMaxTokens} for model ${model}`);
}

Database Schema & Performance​

MongoDB Schema​

interface IMementoDocument {
  _id: ObjectId;
  userId: string; // Index: { userId: 1, tier: 1 }
  sessionId: string; // Index: { sessionId: 1 }
  questId?: string; // Index: { questId: 1 }

  // Content
  type: MementoType; // 'prompt' | 'reply' | 'insight' | 'context'
  tier: MementoTier; // 'hot' | 'warm' | 'cold'
  summary: string; // AI-generated summary
  fullContent: string; // Complete content

  // Scoring & Management
  weight: number; // 0-1 importance score
  tags: string[]; // Index: { tags: 1 }
  lastAccessedAt: Date; // Index: { lastAccessedAt: 1 }

  // Metadata
  isArchived: boolean;
  embedding?: number[]; // Vector for similarity search
  metadata?: Record<string, any>;

  // Timestamps
  createdAt: Date;
  updatedAt: Date;
}
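Since the memory limit is expressed in characters, a memento's footprint is presumably derived from its text fields. A hedged sketch of the accounting (the real `calculateMementoSize` may weigh fields differently):

```typescript
// Assumed size accounting: a memento's footprint is roughly the
// combined length of its stored text fields. This is a sketch; the
// production implementation may weigh or include fields differently.
interface SizedMemento {
  summary: string;
  fullContent?: string;
}

export function calculateMementoSize(m: SizedMemento): number {
  return m.summary.length + (m.fullContent?.length ?? 0);
}

export function calculateTotalSize(mementos: SizedMemento[]): number {
  return mementos.reduce((sum, m) => sum + calculateMementoSize(m), 0);
}
```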

Critical Database Indexes​

// Primary user queries (most important)
db.mementos.createIndex({ userId: 1, tier: 1 });

// HOT tier optimization (for context loading)
db.mementos.createIndex({ userId: 1, tier: 1, weight: -1, lastAccessedAt: -1 });

// Session-based queries
db.mementos.createIndex({ sessionId: 1 });

// Quest tracking
db.mementos.createIndex({ questId: 1 });

// Tag searching
db.mementos.createIndex({ tags: 1 });

// Grooming operations
db.mementos.createIndex({ lastAccessedAt: 1 });
db.mementos.createIndex({ weight: 1 });

// Vector similarity (future): embeddings require a dedicated vector index
// (e.g. MongoDB Atlas Vector Search); a 2dsphere geospatial index does not
// support embedding similarity search.

Optimized Database Queries​

HOT Mementos Query (Production Critical)​

// Optimized static method
MementoSchema.statics.findHotMementosByUserId = function(userId: string) {
return this.find({ userId, tier: MementoTier.HOT })
.sort({ weight: -1, lastAccessedAt: -1 }) // Highest priority first
.lean(); // Skip Mongoose overhead for read-only operations
};

Grooming Queries​

// WARM tier grooming (batch efficient)
const warmMementos = await Memento.find({
userId,
tier: MementoTier.WARM,
})
.sort({ weight: 1, lastAccessedAt: 1 })
.limit(targetCount)
.select('_id'); // Only fetch IDs for update

// Batch update (single query)
await Memento.updateMany(
{ _id: { $in: warmMementos.map(m => m._id) } },
{ $set: { tier: MementoTier.COLD } }
);

API Endpoints​

Core CRUD Operations​

POST /api/mementos/create​

Purpose: Create single memento with memory enforcement

Critical Features:

  • Pre-creation memory validation
  • Emergency grooming triggers
  • Synchronous memory enforcement
  • Comprehensive logging

Request:

interface CreateMementoRequest {
  type: MementoType;
  tier: MementoTier;
  weight: number; // 0-1000 (converted to 0-1)
  sessionId: string; // Required
  summary: string;
  fullContent?: string;
  tags?: string[];
  metadata?: Record<string, any>;
  questId?: string;
  lastAccessedAt?: Date;
  isArchived?: boolean;
}

Response:

// Success (201)
IMementoDocument

// Memory limit exceeded (413)
{
error: string;
currentUsage: number; // Percentage
maxLimit: number; // 100
currentSize: number; // Characters
newMementoSize: number; // Characters
maxTotalChars: number; // Limit
}
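The rejection path behind the 413 response can be modeled as a pure check (a sketch only; in production this decision is made after the synchronous grooming step, and the helper name is hypothetical):

```typescript
// Sketch of the projected-usage check behind the 413 response.
// Field names mirror the documented response shape; the helper
// itself is illustrative, not the actual endpoint code.
export interface MemoryLimitError {
  error: string;
  currentUsage: number; // percentage
  maxLimit: number;     // always 100
}

export function checkProjectedUsage(
  currentSize: number,
  newMementoSize: number,
  maxTotalChars: number
): MemoryLimitError | null {
  const projected = (currentSize + newMementoSize) / maxTotalChars;
  if (projected <= 1.0) return null; // creation may proceed
  return {
    error: `Memory limit exceeded: ${(projected * 100).toFixed(1)}%`,
    currentUsage: Math.round(projected * 100),
    maxLimit: 100,
  };
}
```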

POST /api/mementos/create-batch​

Purpose: Batch create mementos with collective memory validation

Key Differences from Single Create:

  • Validates total batch size before creation
  • All-or-nothing transaction semantics
  • Collective memory impact calculation
  • Batch-optimized grooming

GET /api/mementos/list​

Purpose: Paginated memento listing with filtering

Query Parameters:

interface ListMementosParams {
  page?: number; // Default: 1
  limit?: number; // Default: 50, Max: 100
  tier?: MementoTier; // Filter by tier
  type?: MementoType; // Filter by type
  sessionId?: string; // Filter by session
  search?: string; // Full-text search
  tags?: string[]; // Tag filtering
  sortBy?: 'weight' | 'createdAt' | 'lastAccessedAt';
  sortOrder?: 'asc' | 'desc';
}

Response:

interface PaginatedMementos {
  mementos: IMementoDocument[];
  totalCount: number;
  page: number;
  limit: number;
  totalPages: number;
  hasNextPage: boolean;
  hasPrevPage: boolean;

  // Memory usage stats
  memoryUsage: {
    hotSize: number;
    hotCount: number;
    warmSize: number;
    warmCount: number;
    coldSize: number;
    coldCount: number;
    totalSize: number;
    usagePercent: number;
    maxTotalChars: number;
  };
}
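The derived pagination fields follow from the raw count; a sketch of the server-side arithmetic (helper name hypothetical):

```typescript
// Derive the pagination fields of PaginatedMementos from a raw
// count (illustrative; the server may compute these differently).
export function paginationMeta(totalCount: number, page: number, limit: number) {
  // At least one page even when the result set is empty.
  const totalPages = Math.max(1, Math.ceil(totalCount / limit));
  return {
    totalCount,
    page,
    limit,
    totalPages,
    hasNextPage: page < totalPages,
    hasPrevPage: page > 1,
  };
}
```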

PUT /api/mementos/:id​

Purpose: Update memento with tier change validation

Special Handling:

  • Tier changes trigger weight recalculation
  • Memory impact validation for HOT promotions
  • Access time updates
  • Optimistic locking to prevent conflicting concurrent updates

DELETE /api/mementos/:id​

Purpose: Safe deletion with cleanup

Features:

  • Soft delete option (isArchived: true)
  • Hard delete with cascade cleanup
  • Memory usage recalculation
  • Audit trail logging

Internal Endpoints​

POST /api/mementos/groom​

Purpose: Manual grooming trigger (admin only)

Use Cases:

  • Emergency manual intervention
  • Maintenance operations
  • Testing grooming logic

GET /api/mementos/stats​

Purpose: Detailed memory usage statistics

Response:

interface MementoStats {
  userId: string;
  memoryUsage: {
    hotTier: { count: number; size: number; avgWeight: number; };
    warmTier: { count: number; size: number; avgWeight: number; };
    coldTier: { count: number; size: number; avgWeight: number; };
    total: { count: number; size: number; usagePercent: number; };
  };
  groomingHistory: {
    lastGroomAt?: Date;
    groomCount: number;
    avgGroomInterval: number; // minutes
  };
  performance: {
    avgContextLoadTime: number; // ms
    avgCreationTime: number; // ms
  };
}

Production Issues & Fixes​

Issue 1: Context Window Overflow (CRITICAL)​

Problem: 188K+ token context causing model API failures

Error: input length and max_tokens exceed context limit: 188544 + 17000 > 200000

Root Cause: Loading ALL user mementos for context instead of HOT tier only

Fix: Multi-layer protection system

  • ✅ HOT-only context loading (60-90% reduction)
  • ✅ Emergency token limits (50K cap)
  • ✅ Model-specific validation
  • ✅ Context window safety checks

Impact: Zero context overflow errors since deployment

Issue 2: Memory Usage at 161% (CRITICAL)​

Problem: Memory grooming not preventing unlimited growth

Root Cause: Asynchronous grooming allowed memory to exceed limits during high-frequency creation

Fix: Synchronous memory enforcement

  • ✅ Pre-creation memory validation (95% threshold)
  • ✅ Immediate synchronous grooming
  • ✅ All-or-nothing memory semantics
  • ✅ Comprehensive usage logging

Impact: Memory usage maintained below 95% threshold

Issue 3: Database Performance Degradation​

Problem: Full table scans for memento queries

Root Cause: Missing compound indexes for common query patterns

Fix: Optimized database schema

  • ✅ Added compound indexes
  • ✅ Optimized HOT tier queries
  • ✅ Lean queries for read operations
  • ✅ Batch operations for grooming

Impact: 10x improvement in query performance

Issue 4: Cascading Failures​

Problem: Memory overflow in one user affecting system stability

Root Cause: Lack of user isolation and circuit breakers

Fix: Isolation and resilience

  • ✅ Per-user memory limits
  • ✅ Graceful degradation
  • ✅ Error isolation
  • ✅ Automatic recovery

Impact: Single user issues no longer affect system


Monitoring & Alerting​

Key Metrics​

Memory Usage Metrics​

// CloudWatch Custom Metrics
await cloudWatch.putMetricData({
  Namespace: 'Bike4Mind/Mementos',
  MetricData: [
    {
      MetricName: 'MemoryUsagePercent',
      Dimensions: [{ Name: 'UserId', Value: userId }],
      Value: usagePercent * 100,
      Unit: 'Percent'
    },
    {
      MetricName: 'HotTierSize',
      Dimensions: [{ Name: 'UserId', Value: userId }],
      Value: hotSize,
      Unit: 'Count'
    }
  ]
});

Performance Metrics​

  • Context load time (target: <100ms)
  • Grooming execution time (target: <5s)
  • Memory enforcement overhead (target: <50ms)
  • API response times (target: <200ms)

Error Metrics​

  • Context window overflow count (target: 0)
  • Memory limit violations (target: 0)
  • Grooming failures (target: <1%)
  • API error rates (target: <0.1%)

Alerting Rules​

Critical Alerts (Page Immediately)​

- name: context_window_overflow
  condition: count > 0
  message: "🚨 CRITICAL: Context window overflow detected"
  action: immediate_page

- name: memory_enforcement_failure
  condition: memory_usage > 100%
  message: "🚨 CRITICAL: Memory limit enforcement failed"
  action: immediate_page

- name: grooming_system_down
  condition: grooming_failures > 5
  message: "🚨 CRITICAL: Grooming system failure"
  action: immediate_page

Warning Alerts (Slack Notification)​

- name: high_memory_usage
  condition: memory_usage > 85%
  message: "⚠️ WARNING: High memory usage detected"
  action: slack_alert

- name: slow_context_loading
  condition: context_load_time > 500ms
  message: "⚠️ WARNING: Slow context loading"
  action: slack_alert

Monitoring Dashboard​

Essential Widgets:

  1. Memory Usage Trends: Per-user memory consumption over time
  2. Grooming Efficiency: Success rates and execution times
  3. Context Window Safety: Token usage distribution
  4. API Performance: Response times and error rates
  5. Database Performance: Query execution times
  6. Error Rates: Failure categorization and trends

Troubleshooting Guide​

Common Issues​

1. Memory Limit Exceeded (413 Error)​

Symptoms:

HTTP 413: Memory limit exceeded. Current usage: 98.2% after grooming.

Diagnosis:

# Check current memory usage
curl -H "Authorization: Bearer $TOKEN" \
"https://api.bike4mind.com/api/mementos/stats"

# Check grooming history
grep "grooming" /var/log/application.log | tail -20

Solutions:

  1. Manual grooming: POST /api/mementos/groom
  2. Increase memory limit: Update MEMORY_LIMITS.DEFAULT_MAX_TOTAL_CHARS
  3. Bulk archive: Archive old COLD tier mementos
  4. Check grooming logic: Verify thresholds are appropriate

2. Context Window Overflow​

Symptoms:

Error: input length and max_tokens exceed context limit: 188544 + 17000 > 200000

Diagnosis:

// Check if HOT-only loading is working
const hotMementos = await db.mementos.findHotMementosByUserId(userId);
const totalTokens = calculateTotalTokens(hotMementos);
console.log(`HOT tier tokens: ${totalTokens}`);

Solutions:

  1. Verify HOT-only loading: Ensure not loading ALL mementos
  2. Check token limits: Verify 50K emergency cap is active
  3. Reduce HOT tier: Lower HOT_TARGET threshold
  4. Model compatibility: Check model context window size

3. Slow Performance​

Symptoms:

  • Context loading >500ms
  • API responses >1s
  • Database timeouts

Diagnosis:

// Check index usage
db.mementos.explain().find({ userId: "user123", tier: "hot" });

// Check query performance
console.time('context-load');
const context = await getContextMessages();
console.timeEnd('context-load');

Solutions:

  1. Index optimization: Ensure compound indexes exist
  2. Query tuning: Use .lean() for read-only operations
  3. Caching: Implement Redis caching for frequent queries
  4. Database scaling: Consider read replicas
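The Redis caching suggestion can be prototyped with an in-process TTL cache before introducing external infrastructure (a sketch; the class name, keys, and TTLs are illustrative, and a real deployment would still want Redis for cross-instance sharing):

```typescript
// In-process TTL cache sketch for hot-path queries — a stand-in for
// the suggested Redis layer, useful for prototyping the caching idea.
export class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: cache HOT-tier context per user for a few seconds.
// const hotCache = new TtlCache<IMementoDocument[]>(5000);
```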

4. Grooming Failures​

Symptoms:

Error during grooming operation: MongoError: Connection timeout

Diagnosis:

# Check grooming logs
grep "grooming" /var/log/application.log | grep "ERROR"

# Check database connection
mongosh --eval "db.adminCommand('ping')"

Solutions:

  1. Database connectivity: Check MongoDB cluster health
  2. Timeout configuration: Increase query timeouts
  3. Batch size optimization: Reduce grooming batch sizes
  4. Retry logic: Implement exponential backoff
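The exponential-backoff retry suggested above can be sketched as follows (delay constants and attempt counts are illustrative defaults, not the service's actual configuration):

```typescript
// Exponential backoff with full jitter (illustrative constants).
export function backoffDelayMs(attempt: number, baseMs = 100, capMs = 5000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // cap the growth
  return Math.floor(Math.random() * exp); // full jitter spreads retries out
}

// Retry an async operation, sleeping between attempts.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise(res => setTimeout(res, backoffDelayMs(attempt)));
    }
  }
  throw lastError;
}
```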

Debugging Tools​

Memory Usage Calculator​

export async function debugMemoryUsage(userId: string) {
  const mementos = await Memento.findByUserId(userId);

  const stats = {
    hot: { count: 0, size: 0 },
    warm: { count: 0, size: 0 },
    cold: { count: 0, size: 0 }
  };

  for (const memento of mementos) {
    const size = calculateMementoSize(memento);
    stats[memento.tier].count++;
    stats[memento.tier].size += size;
  }

  console.table(stats);
  return stats;
}

Context Window Simulator​

export async function simulateContextWindow(userId: string, model: string) {
  const hotMementos = await findHotMementosByUserId(userId);
  const messages = assembleContextMessages(hotMementos);
  const totalTokens = calculateTotalTokens(messages);
  const modelInfo = getModelInfo(model);

  console.log({
    hotMementoCount: hotMementos.length,
    totalTokens,
    modelContextWindow: modelInfo.contextWindow,
    utilization: (totalTokens / modelInfo.contextWindow * 100).toFixed(1) + '%',
    safe: totalTokens < modelInfo.contextWindow * 0.8
  });
}

Best Practices​

For Developers​

1. Memory-Aware Development​

// ✅ GOOD: Always check memory before creating mementos
const currentUsage = await calculateMemoryUsage(userId);
if (currentUsage > 0.9) {
  await forceImmediateGrooming(userId);
}

// ❌ BAD: Create without memory checks
await Memento.create(data); // Could cause overflow

2. Context-Safe Queries​

// ✅ GOOD: HOT tier only for context
const contextMementos = await findHotMementosByUserId(userId);

// ❌ BAD: All mementos for context
const allMementos = await findByUserId(userId); // Context overflow risk

3. Performance-Conscious Operations​

// ✅ GOOD: Batch operations
await Memento.updateMany(
  { _id: { $in: ids } },
  { $set: { tier: MementoTier.COLD } }
);

// ❌ BAD: Individual updates
for (const id of ids) {
  await Memento.findByIdAndUpdate(id, { tier: MementoTier.COLD });
}

For Operations​

1. Monitoring Setup​

  • Set up CloudWatch dashboards for all key metrics
  • Configure alerts for critical thresholds
  • Monitor database performance regularly
  • Track user adoption and usage patterns

2. Capacity Planning​

  • Monitor memory growth trends
  • Plan for user base expansion
  • Consider database sharding at scale
  • Implement auto-scaling for Lambda functions

3. Backup & Recovery​

  • Regular database backups
  • Test grooming logic in staging
  • Maintain rollback procedures
  • Document emergency procedures

For Product​

1. User Education​

  • Explain tier system to power users
  • Provide memory usage visibility
  • Guide optimal memento creation
  • Educate on search and filtering

2. Feature Development​

  • Consider memory impact of new features
  • Design with grooming in mind
  • Implement usage analytics
  • Plan for user feedback integration

3. Scaling Considerations​

  • Design for multi-tenant efficiency
  • Consider per-user limit customization
  • Plan for enterprise-scale deployments
  • Implement usage-based pricing models

Future Enhancements​

Short Term (Next Quarter)​

1. Semantic Search​

  • Implement semantic memento retrieval
  • Use embeddings for context relevance
  • Improve search beyond text matching
  • Enable AI-powered memento clustering

2. Advanced Grooming​

  • Machine learning-based importance scoring
  • User behavior pattern recognition
  • Predictive grooming triggers
  • Personalized memory management

3. Performance Optimizations​

  • Redis caching layer
  • Database connection pooling
  • Async processing pipeline
  • CDN for static memento content

Medium Term (6 Months)​

1. Multi-Modal Mementos​

  • Image and file attachments
  • Audio transcription support
  • Video content summarization
  • Rich media memory search

2. Collaborative Memory​

  • Shared memento spaces
  • Team memory pools
  • Permission-based access
  • Collaborative grooming

3. Advanced Analytics​

  • Memory usage prediction
  • Content quality scoring
  • User behavior insights
  • Performance optimization recommendations

Long Term (1 Year+)​

1. AI-Powered Memory Management​

  • Fully autonomous grooming
  • Predictive context assembly
  • Intelligent tier management
  • Self-optimizing memory systems

2. Federation & Synchronization​

  • Cross-device memory sync
  • Distributed memory architecture
  • Edge computing integration
  • Real-time collaboration

3. Enterprise Features​

  • Advanced compliance tools
  • Audit and governance
  • Bulk management operations
  • Integration with enterprise systems

Conclusion​

The Mementos System represents a sophisticated approach to AI memory management that balances immediate performance needs with long-term scalability. Through the implementation of critical production fixes, comprehensive monitoring, and intelligent automation, the system provides reliable, high-performance memory services that scale with user needs.

The multi-layered protection against context window overflow, combined with proactive memory management and emergency enforcement mechanisms, ensures system stability even under extreme load conditions. The three-tier memory hierarchy provides an optimal balance between accessibility and efficiency, while the grooming system maintains performance without user intervention.

This architecture serves as a foundation for advanced AI memory capabilities while maintaining the reliability and performance standards required for production deployment. The comprehensive monitoring and troubleshooting capabilities ensure that issues can be quickly identified and resolved, while the modular design allows for future enhancements without compromising existing functionality.


This documentation reflects the system state as of the critical production fixes implementation. For the latest updates and changes, refer to the system changelogs and monitoring dashboards.