🧠 Mementos System Architecture
Table of Contents
- Overview
- Architecture Overview
- Data Flow & Request Lifecycle
- Components Deep Dive
- Memory Management & Grooming
- Context Window Protection
- Database Schema & Performance
- API Endpoints
- Production Issues & Fixes
- Monitoring & Alerting
- Troubleshooting Guide
- Best Practices
Overview
The Mementos System provides intelligent, persistent memory for AI conversations that automatically manages itself to prevent context window overflow and optimize retrieval performance. It implements a three-tier memory hierarchy (HOT/WARM/COLD) with automatic grooming, emergency memory enforcement, and context window protection.
Key Features
- 🔥 Three-Tier Memory System: HOT (immediate access), WARM (recent context), COLD (long-term storage)
- 🛡️ Automatic Memory Management: Intelligent grooming prevents memory bloat
- 🚨 Context Window Protection: Prevents 188K+ token overflow crashes
- ⚡ Performance Optimized: Database queries optimized for real-time access
- 📊 Production Monitoring: Comprehensive logging and alerting
- 🔄 Self-Healing: Automatic recovery from memory limit violations
System Stats
- Memory Limit: 32,000 characters (configurable)
- Context Token Limit: 50,000 tokens emergency cap
- Grooming Triggers: 75% warning, 90% danger, 95% emergency
- Database: MongoDB with optimized indexes
- Performance: Sub-100ms context retrieval for 1000+ mementos
Architecture Overview

```mermaid
graph TB
    subgraph "Client Layer"
        UI["ProfileModal/MementosTabContent.tsx"]
        API_CALLS[mementosAPICalls.ts]
        HOOKS[useApi Hook]
    end

    subgraph "API Layer"
        CREATE["/api/mementos/create"]
        BATCH["/api/mementos/create-batch"]
        LIST["/api/mementos/list"]
        UPDATE["/api/mementos/update"]
        DELETE["/api/mementos/delete"]
    end

    subgraph "Core Services"
        GROOMING[MementoGroomingService]
        EVALUATION[MementoEvaluationService]
        CHAT_FEATURES[ChatCompletionFeatures]
    end

    subgraph "Memory Management"
        HOT[(HOT Tier<br/>Immediate Access)]
        WARM[(WARM Tier<br/>Recent Context)]
        COLD[(COLD Tier<br/>Long-term Storage)]
        GROOMER[Grooming Engine]
    end

    subgraph "Database"
        MONGO[(MongoDB<br/>Mementos Collection)]
        INDEXES[Optimized Indexes]
    end

    subgraph "Context Integration"
        CHAT[Chat Completion]
        CONTEXT[Context Window<br/>Protection]
        TOKENS[Token Limiting]
    end

    UI --> API_CALLS
    API_CALLS --> HOOKS
    HOOKS --> CREATE
    HOOKS --> BATCH
    HOOKS --> LIST
    HOOKS --> UPDATE
    HOOKS --> DELETE
    CREATE --> GROOMING
    BATCH --> GROOMING
    CREATE --> EVALUATION
    GROOMING --> HOT
    GROOMING --> WARM
    GROOMING --> COLD
    GROOMING --> GROOMER
    CHAT_FEATURES --> CONTEXT
    CONTEXT --> TOKENS
    CONTEXT --> HOT
    HOT --> MONGO
    WARM --> MONGO
    COLD --> MONGO
    MONGO --> INDEXES
```
Data Flow & Request Lifecycle
1. Memento Creation Flow

```text
// Example: user creates a memento
User Input → MementosTabContent.tsx
  → createMemento() in mementosAPICalls.ts
  → useApi hook (adds auth)
  → POST /api/mementos/create
  → Memory limit check (95% threshold)
  → Emergency grooming (if needed)
  → MementoEvaluationService.evaluate()
  → Database insertion
  → Background grooming trigger
  → Response to client
  → UI update with toast
```
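The 95% pre-creation memory check in this flow reduces to a small predicate. A minimal sketch; the function name and signature are illustrative, not from the codebase:

```typescript
// Hypothetical sketch of the pre-creation memory check (95% threshold).
// Returns true when the projected HOT-tier usage would cross the emergency
// threshold, meaning synchronous grooming must run before insertion.
export function shouldGroomBeforeCreate(
  currentHotSize: number,
  newMementoSize: number,
  maxTotalChars: number,
  emergencyThreshold: number = 0.95
): boolean {
  const projectedUsagePercent = (currentHotSize + newMementoSize) / maxTotalChars;
  return projectedUsagePercent > emergencyThreshold;
}
```

With the default 32,000-character limit, a 1,000-character memento on top of 30,000 characters of HOT data projects to ~96.9% usage and would trigger grooming.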
2. Chat Completion Integration Flow

```text
// Example: AI needs context for a response
Chat Request → ChatCompletionFeatures.ts
  → MementoFeature.getContextMessages()
  → findHotMementosByUserId() (HOT tier only)
  → Token limit enforcement (50K cap)
  → Context message assembly
  → LLM API call with safe context
  → Response generation
  → Automatic memento creation
  → Memory limit check before creation
  → Grooming if needed
```
3. Memory Grooming Flow

```text
// Example: memory limit exceeded
Memory Check → 90% threshold exceeded
  → MementoGroomingService.checkAndScheduleGrooming()
  → Synchronous grooming (if forceImmediate=true)
  → groomWarmToCold() (15% of WARM → COLD)
  → groomHotToWarm() (reduce HOT to 80% target)
  → Database batch updates
  → Memory recalculation
  → Logging and monitoring
```
Components Deep Dive
Frontend Components
MementosTabContent.tsx
Location: packages/client/app/components/ProfileModal/MementosTabContent.tsx
Responsibilities:
- Interactive memento management interface
- Real-time memory usage visualization
- Inline editing with optimistic UI updates
- CSV export/import functionality
- Search and filtering capabilities
Key Features:
- Joy UI components (not Material UI)
- React Query for server state
- Optimistic updates with rollback
- Memory usage progress bars
- Tier-based color coding
Performance Optimizations:
- Virtualized lists for 1000+ mementos
- Debounced search input
- Memoized sorting and filtering
- Lazy loading of large content
mementosAPICalls.ts
Location: packages/client/app/utils/mementosAPICalls.ts
Purpose: Type-safe API abstraction layer

```typescript
export interface MementoAPIResponse {
  createMemento: (data: CreateMementoRequest) => Promise<IMementoDocument>;
  createMementosBatch: (data: CreateMementoRequest[]) => Promise<IMementoDocument[]>;
  updateMemento: (id: string, data: UpdateMementoRequest) => Promise<IMementoDocument>;
  deleteMemento: (id: string) => Promise<void>;
  listMementos: (params: ListMementosParams) => Promise<PaginatedMementos>;
}
```
Backend Services
MementoGroomingService
Location: packages/client/services/MementoGroomingService.ts
Critical Features:
- Synchronous Emergency Grooming: Prevents memory overflow in real-time
- Asynchronous Background Grooming: Regular maintenance
- Memory Threshold Management: 75%, 90%, 95% thresholds
- Tier Migration Logic: HOT→WARM→COLD transitions

```typescript
export class MementoGroomingService {
  // Emergency synchronous grooming for API endpoints
  async forceImmediateGrooming(userId: string): Promise<void>;

  // Background asynchronous grooming; forceImmediate runs it synchronously
  async checkAndScheduleGrooming(userId: string, maxTotalChars: number, forceImmediate?: boolean): Promise<void>;

  // Core grooming logic
  private async groomHotToWarm(userId: string, maxTotalChars: number): Promise<void>;
  private async groomWarmToCold(userId: string): Promise<void>;
}
```
MementoEvaluationService
Location: b4m-core/packages/core/services/llm/MementoEvaluationService.ts
Purpose: AI-powered memento analysis and scoring
Features:
- Content summarization
- Importance scoring (0-1000)
- Tag extraction
- Semantic analysis
- Quality assessment
ChatCompletionFeatures
Location: b4m-core/packages/core/services/llm/ChatCompletionFeatures.ts
Critical Production Fixes:
- HOT-Only Context Loading: Prevents loading ALL mementos
- Token Limit Enforcement: 50K emergency cap
- Memory Limit Checks: Before automatic memento creation
- Graceful Degradation: Skips creation if memory full
Memory Management & Grooming
Three-Tier Memory System

| Tier | Purpose | Size Limit | Access Pattern | Migration Rules |
|---|---|---|---|---|
| HOT | Immediate context | 80% of total | Every chat completion | Lowest weight → WARM |
| WARM | Recent memory | No strict limit | Similarity search | Lowest weight → COLD |
| COLD | Long-term storage | Unlimited | Explicit retrieval | Archive or delete |
Memory Limits & Thresholds

```typescript
export const MEMORY_LIMITS = {
  DEFAULT_MAX_TOTAL_CHARS: 32000,
  WARNING_THRESHOLD: 0.75,   // 75% - background grooming
  DANGER_THRESHOLD: 0.9,     // 90% - aggressive grooming
  EMERGENCY_THRESHOLD: 0.95, // 95% - synchronous grooming
};

export const TARGET_THRESHOLDS = {
  HOT_TARGET: 0.8,   // Reduce HOT to 80% after grooming
  HOT_TRIGGER: 0.9,  // Trigger when HOT reaches 90%
  WARM_TARGET: 0.8,  // Reduce WARM to 80% after grooming
  WARM_TRIGGER: 0.9, // Trigger when WARM reaches 90%
};
```
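These thresholds map a usage ratio to an escalating grooming response. A minimal sketch of that mapping; the `classifyMemoryPressure` helper and its return labels are illustrative, not from the codebase:

```typescript
// Hypothetical helper: maps character usage to the grooming action implied
// by the documented thresholds (75% / 90% / 95%).
const THRESHOLDS = {
  WARNING: 0.75,   // background grooming
  DANGER: 0.9,     // aggressive grooming
  EMERGENCY: 0.95, // synchronous grooming
};

export type GroomingAction = "none" | "background" | "aggressive" | "synchronous";

export function classifyMemoryPressure(
  usedChars: number,
  maxChars: number = 32000
): GroomingAction {
  const ratio = usedChars / maxChars;
  if (ratio >= THRESHOLDS.EMERGENCY) return "synchronous"; // block the request, groom now
  if (ratio >= THRESHOLDS.DANGER) return "aggressive";     // aggressive background pass
  if (ratio >= THRESHOLDS.WARNING) return "background";    // routine background pass
  return "none";
}
```

For the default 32,000-character limit, 24,000 characters lands exactly on the 75% warning boundary and schedules background grooming.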
Grooming Algorithm
Phase 1: WARM → COLD Migration

```typescript
private async groomWarmToCold(userId: string): Promise<void> {
  const warmMementos = await Memento.find({
    userId,
    tier: MementoTier.WARM,
  }).sort({ weight: 1, lastAccessedAt: 1 }); // Lowest priority first

  // Target 15% of WARM mementos for demotion
  const targetCount = Math.ceil(warmMementos.length * 0.15);
  const memosToDowngrade = warmMementos.slice(0, targetCount);

  // Batch update for performance
  await Memento.updateMany(
    { _id: { $in: memosToDowngrade.map(m => m._id) } },
    { $set: { tier: MementoTier.COLD } }
  );
}
```
Phase 2: HOT → WARM Migration

```typescript
private async groomHotToWarm(userId: string, maxTotalChars: number): Promise<void> {
  const hotMementos = await Memento.find({
    userId,
    tier: MementoTier.HOT,
  }).sort({ weight: 1, lastAccessedAt: 1 });

  const currentHotSize = calculateHotMementoSize(hotMementos);
  const targetSize = maxTotalChars * TARGET_THRESHOLDS.HOT_TARGET;

  // Calculate the exact amount to remove
  const sizeToRecover = currentHotSize - targetSize;
  let recoveredSize = 0;
  const memosToDowngrade = [];

  for (const memo of hotMementos) {
    if (recoveredSize >= sizeToRecover) break;
    recoveredSize += calculateMementoSize(memo);
    memosToDowngrade.push(memo._id);
  }

  // Batch migration
  await Memento.updateMany(
    { _id: { $in: memosToDowngrade } },
    { $set: { tier: MementoTier.WARM } }
  );
}
```
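The grooming code relies on `calculateMementoSize` and `calculateHotMementoSize`, which this document does not show. A minimal sketch, assuming size is the character count of `summary` plus `fullContent` (consistent with the character-based 32,000-char limit); the exact accounting in the real helpers may differ:

```typescript
// Hypothetical sketch: the real helpers are not shown in this document.
interface SizedMemento {
  summary: string;
  fullContent?: string;
}

// Character count of one memento, matching the character-based memory limit.
export function calculateMementoSize(m: SizedMemento): number {
  return m.summary.length + (m.fullContent?.length ?? 0);
}

// Total character count across a set of (HOT-tier) mementos.
export function calculateHotMementoSize(mementos: SizedMemento[]): number {
  return mementos.reduce((total, m) => total + calculateMementoSize(m), 0);
}
```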
Emergency Memory Enforcement
Critical Feature: Prevents the 161% memory overflow issue.

```typescript
// In API endpoints (create.ts, create-batch.ts)
const projectedUsagePercent = projectedHotSize / maxTotalChars;
if (projectedUsagePercent > 0.95) {
  // Force immediate synchronous grooming
  await groomingService.checkAndScheduleGrooming(req.user.id, maxTotalChars, true);

  // Re-check after grooming
  const postGroomUsagePercent = calculateNewUsage();
  if (postGroomUsagePercent > 1.0) {
    return res.status(413).json({
      error: `Memory limit exceeded: ${(postGroomUsagePercent * 100).toFixed(1)}%`,
      currentUsage: Math.round(postGroomUsagePercent * 100),
      maxLimit: 100,
    });
  }
}
```
Context Window Protection
The Production Crisis
Problem: Users with large memento collections experienced catastrophic failures:

```text
Error: input length and max_tokens exceed context limit: 188544 + 17000 > 200000
```

Multi-Layer Protection System
Layer 1: HOT-Only Context Loading

```typescript
// Before (DANGEROUS - loads ALL mementos)
const allMementos = await this.db.mementos.findByUserId(this.user.id);

// After (SAFE - loads only the HOT tier)
const hotMementos = await this.db.mementos.findHotMementosByUserId(this.user.id);
```

Impact: Reduced context size by 60-90% for typical users
Layer 2: Emergency Token Limiting

```typescript
async getContextMessages(): Promise<IMessage[]> {
  const MAX_MEMENTO_TOKENS = 50000; // Emergency cap
  let totalTokens = 0;
  const safeMessages: IMessage[] = [];

  for (const memento of hotMementos) { // hotMementos loaded via Layer 1
    // content/message assembly from the memento is elided in this excerpt
    const estimatedTokens = Math.ceil(content.length / 3.5);
    if (totalTokens + estimatedTokens > MAX_MEMENTO_TOKENS) {
      this.logger.warn(`Token limit reached, skipping remaining mementos`);
      break;
    }
    safeMessages.push(message);
    totalTokens += estimatedTokens;
  }
  return safeMessages;
}
```
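The capping loop above can be isolated as a pure function over content strings, which makes the 50K behavior easy to unit-test. A sketch using the same chars/3.5 heuristic; the function names are illustrative:

```typescript
// Sketch of the emergency token cap as a pure function (names are hypothetical).
export const MAX_MEMENTO_TOKENS = 50000;

// Rough heuristic used by the system: ~3.5 characters per token.
export function estimateTokens(content: string): number {
  return Math.ceil(content.length / 3.5);
}

// Keeps contents in order until the next item would exceed the token budget,
// then drops the remainder, mirroring the loop in getContextMessages().
export function limitByTokens(
  contents: string[],
  maxTokens: number = MAX_MEMENTO_TOKENS
): string[] {
  const safe: string[] = [];
  let total = 0;
  for (const content of contents) {
    const estimated = estimateTokens(content);
    if (total + estimated > maxTokens) break; // skip remaining mementos
    safe.push(content);
    total += estimated;
  }
  return safe;
}
```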
Layer 3: Context Window Validation

```typescript
// In ChatCompletion.ts
const contextLimit = modelInfo.contextWindow ?? 200000;
const safetyBuffer = 1000;
const maxSafeInputTokens = contextLimit - maxTokens - safetyBuffer;

if (inputTokens > maxSafeInputTokens) {
  logger.error(`🚨 CRITICAL: Context overflow detected!`, {
    inputTokens,
    maxTokens,
    contextLimit,
    userId: this.user.id,
  });
  throw new Error(`Context window exceeded: ${inputTokens} + ${maxTokens} > ${contextLimit}`);
}
```
Layer 4: Model-Specific max_tokens Protection

```typescript
const modelMaxOutputTokens = modelInfo.max_tokens ?? 16384;
let safeMaxTokens = maxTokens;

if (maxTokens > modelMaxOutputTokens) {
  safeMaxTokens = modelMaxOutputTokens;
  logger.warn(`Adjusting max_tokens from ${maxTokens} to ${safeMaxTokens} for model ${model}`);
}
```
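Layers 3 and 4 compose: clamp `max_tokens` to the model's output cap first, then derive the largest safe input size from the context window. A sketch of that combined arithmetic; `safeTokenBudget` is a hypothetical helper, not a function from the codebase:

```typescript
// Hypothetical sketch combining Layers 3 and 4: clamp the requested
// max_tokens, then compute the remaining safe input budget.
export function safeTokenBudget(
  requestedMaxTokens: number,
  modelMaxOutputTokens: number,
  contextWindow: number,
  safetyBuffer: number = 1000
): { maxTokens: number; maxSafeInputTokens: number } {
  const maxTokens = Math.min(requestedMaxTokens, modelMaxOutputTokens);
  return {
    maxTokens,
    maxSafeInputTokens: contextWindow - maxTokens - safetyBuffer,
  };
}
```

For the failing request from the crisis above (17,000 requested tokens against a 16,384 output cap and a 200,000-token window), this clamps output to 16,384 and leaves 182,616 safe input tokens.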
Database Schema & Performance
MongoDB Schema

```typescript
interface IMementoDocument {
  _id: ObjectId;
  userId: string;           // Index: { userId: 1, tier: 1 }
  sessionId: string;        // Index: { sessionId: 1 }
  questId?: string;         // Index: { questId: 1 }

  // Content
  type: MementoType;        // 'prompt' | 'reply' | 'insight' | 'context'
  tier: MementoTier;        // 'hot' | 'warm' | 'cold'
  summary: string;          // AI-generated summary
  fullContent: string;      // Complete content

  // Scoring & management
  weight: number;           // 0-1 importance score
  tags: string[];           // Index: { tags: 1 }
  lastAccessedAt: Date;     // Index: { lastAccessedAt: 1 }

  // Metadata
  isArchived: boolean;
  embedding?: number[];     // Vector for similarity search
  metadata?: Record<string, any>;

  // Timestamps
  createdAt: Date;
  updatedAt: Date;
}
```
Critical Database Indexes

```javascript
// Primary user queries (most important)
db.mementos.createIndex({ userId: 1, tier: 1 });

// HOT tier optimization (for context loading)
db.mementos.createIndex({ userId: 1, tier: 1, weight: -1, lastAccessedAt: -1 });

// Session-based queries
db.mementos.createIndex({ sessionId: 1 });

// Quest tracking
db.mementos.createIndex({ questId: 1 });

// Tag searching
db.mementos.createIndex({ tags: 1 });

// Grooming operations
db.mementos.createIndex({ lastAccessedAt: 1 });
db.mementos.createIndex({ weight: 1 });

// Vector similarity (future; note that 2dsphere is a geospatial index, so
// production vector search would likely use an Atlas Vector Search index instead)
db.mementos.createIndex({ embedding: "2dsphere" });
```
Optimized Database Queries
HOT Mementos Query (Production Critical)

```typescript
// Optimized static method
MementoSchema.statics.findHotMementosByUserId = function (userId: string) {
  return this.find({ userId, tier: MementoTier.HOT })
    .sort({ weight: -1, lastAccessedAt: -1 }) // Highest priority first
    .lean(); // Skip Mongoose document overhead for read-only operations
};
```
Grooming Queries

```typescript
// WARM tier grooming (batch efficient)
const warmMementos = await Memento.find({
  userId,
  tier: MementoTier.WARM,
})
  .sort({ weight: 1, lastAccessedAt: 1 })
  .limit(targetCount)
  .select('_id'); // Only fetch IDs for the update

// Batch update (single query)
await Memento.updateMany(
  { _id: { $in: warmMementos.map(m => m._id) } },
  { $set: { tier: MementoTier.COLD } }
);
```
API Endpoints
Core CRUD Operations
POST /api/mementos/create
Purpose: Create single memento with memory enforcement
Critical Features:
- Pre-creation memory validation
- Emergency grooming triggers
- Synchronous memory enforcement
- Comprehensive logging
Request:
```typescript
interface CreateMementoRequest {
  type: MementoType;
  tier: MementoTier;
  weight: number;      // 0-1000 (converted to 0-1)
  sessionId: string;   // Required
  summary: string;
  fullContent?: string;
  tags?: string[];
  metadata?: Record<string, any>;
  questId?: string;
  lastAccessedAt?: Date;
  isArchived?: boolean;
}
```
Response:
```typescript
// Success (201)
IMementoDocument

// Memory limit exceeded (413)
{
  error: string;
  currentUsage: number;    // Percentage
  maxLimit: number;        // 100
  currentSize: number;     // Characters
  newMementoSize: number;  // Characters
  maxTotalChars: number;   // Limit
}
```
POST /api/mementos/create-batch
Purpose: Batch create mementos with collective memory validation
Key Differences from Single Create:
- Validates total batch size before creation
- All-or-nothing transaction semantics
- Collective memory impact calculation
- Batch-optimized grooming
GET /api/mementos/list
Purpose: Paginated memento listing with filtering
Query Parameters:
```typescript
interface ListMementosParams {
  page?: number;        // Default: 1
  limit?: number;       // Default: 50, max: 100
  tier?: MementoTier;   // Filter by tier
  type?: MementoType;   // Filter by type
  sessionId?: string;   // Filter by session
  search?: string;      // Full-text search
  tags?: string[];      // Tag filtering
  sortBy?: 'weight' | 'createdAt' | 'lastAccessedAt';
  sortOrder?: 'asc' | 'desc';
}
```
Response:
```typescript
interface PaginatedMementos {
  mementos: IMementoDocument[];
  totalCount: number;
  page: number;
  limit: number;
  totalPages: number;
  hasNextPage: boolean;
  hasPrevPage: boolean;

  // Memory usage stats
  memoryUsage: {
    hotSize: number;
    hotCount: number;
    warmSize: number;
    warmCount: number;
    coldSize: number;
    coldCount: number;
    totalSize: number;
    usagePercent: number;
    maxTotalChars: number;
  };
}
```
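The pagination fields in this response relate to each other in the standard way. A small sketch of that arithmetic; `paginationMeta` is a hypothetical helper, not from the codebase:

```typescript
// Hypothetical helper showing how totalPages / hasNextPage / hasPrevPage
// are derived from totalCount, page, and limit.
export function paginationMeta(totalCount: number, page: number, limit: number) {
  const totalPages = Math.max(1, Math.ceil(totalCount / limit));
  return {
    totalPages,
    hasNextPage: page < totalPages,
    hasPrevPage: page > 1,
  };
}
```

For example, 120 mementos at the default `limit` of 50 yields 3 pages, with `hasNextPage` true only on pages 1 and 2.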
PUT /api/mementos/:id
Purpose: Update memento with tier change validation
Special Handling:
- Tier changes trigger weight recalculation
- Memory impact validation for HOT promotions
- Access time updates
- Optimistic locking to prevent conflicting concurrent updates
DELETE /api/mementos/:id
Purpose: Safe deletion with cleanup
Features:
- Soft delete option (isArchived: true)
- Hard delete with cascade cleanup
- Memory usage recalculation
- Audit trail logging
Internal Endpoints
POST /api/mementos/groom
Purpose: Manual grooming trigger (admin only)
Use Cases:
- Emergency manual intervention
- Maintenance operations
- Testing grooming logic
GET /api/mementos/stats
Purpose: Detailed memory usage statistics
Response:
```typescript
interface MementoStats {
  userId: string;
  memoryUsage: {
    hotTier: { count: number; size: number; avgWeight: number };
    warmTier: { count: number; size: number; avgWeight: number };
    coldTier: { count: number; size: number; avgWeight: number };
    total: { count: number; size: number; usagePercent: number };
  };
  groomingHistory: {
    lastGroomAt?: Date;
    groomCount: number;
    avgGroomInterval: number; // minutes
  };
  performance: {
    avgContextLoadTime: number; // ms
    avgCreationTime: number; // ms
  };
}
```
Production Issues & Fixes
Issue 1: Context Window Overflow (CRITICAL)
Problem: 188K+ token contexts causing model API failures:

```text
Error: input length and max_tokens exceed context limit: 188544 + 17000 > 200000
```

Root Cause: Loading ALL user mementos for context instead of the HOT tier only
Fix: Multi-layer protection system
- ✅ HOT-only context loading (60-90% reduction)
- ✅ Emergency token limits (50K cap)
- ✅ Model-specific validation
- ✅ Context window safety checks
Impact: Zero context overflow errors since deployment
Issue 2: Memory Usage at 161% (CRITICAL)
Problem: Memory grooming not preventing unlimited growth
Root Cause: Asynchronous grooming allowed memory to exceed limits during high-frequency creation
Fix: Synchronous memory enforcement
- ✅ Pre-creation memory validation (95% threshold)
- ✅ Immediate synchronous grooming
- ✅ All-or-nothing memory semantics
- ✅ Comprehensive usage logging
Impact: Memory usage maintained below the 95% threshold
Issue 3: Database Performance Degradation
Problem: Full table scans for memento queries
Root Cause: Missing compound indexes for common query patterns
Fix: Optimized database schema
- ✅ Added compound indexes
- ✅ Optimized HOT tier queries
- ✅ Lean queries for read operations
- ✅ Batch operations for grooming
Impact: 10x improvement in query performance
Issue 4: Cascading Failures
Problem: Memory overflow for a single user affecting system stability
Root Cause: Lack of user isolation and circuit breakers
Fix: Isolation and resilience
- ✅ Per-user memory limits
- ✅ Graceful degradation
- ✅ Error isolation
- ✅ Automatic recovery
Impact: Single-user issues no longer affect the system
Monitoring & Alerting
Key Metrics
Memory Usage Metrics

```typescript
// CloudWatch custom metrics (AWS SDK v2 style)
await cloudWatch
  .putMetricData({
    Namespace: 'Bike4Mind/Mementos',
    MetricData: [
      {
        MetricName: 'MemoryUsagePercent',
        Dimensions: [{ Name: 'UserId', Value: userId }],
        Value: usagePercent * 100,
        Unit: 'Percent',
      },
      {
        MetricName: 'HotTierSize',
        Dimensions: [{ Name: 'UserId', Value: userId }],
        Value: hotSize,
        Unit: 'Count',
      },
    ],
  })
  .promise(); // v2 requests must be converted to a promise before awaiting
```
Performance Metrics
- Context load time (target: <100ms)
- Grooming execution time (target: <5s)
- Memory enforcement overhead (target: <50ms)
- API response times (target: <200ms)
Error Metrics
- Context window overflow count (target: 0)
- Memory limit violations (target: 0)
- Grooming failures (target: <1%)
- API error rates (target: <0.1%)
Alerting Rules
Critical Alerts (Page Immediately)

```yaml
- name: context_window_overflow
  condition: count > 0
  message: "🚨 CRITICAL: Context window overflow detected"
  action: immediate_page

- name: memory_enforcement_failure
  condition: memory_usage > 100%
  message: "🚨 CRITICAL: Memory limit enforcement failed"
  action: immediate_page

- name: grooming_system_down
  condition: grooming_failures > 5
  message: "🚨 CRITICAL: Grooming system failure"
  action: immediate_page
```
Warning Alerts (Slack Notification)

```yaml
- name: high_memory_usage
  condition: memory_usage > 85%
  message: "⚠️ WARNING: High memory usage detected"
  action: slack_alert

- name: slow_context_loading
  condition: context_load_time > 500ms
  message: "⚠️ WARNING: Slow context loading"
  action: slack_alert
```
Monitoring Dashboard
Essential Widgets:
- Memory Usage Trends: Per-user memory consumption over time
- Grooming Efficiency: Success rates and execution times
- Context Window Safety: Token usage distribution
- API Performance: Response times and error rates
- Database Performance: Query execution times
- Error Rates: Failure categorization and trends
Troubleshooting Guide
Common Issues
1. Memory Limit Exceeded (413 Error)
Symptoms:

```text
HTTP 413: Memory limit exceeded. Current usage: 98.2% after grooming.
```

Diagnosis:

```bash
# Check current memory usage
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.bike4mind.com/api/mementos/stats"

# Check grooming history
grep "grooming" /var/log/application.log | tail -20
```

Solutions:
- Manual grooming: `POST /api/mementos/groom`
- Increase memory limit: update `MEMORY_LIMITS.DEFAULT_MAX_TOTAL_CHARS`
- Bulk archive: archive old COLD tier mementos
- Check grooming logic: verify thresholds are appropriate
2. Context Window Overflow
Symptoms:

```text
Error: input length and max_tokens exceed context limit: 188544 + 17000 > 200000
```

Diagnosis:

```typescript
// Check whether HOT-only loading is working
const hotMementos = await db.mementos.findHotMementosByUserId(userId);
const totalTokens = calculateTotalTokens(hotMementos);
console.log(`HOT tier tokens: ${totalTokens}`);
```

Solutions:
- Verify HOT-only loading: ensure context loading is not fetching ALL mementos
- Check token limits: verify the 50K emergency cap is active
- Reduce HOT tier: lower the `HOT_TARGET` threshold
- Model compatibility: check the model's context window size
3. Slow Performance
Symptoms:
- Context loading >500ms
- API responses >1s
- Database timeouts

Diagnosis:

```javascript
// Check index usage (mongosh)
db.mementos.explain().find({ userId: "user123", tier: "hot" });
```

```typescript
// Check query performance
console.time('context-load');
const context = await getContextMessages();
console.timeEnd('context-load');
```

Solutions:
- Index optimization: ensure compound indexes exist
- Query tuning: use `.lean()` for read-only operations
- Caching: implement Redis caching for frequent queries
- Database scaling: consider read replicas
4. Grooming Failures
Symptoms:

```text
Error during grooming operation: MongoError: Connection timeout
```

Diagnosis:

```bash
# Check grooming logs
grep "grooming" /var/log/application.log | grep "ERROR"

# Check database connection
mongosh --eval "db.adminCommand('ping')"
```

Solutions:
- Database connectivity: check MongoDB cluster health
- Timeout configuration: increase query timeouts
- Batch size optimization: reduce grooming batch sizes
- Retry logic: implement exponential backoff
Debugging Tools
Memory Usage Calculator

```typescript
// Declared async so the await inside is valid
export async function debugMemoryUsage(userId: string) {
  const mementos = await Memento.findByUserId(userId);
  const stats = {
    hot: { count: 0, size: 0 },
    warm: { count: 0, size: 0 },
    cold: { count: 0, size: 0 },
  };

  for (const memento of mementos) {
    const size = calculateMementoSize(memento);
    stats[memento.tier].count++;
    stats[memento.tier].size += size;
  }

  console.table(stats);
  return stats;
}
```
Context Window Simulator

```typescript
// Declared async so the await inside is valid
export async function simulateContextWindow(userId: string, model: string) {
  const hotMementos = await findHotMementosByUserId(userId);
  const messages = assembleContextMessages(hotMementos);
  const totalTokens = calculateTotalTokens(messages);
  const modelInfo = getModelInfo(model);

  console.log({
    hotMementoCount: hotMementos.length,
    totalTokens,
    modelContextWindow: modelInfo.contextWindow,
    utilization: ((totalTokens / modelInfo.contextWindow) * 100).toFixed(1) + '%',
    safe: totalTokens < modelInfo.contextWindow * 0.8,
  });
}
```
Best Practices
For Developers
1. Memory-Aware Development

```typescript
// ✅ GOOD: Always check memory before creating mementos
const currentUsage = await calculateMemoryUsage(userId);
if (currentUsage > 0.9) {
  await forceImmediateGrooming(userId);
}

// ❌ BAD: Create without memory checks
await Memento.create(data); // Could cause overflow
```
2. Context-Safe Queries

```typescript
// ✅ GOOD: HOT tier only for context
const contextMementos = await findHotMementosByUserId(userId);

// ❌ BAD: ALL mementos for context
const allMementos = await findByUserId(userId); // Context overflow risk
```
3. Performance-Conscious Operations

```typescript
// ✅ GOOD: Batch operations
await Memento.updateMany(
  { _id: { $in: ids } },
  { $set: { tier: MementoTier.COLD } }
);

// ❌ BAD: Individual updates
for (const id of ids) {
  await Memento.findByIdAndUpdate(id, { tier: MementoTier.COLD });
}
```
For Operations
1. Monitoring Setup
- Set up CloudWatch dashboards for all key metrics
- Configure alerts for critical thresholds
- Monitor database performance regularly
- Track user adoption and usage patterns
2. Capacity Planning
- Monitor memory growth trends
- Plan for user base expansion
- Consider database sharding at scale
- Implement auto-scaling for Lambda functions
3. Backup & Recovery
- Regular database backups
- Test grooming logic in staging
- Maintain rollback procedures
- Document emergency procedures
For Product
1. User Education
- Explain tier system to power users
- Provide memory usage visibility
- Guide optimal memento creation
- Educate on search and filtering
2. Feature Development
- Consider memory impact of new features
- Design with grooming in mind
- Implement usage analytics
- Plan for user feedback integration
3. Scaling Considerations
- Design for multi-tenant efficiency
- Consider per-user limit customization
- Plan for enterprise-scale deployments
- Implement usage-based pricing models
Future Enhancements
Short Term (Next Quarter)
1. Vector Similarity Search
- Implement semantic memento retrieval
- Use embeddings for context relevance
- Improve search beyond text matching
- Enable AI-powered memento clustering
2. Advanced Grooming
- Machine learning-based importance scoring
- User behavior pattern recognition
- Predictive grooming triggers
- Personalized memory management
3. Performance Optimizations
- Redis caching layer
- Database connection pooling
- Async processing pipeline
- CDN for static memento content
Medium Term (6 Months)
1. Multi-Modal Mementos
- Image and file attachments
- Audio transcription support
- Video content summarization
- Rich media memory search
2. Collaborative Memory
- Shared memento spaces
- Team memory pools
- Permission-based access
- Collaborative grooming
3. Advanced Analytics
- Memory usage prediction
- Content quality scoring
- User behavior insights
- Performance optimization recommendations
Long Term (1 Year+)
1. AI-Powered Memory Management
- Fully autonomous grooming
- Predictive context assembly
- Intelligent tier management
- Self-optimizing memory systems
2. Federation & Synchronization
- Cross-device memory sync
- Distributed memory architecture
- Edge computing integration
- Real-time collaboration
3. Enterprise Features
- Advanced compliance tools
- Audit and governance
- Bulk management operations
- Integration with enterprise systems
Conclusion
The Mementos System represents a sophisticated approach to AI memory management that balances immediate performance needs with long-term scalability. Through the implementation of critical production fixes, comprehensive monitoring, and intelligent automation, the system provides reliable, high-performance memory services that scale with user needs.
The multi-layered protection against context window overflow, combined with proactive memory management and emergency enforcement mechanisms, ensures system stability even under extreme load conditions. The three-tier memory hierarchy provides an optimal balance between accessibility and efficiency, while the grooming system maintains performance without user intervention.
This architecture serves as a foundation for advanced AI memory capabilities while maintaining the reliability and performance standards required for production deployment. The comprehensive monitoring and troubleshooting capabilities ensure that issues can be quickly identified and resolved, while the modular design allows for future enhancements without compromising existing functionality.
This documentation reflects the system state as of the critical production fixes implementation. For the latest updates and changes, refer to the system changelogs and monitoring dashboards.