
Memory System Setup Guide

This guide covers the installation, configuration, and usage of the MCP Memory tool, a personal note-taking and search system for autonomous agents.

Terminology Note:

  • MCP Memory - The current note-taking tool for agents (what you're setting up here)
  • mem-agent - A planned future LLM-based memory assistant (not yet implemented)

The configuration still uses MEM_AGENT_* prefixes for historical reasons.

Overview

The MCP Memory tool is a local note-taking system specifically designed for the main agent. The agent uses it to:

  • Record notes: Write down important information, findings, or context during task execution
  • Search notes: Find and recall previously recorded information to "remember" details
  • Maintain context: Keep working memory across multiple LLM calls within a single session

This is particularly useful for autonomous agents (such as the qwen code CLI) that make many LLM calls within one continuous session.

Storage Types

The system supports two storage backends:

1. JSON Storage (Default)

  • Simple and Fast: File-based JSON storage with substring search
  • No Dependencies: No ML models or additional libraries required
  • Lightweight: Minimal memory footprint
  • Best for: Most users, small to medium memory sizes, simple search needs

2. Model-Based Storage

  • AI-Powered: Semantic search using the BAAI/bge-m3 model from HuggingFace
  • Smart Search: Understands meaning, not just keywords
  • Best for: Large memory sizes, complex queries, semantic understanding needed
  • Requires: Additional dependencies (transformers, sentence-transformers)

The storage type is configured via MEM_AGENT_STORAGE_TYPE setting (default: json).
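
For example, here is a minimal sketch of the difference between the two search styles. The storage layer's internal API is not shown; the note contents and helper names below are purely illustrative, and the vector half assumes the optional sentence-transformers dependency is installed:

from sentence_transformers import SentenceTransformer

notes = [
    "Found authentication vulnerability in login.py",
    "Database queries missing indexes in user_service.py",
]

# JSON storage: plain substring match
def substring_search(query: str) -> list[str]:
    return [n for n in notes if query.lower() in n.lower()]

print(substring_search("vulnerability"))   # matches the first note
print(substring_search("security issue"))  # [] -- no literal match

# Vector storage: semantic similarity using BAAI/bge-m3 embeddings
model = SentenceTransformer("BAAI/bge-m3")
query_emb = model.encode(["security issue"], normalize_embeddings=True)
note_embs = model.encode(notes, normalize_embeddings=True)
scores = (query_emb @ note_embs.T)[0]  # cosine similarity on normalized vectors
print(notes[scores.argmax()])  # the vulnerability note ranks highest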

Quick Start

Installation

Run the installation script:

python scripts/install_mem_agent.py

This will:

  1. Install all required dependencies
  2. Download the embedding model (BAAI/bge-m3) from HuggingFace
  3. Set up the memory directory structure
  4. Create the MCP server configuration
  5. Register mem-agent as an MCP server

Configuration

Enable mem-agent in your config.yaml:

# Enable MCP support
AGENT_ENABLE_MCP: true
AGENT_ENABLE_MCP_MEMORY: true

# Memory agent settings
MEM_AGENT_STORAGE_TYPE: json  # Storage type: "json" (default) or "vector"
MEM_AGENT_MODEL: BAAI/bge-m3  # Model for semantic search (if using "vector" storage)
MEM_AGENT_MODEL_PRECISION: 4bit
MEM_AGENT_BACKEND: auto
MEM_AGENT_MEMORY_POSTFIX: memory  # Postfix within KB (kb_path/memory)
MEM_AGENT_MAX_TOOL_TURNS: 20

# MCP settings
MCP_SERVERS_POSTFIX: .mcp_servers  # Per-user MCP servers (kb_path/.mcp_servers)

Or use environment variables in .env:

AGENT_ENABLE_MCP=true
AGENT_ENABLE_MCP_MEMORY=true
MEM_AGENT_STORAGE_TYPE=json  # or "vector" for semantic search
MEM_AGENT_MODEL=BAAI/bge-m3
MEM_AGENT_MODEL_PRECISION=4bit
MEM_AGENT_BACKEND=auto
MEM_AGENT_MEMORY_POSTFIX=memory
MCP_SERVERS_POSTFIX=.mcp_servers

Choosing Storage Type

Use JSON storage (default) if:

  • You want fast, lightweight storage
  • Simple keyword search is sufficient
  • You don't want to download ML models
  • You have limited resources

Use model-based storage if:

  • You need semantic search (understands meaning)
  • You have a large number of memories
  • You want AI-powered relevance ranking
  • You have the resources for ML models

To enable model-based storage:

  1. Set MEM_AGENT_STORAGE_TYPE: vector in config
  2. Install additional dependencies:
    pip install sentence-transformers transformers torch
    
  3. The model will be downloaded automatically on first use (see the quick check below)
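
You can confirm the optional dependencies are in place by loading the model once in Python; this also triggers the automatic download into the HuggingFace cache:

from sentence_transformers import SentenceTransformer

# First run downloads BAAI/bge-m3 into the HuggingFace cache
model = SentenceTransformer("BAAI/bge-m3")
print(model.encode(["hello"]).shape)  # (1, 1024) -- bge-m3's embedding dimension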

Verification

Test that mem-agent is installed correctly:

# Check if model is downloaded
huggingface-cli scan-cache | grep bge-m3

# Verify MCP server configuration exists
cat data/mcp_servers/mem-agent.json

# Check memory directory
ls -la knowledge_bases/default/memory/

Advanced Installation

Custom Model Location

python scripts/install_mem_agent.py \
  --model BAAI/bge-m3 \
  --precision 8bit \
  --workspace /path/to/workspace

Skip Model Download

If you've already downloaded the model:

python scripts/install_mem_agent.py --skip-model-download

Platform-Specific Backends

Linux with GPU (vLLM)

# Install vLLM
pip install vllm

# Configure to use vLLM
export MEM_AGENT_BACKEND=vllm
export MEM_AGENT_VLLM_HOST=127.0.0.1
export MEM_AGENT_VLLM_PORT=8001
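
If the vLLM server is up, its OpenAI-compatible API should answer on the configured host and port. A quick Python check, assuming the defaults above:

import json
import urllib.request

# Lists the models served by the vLLM OpenAI-compatible endpoint
with urllib.request.urlopen("http://127.0.0.1:8001/v1/models") as resp:
    print(json.load(resp))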

macOS with Apple Silicon (MLX)

# Install MLX
pip install mlx mlx-lm

# Configure to use MLX
export MEM_AGENT_BACKEND=mlx
export MEM_AGENT_MODEL_PRECISION=4bit
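
To confirm MLX can see the Apple Silicon GPU, a one-line check:

import mlx.core as mx

print(mx.default_device())  # expect Device(gpu, 0) on Apple Silicon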

CPU Fallback (Transformers)

# Already installed with base dependencies
export MEM_AGENT_BACKEND=transformers

Memory Structure

The memory agent uses an Obsidian-like file structure stored per-user within each knowledge base:

knowledge_bases/
└── {user_kb_name}/       # Each user has their own KB
    ├── .mcp_servers/     # Per-user MCP server configs
    │   └── mem-agent.json
    ├── memory/           # Per-user memory (postfix configurable)
    │   ├── user.md       # Personal information
    │   └── entities/     # Entity files
    │       ├── person_name.md
    │       ├── company_name.md
    │       └── place_name.md
    └── topics/           # User's notes

Key Points:

  • Memory path is constructed as: {kb_path}/{MEM_AGENT_MEMORY_POSTFIX}
  • MCP servers are stored at: {kb_path}/{MCP_SERVERS_POSTFIX}
  • Each user gets their own isolated memory and MCP configuration
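
In code, these paths reduce to simple joins. A sketch of the equivalent logic (the helpers settings.get_mem_agent_memory_path() and settings.get_mcp_servers_dir() appear in the Direct API section below; the KB name here is illustrative):

from pathlib import Path

kb_path = Path("knowledge_bases") / "user_kb_name"
memory_path = kb_path / "memory"        # {kb_path}/{MEM_AGENT_MEMORY_POSTFIX}
mcp_servers = kb_path / ".mcp_servers"  # {kb_path}/{MCP_SERVERS_POSTFIX}
print(memory_path, mcp_servers)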

user.md Structure

# User Information
- user_name: John Doe
- user_age: 30
- user_location: San Francisco, CA

## User Relationships
- employer: [[entities/acme_corp.md]]
- spouse: [[entities/jane_doe.md]]

## Preferences
- favorite_color: blue
- favorite_food: pizza

Entity File Structure

# Acme Corporation
- entity_type: Company
- industry: Technology
- location: San Francisco, CA
- founded: 2010

## Employees
- ceo: [[entities/john_smith.md]]

Usage

Through Agent

When enabled, the agent automatically uses mem-agent to record notes and search them:

from src.agents import AgentFactory
from config.settings import settings

# Create agent with mem-agent enabled
agent = AgentFactory.from_settings(settings)

# The agent can record notes during task execution
# For example, during a complex task:
result = await agent.process({
    "text": "Analyze this codebase and suggest improvements"
})
# The agent internally records findings like:
# - "Found authentication vulnerability in login.py"
# - "Database queries missing indexes in user_service.py"
# - "Found 15 TODO comments that need attention"

# Later in the same session, the agent can search its notes:
# When asked about specific findings, the agent searches:
# "What security issues did I find?"
# And retrieves the authentication vulnerability note

Direct API (Advanced)

from config.settings import settings
from pathlib import Path

# Memory agent settings are now part of the main settings module
# Construct paths based on user's KB:
kb_path = Path("./knowledge_bases/user_kb_name")  # Get from user settings

print(f"Model: {settings.MEM_AGENT_MODEL}")
print(f"Memory postfix: {settings.MEM_AGENT_MEMORY_POSTFIX}")
print(f"Full memory path: {settings.get_mem_agent_memory_path(kb_path)}")
print(f"MCP servers dir: {settings.get_mcp_servers_dir(kb_path)}")
print(f"Backend: {settings.get_mem_agent_backend()}")

# The MemoryAgent and MemoryAgentMCPServer classes are planned for future development

Model Selection

Available Models

  • BAAI/bge-m3 (default) - High-quality multilingual embedding model
  • Any sentence-transformers compatible model can be used

Changing Models

  1. Update configuration:

    MEM_AGENT_MODEL: sentence-transformers/all-MiniLM-L6-v2
    MEM_AGENT_MODEL_PRECISION: fp16

  2. Download the new model:

    huggingface-cli download sentence-transformers/all-MiniLM-L6-v2

  3. Restart the application

Model Download Management

Models are cached in HuggingFace cache directory:

# Check downloaded models
huggingface-cli scan-cache

# Delete cached models (interactive selection; choose the revisions to remove)
huggingface-cli delete-cache

Configuration Options

Settings Reference

Setting                      Default        Description
MEM_AGENT_STORAGE_TYPE       json           Storage type: json (simple) or vector (AI-powered)
MEM_AGENT_MODEL              BAAI/bge-m3    HuggingFace model ID (for vector storage type)
MEM_AGENT_MODEL_PRECISION    4bit           Model precision (4bit, 8bit, fp16)
MEM_AGENT_BACKEND            auto           Backend (auto, vllm, mlx, transformers)
MEM_AGENT_MEMORY_POSTFIX     memory         Memory directory postfix within KB
MEM_AGENT_MAX_TOOL_TURNS     20             Max tool execution iterations
MEM_AGENT_TIMEOUT            20             Timeout for code execution (seconds)
MEM_AGENT_VLLM_HOST          127.0.0.1      vLLM server host
MEM_AGENT_VLLM_PORT          8001           vLLM server port
MEM_AGENT_FILE_SIZE_LIMIT    1048576        Max file size (1MB)
MEM_AGENT_DIR_SIZE_LIMIT     10485760       Max directory size (10MB)
MEM_AGENT_MEMORY_SIZE_LIMIT  104857600      Max total memory (100MB)
MCP_SERVERS_POSTFIX          .mcp_servers   MCP servers directory postfix within KB

Storage Type Comparison

Feature          JSON Storage                            Model-Based Storage
Search Type      Substring match                         Semantic similarity
Speed            Very fast                               Moderate (first query slower)
Memory Usage     Minimal                                 Higher (model in memory)
Dependencies     None                                    transformers, sentence-transformers
Model Download   Not required                            Required (~400MB)
Best Use Case    Simple searches                         Complex semantic queries
Example Query    "vulnerability" finds "vulnerability"   "security issue" finds "vulnerability"

Backend Selection Logic

The auto backend automatically selects the best available option, as sketched after this list:

  1. macOS: MLX if available, else Transformers
  2. Linux: vLLM if available, else Transformers
  3. Windows: Transformers
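
The selection amounts to a platform check plus probing for installed packages. A simplified sketch, not the project's actual code:

import importlib.util
import platform

def pick_backend() -> str:
    """Prefer the platform-native accelerated backend, else fall back."""
    system = platform.system()
    if system == "Darwin" and importlib.util.find_spec("mlx") is not None:
        return "mlx"
    if system == "Linux" and importlib.util.find_spec("vllm") is not None:
        return "vllm"
    return "transformers"  # Windows, or no accelerated backend installed

print(pick_backend())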

Performance Tuning

GPU Acceleration (vLLM)

For best performance on Linux with GPU:

MEM_AGENT_BACKEND: vllm
MEM_AGENT_MODEL_PRECISION: fp16

Adjust vLLM parameters:

# Note: vLLM serves LLMs for inference; embeddings are loaded directly via
# sentence-transformers, so this serve command is shown for reference only.
# vllm serve BAAI/bge-m3 \
#   --host 127.0.0.1 \
#   --port 8001 \
#   --tensor-parallel-size 1

Memory Limits

Adjust memory size limits based on your use case:

# For power users with lots of memories
MEM_AGENT_FILE_SIZE_LIMIT: 5242880      # 5MB per file
MEM_AGENT_DIR_SIZE_LIMIT: 52428800      # 50MB per directory  
MEM_AGENT_MEMORY_SIZE_LIMIT: 524288000  # 500MB total
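
The limits are byte counts, so a guard of roughly this shape is implied before each write (an illustrative sketch; the real enforcement lives inside the storage layer):

from pathlib import Path

FILE_LIMIT = 5 * 1024 * 1024  # 5MB, mirrors MEM_AGENT_FILE_SIZE_LIMIT above

def within_file_limit(content: str) -> bool:
    """Reject notes that would exceed the per-file size limit."""
    return len(content.encode("utf-8")) <= FILE_LIMIT

def dir_size(path: Path) -> int:
    """Total size in bytes of all files under a directory."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())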

Response Time

Reduce max tool turns for faster responses:

MEM_AGENT_MAX_TOOL_TURNS: 10  # Faster but less thorough

Troubleshooting

Model Download Issues

Problem: Model download fails or is very slow

Solutions:

  1. Check internet connection
  2. Try using a HuggingFace mirror:

    export HF_ENDPOINT=https://hf-mirror.com

  3. Download manually:

    huggingface-cli download BAAI/bge-m3 --local-dir ./models/bge-m3

Backend Issues

Problem: Backend initialization fails

Solutions:

  1. For vLLM errors:

    # Ensure CUDA is available
    python -c "import torch; print(torch.cuda.is_available())"
    
    # Reinstall vLLM
    pip uninstall vllm
    pip install vllm --no-cache-dir
    

  2. For MLX errors:

    # Ensure on macOS with Apple Silicon
    uname -m  # Should show arm64
    
    # Reinstall MLX
    pip uninstall mlx mlx-lm
    pip install mlx mlx-lm
    

  3. Fallback to transformers:

    MEM_AGENT_BACKEND: transformers
    

Memory Path Issues

Problem: Memory files not being created

Solutions:

  1. Check permissions:

    # Replace {user_kb} with actual KB name
    ls -la knowledge_bases/{user_kb}/memory/
    chmod -R 755 knowledge_bases/{user_kb}/memory/
    

  2. Verify path in configuration:

    from config.settings import settings
    from pathlib import Path
    
    kb_path = Path("./knowledge_bases/user_kb")
    print(f"Memory postfix: {settings.MEM_AGENT_MEMORY_POSTFIX}")
    print(f"Full path: {settings.get_mem_agent_memory_path(kb_path)}")
    

  3. Create manually:

    # Replace {user_kb} with actual KB name
    mkdir -p knowledge_bases/{user_kb}/memory/entities
    touch knowledge_bases/{user_kb}/memory/user.md
    

MCP Server Connection Issues

Problem: Agent can't connect to mem-agent MCP server

Solutions:

  1. Verify the server configuration follows the standard MCP format:

    cat data/mcp_servers/mem-agent.json

Should contain:

{
  "mcpServers": {
    "mem-agent": {
      "url": "http://127.0.0.1:8765/sse",
      "timeout": 10000,
      "trust": true,
      "description": "..."
    }
  }
}

See MCP Configuration Format for details.

  2. Verify the HTTP server is running:

    # Server should auto-start with the bot
    # Check logs for: "[MCPServerManager] ✓ Server 'mem-agent' started successfully"

  3. Test the server manually:

    python -m src.agents.mcp.mem_agent_server_http --host 127.0.0.1 --port 8765

  4. Test the SSE endpoint:

    curl http://127.0.0.1:8765/sse
    

Best Practices

Memory Organization

  1. Use descriptive entity names: jane_doe.md, not person1.md
  2. Link entities: Use [[entities/name.md]] for relationships
  3. Keep files focused: One entity per file
  4. Update regularly: Memory agent will update files as it learns

Model Selection

  1. Start with 4-bit: Good balance of size and performance
  2. Upgrade to 8-bit: If you have more RAM and want better quality
  3. Use fp16: Only on GPU with plenty of VRAM

Security

  1. Review memories: Periodically check knowledge_bases/{user_kb}/memory/ for sensitive info
  2. Set size limits: Prevent memory from growing too large
  3. Back up regularly: Memory files are plain text and easy to archive (see the sketch after this list)
  4. Per-user isolation: Each user has isolated memory and MCP configs in their KB
  5. Knowledge base integration: Memory is stored within your knowledge base structure
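
Since memory files are plain Markdown, backing them up is a one-liner; for example (hypothetical KB name):

import shutil
from pathlib import Path

# Produces memory-backup.tar.gz from the per-user memory directory
memory_dir = Path("knowledge_bases/user_kb/memory")
shutil.make_archive("memory-backup", "gztar", memory_dir)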

See Also