Memory System Setup Guide

This guide covers the installation, configuration, and usage of the MCP Memory system - a flexible note-taking and search system for autonomous agents with multiple storage backends.

Overview

The MCP Memory system is a local note-taking and search system specifically designed for autonomous agents. The agent uses it to:

  • Record notes: Write down important information, findings, or context during task execution
  • Search notes: Find and recall previously recorded information to "remember" details
  • Maintain context: Keep working memory across multiple LLM calls within a single session
  • Organize information: Intelligently structure and link related information (advanced modes)

This is particularly useful for autonomous agents (like qwen code cli) that make many LLM calls within one continuous session.

Deployment Model: The mem-agent system is designed for Docker-first deployment. LLM model inference is handled by separate containers (vLLM, SGLang, or LM Studio), not by direct Python dependencies. This keeps the main application lightweight and scalable.

Storage Types

The system supports three storage backends:

1. JSON Storage (Default)

  • Simple and Fast: File-based JSON storage with substring search
  • No Dependencies: No ML models or additional libraries required
  • Lightweight: Minimal memory footprint
  • Best for: Most users, small to medium memory sizes, simple search needs

2. Vector Storage

  • AI-Powered: Semantic search using embeddings from HuggingFace models (e.g., BAAI/bge-m3)
  • Smart Search: Understands meaning, not just keywords
  • Best for: Large memory sizes, complex queries, semantic understanding needed
  • Requires: Additional dependencies (transformers, sentence-transformers)

3. Mem-Agent Storage (Advanced)

  • LLM-Based: Uses an LLM (driaforall/mem-agent) to reason about memory operations
  • Structured Memory: Obsidian-style markdown files with wiki-links
  • Intelligent Organization: Automatically creates and links entities, maintains relationships
  • Best for: Complex scenarios requiring intelligent memory organization
  • Requires: LLM inference container (vLLM/SGLang/LM Studio), lightweight dependencies (fastmcp)

The storage type is configured via the MEM_AGENT_STORAGE_TYPE setting (default: json).

Quick Start

The recommended way to deploy mem-agent is using Docker containers:

# Start all services (bot, MCP hub, vLLM server, SGLang, Qdrant, Infinity)
# IMPORTANT: vLLM and SGLang both use port 8001 - comment out one of them in docker-compose.yml!
docker-compose up -d

# To use only SGLang: comment out vllm-server section in docker-compose.yml
# To use only vLLM: comment out sglang-server section in docker-compose.yml
# To disable vector search: comment out qdrant and infinity sections
# To run without GPU: comment out vllm-server, sglang-server, qdrant, infinity sections

The Docker setup automatically:

  1. Installs lightweight dependencies (fastmcp, etc.)
  2. Runs the LLM model in a separate container (vLLM/SGLang)
  3. Configures MCP hub with memory tools
  4. Sets up proper networking between containers

See Docker Deployment Guide for more details.

Local Installation (Development Only)

For local development without Docker:

python scripts/install_mem_agent.py

Note: This will install lightweight dependencies only. You'll need to run a separate LLM server (vLLM, SGLang, or LM Studio) for mem-agent storage type.

Configuration

Enable memory system in your config.yaml:

# Enable MCP support
AGENT_ENABLE_MCP: true
AGENT_ENABLE_MCP_MEMORY: true

# Memory storage settings
MEM_AGENT_STORAGE_TYPE: json  # Storage type: "json", "vector", or "mem-agent"
MEM_AGENT_MODEL: BAAI/bge-m3  # Model for embeddings (vector) or LLM (mem-agent)
MEM_AGENT_MODEL_PRECISION: 4bit
MEM_AGENT_BACKEND: auto
MEM_AGENT_MAX_TOOL_TURNS: 20  # For mem-agent storage type only

# MCP settings

Or use environment variables in .env:

AGENT_ENABLE_MCP=true
AGENT_ENABLE_MCP_MEMORY=true
MEM_AGENT_STORAGE_TYPE=json  # "json", "vector", or "mem-agent"
MEM_AGENT_MODEL=BAAI/bge-m3  # or driaforall/mem-agent for LLM-based
MEM_AGENT_MODEL_PRECISION=4bit
MEM_AGENT_BACKEND=auto

Choosing Storage Type

Use JSON storage (default) if:

  • You want fast, lightweight storage
  • Simple keyword search is sufficient
  • You don't want to download ML models
  • You have limited resources

Use Vector storage if:

  • You need semantic search (understands meaning)
  • You have large amounts of memories
  • You want AI-powered relevance ranking
  • You have resources for ML models

Use Mem-Agent storage if:

  • You need intelligent memory organization
  • You want structured Obsidian-style notes with wiki-links
  • You need the system to understand relationships between entities
  • You have resources for running an LLM

To enable vector storage:

  1. Set MEM_AGENT_STORAGE_TYPE: vector in config
  2. Install embedding model dependencies (development only):
pip install sentence-transformers transformers torch
  3. The embedding model will be downloaded automatically on first use
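
For intuition, the sketch below shows the kind of semantic matching that vector storage performs. It is illustrative only: the note list, query, and ranking are made up for the example, and the real backend handles storage and indexing internally.

# Illustrative sketch of semantic search with sentence-transformers (not project code)
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-m3")  # downloads on first use

notes = [
    "Found authentication vulnerability in login.py",
    "Database queries missing indexes in user_service.py",
]
query = "security issue"

# Embed the notes and the query, then rank notes by cosine similarity
note_embeddings = model.encode(notes, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, note_embeddings)[0]

best = int(scores.argmax())
print(f"Best match ({scores[best].item():.2f}): {notes[best]}")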

To enable mem-agent storage:

  1. Set MEM_AGENT_STORAGE_TYPE: mem-agent in config
  2. Set MEM_AGENT_MODEL: driaforall/mem-agent in config
  3. Docker (Recommended): Start vLLM/SGLang container - model downloads automatically:

    docker-compose up -d vllm-server  # or use sglang overlay
    

  4. Local Development: Run LLM server separately:

    # Option A: vLLM
    vllm serve driaforall/mem-agent --host 127.0.0.1 --port 8001
    
    # Option B: LM Studio - load model via UI
    # Option C: SGLang
    python -m sglang.launch_server --model driaforall/mem-agent --port 8001
    

  5. Configure endpoint in .env or config.yaml:

    MEM_AGENT_BASE_URL=http://127.0.0.1:8001/v1
    MEM_AGENT_OPENAI_API_KEY=lm-studio
    

Verification

Test that mem-agent is installed correctly:

# Check if model is downloaded
huggingface-cli scan-cache | grep mem-agent

# Verify MCP server configuration exists
cat data/mcp_servers/mem-agent.json

# Check memory directory (per-user)
ls -la data/memory/user_123/
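
If you prefer a scripted check, the following sketch parses the generated MCP config and prints the server URL. The path and keys follow the examples shown in this guide; adjust them if your layout differs.

# Sanity-check that the mem-agent MCP config parses and points at an endpoint
import json
from pathlib import Path

config_path = Path("data/mcp_servers/mem-agent.json")
config = json.loads(config_path.read_text())
print(config["mcpServers"]["mem-agent"]["url"])  # e.g. http://127.0.0.1:8765/sse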

Advanced Installation

Custom Model Location

python scripts/install_mem_agent.py \
  --model BAAI/bge-m3 \
  --precision 8bit \
  --workspace /path/to/workspace

Skip Model Download

If you've already downloaded the model:

python scripts/install_mem_agent.py --skip-model-download

Platform-Specific Backends

IMPORTANT: Direct Python backends (transformers, MLX, vLLM pip packages) are DEPRECATED for production use.

Use Docker containers or external LLM servers instead:

# Linux/GPU: vLLM container
docker-compose up -d vllm-server

# Linux/GPU: SGLang container (faster)
docker-compose up -d  # Comment out vllm-server section if using SGLang

# macOS/No GPU: LM Studio
# Download and run LM Studio, load driaforall/mem-agent model
# Configure: MEM_AGENT_BASE_URL=http://host.docker.internal:1234/v1

Local Development

# Option 1: LM Studio (easiest, works on all platforms)
# 1. Install LM Studio from https://lmstudio.ai/
# 2. Load driaforall/mem-agent model
# 3. Configure:
export MEM_AGENT_BASE_URL=http://127.0.0.1:1234/v1
export MEM_AGENT_OPENAI_API_KEY=lm-studio

# Option 2: vLLM (Linux with GPU)
# Install: pip install vllm
vllm serve driaforall/mem-agent --host 127.0.0.1 --port 8001
export MEM_AGENT_BASE_URL=http://127.0.0.1:8001/v1

# Option 3: SGLang (Linux with GPU, faster than vLLM)
# Install: pip install sglang
python -m sglang.launch_server --model driaforall/mem-agent --port 8001
export MEM_AGENT_BASE_URL=http://127.0.0.1:8001/v1

Memory Structure

The memory agent uses an Obsidian-like file structure stored per-user within each knowledge base:

knowledge_bases/
└── {user_kb_name}/       # Each user has their own KB
    ├── .mcp_servers/     # Per-user MCP server configs
    │   └── mem-agent.json
    ├── memory/           # Per-user memory (postfix configurable)
    │   ├── user.md       # Personal information
    │   └── entities/     # Entity files
    │       ├── person_name.md
    │       ├── company_name.md
    │       └── place_name.md
    └── topics/           # User's notes

Key Points:

  • Memory path is constructed as: {kb_path}/{MEM_AGENT_MEMORY_POSTFIX}
  • MCP servers are stored at: {kb_path}/{MCP_SERVERS_POSTFIX}
  • Each user gets their own isolated memory and MCP configuration
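
The sketch below illustrates how those paths compose. The postfix values are assumptions based on the tree above; the project exposes helpers such as settings.get_mem_agent_memory_path() for the real values.

# Illustrative path composition; postfix defaults are assumed from the tree above
from pathlib import Path

MEM_AGENT_MEMORY_POSTFIX = "memory"    # assumed default
MCP_SERVERS_POSTFIX = ".mcp_servers"   # assumed default

kb_path = Path("knowledge_bases") / "user_kb_name"
memory_path = kb_path / MEM_AGENT_MEMORY_POSTFIX    # knowledge_bases/user_kb_name/memory
mcp_servers_dir = kb_path / MCP_SERVERS_POSTFIX     # knowledge_bases/user_kb_name/.mcp_servers

print(memory_path / "user.md")
print(mcp_servers_dir / "mem-agent.json")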

user.md Structure

# User Information
- user_name: John Doe
- user_age: 30
- user_location: San Francisco, CA

## User Relationships
- employer: [[entities/acme_corp.md]]
- spouse: [[entities/jane_doe.md]]

## Preferences
- favorite_color: blue
- favorite_food: pizza

Entity File Structure

# Acme Corporation
- entity_type: Company
- industry: Technology
- location: San Francisco, CA
- founded: 2010

## Employees
- ceo: [[entities/john_smith.md]]

Available Tools

The MCP Memory server provides three tools that the agent can use:

1. store_memory

Stores information in memory for later retrieval.

Parameters:

  • content (string, required): Content to store in memory
  • category (string, optional): Category for organization (e.g., 'tasks', 'notes', 'ideas'). Default: "general"
  • tags (array of strings, optional): Tags for categorization
  • metadata (object, optional): Additional metadata

Example:

{
  "content": "Found authentication vulnerability in login.py",
  "category": "security",
  "tags": ["vulnerability", "authentication"]
}

2. retrieve_memory

Retrieves information from memory.

Parameters:

  • query (string, optional): Search query. Returns all memories if not specified
  • category (string, optional): Filter by category
  • tags (array of strings, optional): Filter by tags
  • limit (integer, optional): Maximum number of results. Default: 10

Example:

{
  "query": "authentication",
  "category": "security",
  "limit": 5
}

3. list_categories

Lists all memory categories with counts.

Parameters: None

Returns: List of categories with memory counts

Usage

Through Agent

When memory is enabled, the agent automatically uses mem-agent to record notes and search them:

from src.agents import AgentFactory
from config.settings import settings

# Create agent with mem-agent enabled
agent = AgentFactory.from_settings(settings)

# The agent can record notes during task execution
# For example, during a complex task:
result = await agent.process({
    "text": "Analyze this codebase and suggest improvements"
})
# The agent internally records findings like:
# - "Found authentication vulnerability in login.py"
# - "Database queries missing indexes in user_service.py"
# - "Found 15 TODO comments that need attention"

# Later in the same session, the agent can search its notes:
# When asked about specific findings, the agent searches:
# "What security issues did I find?"
# And retrieves the authentication vulnerability note

Direct API (Advanced)

from config.settings import settings
from pathlib import Path

# Memory agent settings are now part of the main settings module
# Construct paths based on user's KB:
kb_path = Path("./knowledge_bases/user_kb_name")  # Get from user settings

print(f"Model: {settings.MEM_AGENT_MODEL}")
print(f"Memory postfix: {settings.MEM_AGENT_MEMORY_POSTFIX}")
print(f"Full memory path: {settings.get_mem_agent_memory_path(kb_path)}")
print(f"MCP servers dir: {settings.get_mcp_servers_dir(kb_path)}")
print(f"Backend: {settings.get_mem_agent_backend()}")

# The MemoryAgent and MemoryAgentMCPServer classes are planned for future development

Model Selection

Available Models

  • BAAI/bge-m3 (default) - High-quality multilingual embedding model
  • Any sentence-transformers compatible model can be used

Changing Models

  1. Update configuration:
MEM_AGENT_MODEL: sentence-transformers/all-MiniLM-L6-v2
MEM_AGENT_MODEL_PRECISION: fp16
  2. Download the new model:
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2
  3. Restart the application

Model Download Management

Models are cached in the HuggingFace cache directory:

# Check downloaded models
huggingface-cli scan-cache

# Delete specific models or clear the cache (interactive selection)
huggingface-cli delete-cache

Configuration Options

Settings Reference

| Setting | Default | Description |
| --- | --- | --- |
| MEM_AGENT_STORAGE_TYPE | json | Storage type: json, vector, or mem-agent |
| MEM_AGENT_MODEL | BAAI/bge-m3 | Model ID (embeddings for vector, LLM for mem-agent) |
| MEM_AGENT_MODEL_PRECISION | 4bit | Model precision (4bit, 8bit, fp16) |
| MEM_AGENT_BACKEND | auto | Backend (auto, vllm, mlx, transformers) |
| MEM_AGENT_MAX_TOOL_TURNS | 20 | Max tool execution iterations (mem-agent only) |
| MEM_AGENT_TIMEOUT | 20 | Timeout for code execution (mem-agent only) |
| MEM_AGENT_BASE_URL | null | OpenAI-compatible endpoint URL (e.g., http://localhost:8001/v1); configure in config.yaml or env |
| MEM_AGENT_OPENAI_API_KEY | null | API key for the endpoint (use "lm-studio" for local); configure in config.yaml or env |
| MEM_AGENT_FILE_SIZE_LIMIT | 1048576 | Max file size, 1MB (mem-agent only) |
| MEM_AGENT_DIR_SIZE_LIMIT | 10485760 | Max directory size, 10MB (mem-agent only) |
| MEM_AGENT_MEMORY_SIZE_LIMIT | 104857600 | Max total memory, 100MB (mem-agent only) |

Storage Type Comparison

| Feature | JSON Storage | Vector Storage | Mem-Agent Storage |
| --- | --- | --- | --- |
| Search Type | Substring match | Semantic similarity | LLM-powered understanding |
| Speed | Very fast | Moderate | Slower (LLM inference) |
| Memory Usage | Minimal | Higher (embedding model) | Highest (LLM model) |
| Dependencies | None | transformers, sentence-transformers | fastmcp (+ external LLM server) |
| Model Download | Not required | Required (~400MB) | Required (~8GB, in container/server) |
| Organization | Simple key-value | Embeddings-based | Structured Obsidian-style |
| Best Use Case | Simple searches | Semantic queries | Complex reasoning & organization |
| Example Query | "vulnerability" → exact match | "security issue" → semantic match | Natural language queries with context |

Backend Selection Logic

The auto backend automatically selects the best available option:

  1. macOS: MLX if available, else Transformers
  2. Linux: vLLM if available, else Transformers
  3. Windows: Transformers
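
As a rough sketch, the selection order above could be expressed like this. It is illustrative only; the actual logic lives in the project (e.g. settings.get_mem_agent_backend()).

# Illustrative approximation of the "auto" backend choice described above
import importlib.util
import platform

def pick_backend() -> str:
    system = platform.system()
    if system == "Darwin":   # macOS
        return "mlx" if importlib.util.find_spec("mlx") else "transformers"
    if system == "Linux":
        return "vllm" if importlib.util.find_spec("vllm") else "transformers"
    return "transformers"    # Windows and anything else

print(pick_backend())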

Performance Tuning

GPU Acceleration (vLLM)

For best performance on Linux with GPU:

MEM_AGENT_BACKEND: vllm
MEM_AGENT_MODEL_PRECISION: fp16

Adjust vLLM parameters:

# Note: vLLM serves the LLM for mem-agent storage; embeddings for vector storage
# are loaded directly via sentence-transformers, not through vLLM.
vllm serve driaforall/mem-agent \
  --host 127.0.0.1 \
  --port 8001 \
  --tensor-parallel-size 1

Memory Limits

Adjust memory size limits based on your use case:

# For power users with lots of memories
MEM_AGENT_FILE_SIZE_LIMIT: 5242880      # 5MB per file
MEM_AGENT_DIR_SIZE_LIMIT: 52428800      # 50MB per directory  
MEM_AGENT_MEMORY_SIZE_LIMIT: 524288000  # 500MB total
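
The limits are byte counts, so a small helper avoids hand-computing them (trivial sketch, not project code):

# Convert megabytes to the byte values used by the limit settings
def mb(n: int) -> int:
    return n * 1024 * 1024

print(mb(5), mb(50), mb(500))  # 5242880, 52428800, 524288000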

Response Time

Reduce max tool turns for faster responses:

MEM_AGENT_MAX_TOOL_TURNS: 10  # Faster but less thorough

Troubleshooting

Model Download Issues

Problem: Model download fails or is very slow

Solutions:

  1. Check internet connection
  2. Try using a HuggingFace mirror:
export HF_ENDPOINT=https://hf-mirror.com
  3. Download manually:
huggingface-cli download BAAI/bge-m3 --local-dir ./models/bge-m3

Backend Issues

Problem: Backend initialization fails

Solutions:

  1. For vLLM errors:
# Ensure CUDA is available
python -c "import torch; print(torch.cuda.is_available())"

# Reinstall vLLM
pip uninstall vllm
pip install vllm --no-cache-dir
  2. For MLX errors:
# Ensure on macOS with Apple Silicon
uname -m  # Should show arm64

# Reinstall MLX
pip uninstall mlx mlx-lm
pip install mlx mlx-lm
  3. Fallback to transformers:
MEM_AGENT_BACKEND: transformers

Memory Path Issues

Problem: Memory files not being created

Solutions:

  1. Check permissions:
# Replace {user_id} with the actual user ID
ls -la data/memory/user_{user_id}/
chmod -R 755 data/memory/user_{user_id}/
  2. Verify path in configuration:
from config.settings import settings
from pathlib import Path

user_id = 123
print(f"Full path: {settings.get_mem_agent_memory_dir(user_id)}")
  3. Create manually:
# Replace {user_id} with the actual user ID
mkdir -p data/memory/user_{user_id}/entities
touch data/memory/user_{user_id}/user.md

MCP Server Connection Issues

Problem: Agent can't connect to mem-agent MCP server

Solutions:

  1. Verify server configuration follows standard MCP format:
cat data/mcp_servers/mem-agent.json

Should contain:

{
  "mcpServers": {
    "mem-agent": {
      "url": "http://127.0.0.1:8765/sse",
      "timeout": 10000,
      "trust": true,
      "description": "..."
    }
  }
}

See MCP Configuration Format for details.

  2. Verify HTTP server is running:
# Server should auto-start with bot
# Check logs for: "[MCPServerManager] ✓ Server 'mem-agent' started successfully"
  3. Test server manually:
python -m src.agents.mcp.mem_agent_server_http --host 127.0.0.1 --port 8765
  4. Test SSE endpoint:
curl http://127.0.0.1:8765/sse
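
A scripted alternative to the curl check (hypothetical, assumes the requests package is installed):

# Connectivity check for the mem-agent SSE endpoint
import requests

resp = requests.get("http://127.0.0.1:8765/sse", stream=True, timeout=10)
print("Status:", resp.status_code)                        # expect 200
print("Content-Type:", resp.headers.get("content-type"))  # expect text/event-stream
resp.close()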

Best Practices

Memory Organization

  1. Use descriptive entity names: jane_doe.md, not person1.md
  2. Link entities: Use [[entities/name.md]] for relationships
  3. Keep files focused: One entity per file
  4. Update regularly: Memory agent will update files as it learns

Model Selection

  1. Start with 4-bit: Good balance of size and performance
  2. Upgrade to 8-bit: If you have more RAM and want better quality
  3. Use fp16: Only on GPU with plenty of VRAM

Security

  1. Review memories: Periodically check data/memory/user_{user_id}/ for sensitive info
  2. Set size limits: Prevent memory from growing too large
  3. Backup regularly: Memory files are plain text, easy to backup
  4. Per-user isolation: Each user has isolated memory and MCP configs in their KB
  5. Knowledge base integration: Memory is stored within your knowledge base structure

See Also