# Memory System Setup Guide
This guide covers the installation, configuration, and usage of the MCP Memory tool - a personal note-taking and search system for autonomous agents.
Terminology Note:

- MCP Memory: the current note-taking tool for agents (what you're setting up here)
- mem-agent: a planned future LLM-based memory assistant (not yet implemented)

The configuration still uses `MEM_AGENT_*` prefixes for historical reasons.
## Overview
The MCP Memory tool is a local note-taking system specifically designed for the main agent. The agent uses it to:
- Record notes: Write down important information, findings, or context during task execution
- Search notes: Find and recall previously recorded information to "remember" details
- Maintain context: Keep working memory across multiple LLM calls within a single session
This is particularly useful for autonomous agents (like qwen code cli) that make many LLM calls within one continuous session.
## Storage Types
The system supports two storage backends:
### 1. JSON Storage (Default)
- Simple and Fast: File-based JSON storage with substring search
- No Dependencies: No ML models or additional libraries required
- Lightweight: Minimal memory footprint
- Best for: Most users, small to medium memory sizes, simple search needs
### 2. Model-Based Storage

- AI-Powered: Semantic search using the `BAAI/bge-m3` model from HuggingFace
- Smart Search: Understands meaning, not just keywords
- Best for: Large memory sizes, complex queries, semantic understanding needed
- Requires: Additional dependencies (transformers, sentence-transformers)
The storage type is configured via the `MEM_AGENT_STORAGE_TYPE` setting (default: `json`).
## Quick Start
### Installation
Run the installation script:
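For a default installation, no arguments are needed (the same script is invoked with options under Advanced Installation below):

```bash
python scripts/install_mem_agent.py
```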
This will:

1. Install all required dependencies
2. Download the mem-agent model from HuggingFace
3. Setup the memory directory structure
4. Create the MCP server configuration
5. Register mem-agent as an MCP server
### Configuration
Enable mem-agent in your `config.yaml`:
```yaml
# Enable MCP support
AGENT_ENABLE_MCP: true
AGENT_ENABLE_MCP_MEMORY: true

# Memory agent settings
MEM_AGENT_STORAGE_TYPE: json      # Storage type: "json" (default) or "vector"
MEM_AGENT_MODEL: BAAI/bge-m3      # Model for semantic search (if using "vector" storage)
MEM_AGENT_MODEL_PRECISION: 4bit
MEM_AGENT_BACKEND: auto
MEM_AGENT_MEMORY_POSTFIX: memory  # Postfix within KB (kb_path/memory)
MEM_AGENT_MAX_TOOL_TURNS: 20

# MCP settings
MCP_SERVERS_POSTFIX: .mcp_servers # Per-user MCP servers (kb_path/.mcp_servers)
```
Or use environment variables in `.env`:
```bash
AGENT_ENABLE_MCP=true
AGENT_ENABLE_MCP_MEMORY=true
MEM_AGENT_STORAGE_TYPE=json  # or "vector" for semantic search
MEM_AGENT_MODEL=BAAI/bge-m3
MEM_AGENT_MODEL_PRECISION=4bit
MEM_AGENT_BACKEND=auto
MEM_AGENT_MEMORY_POSTFIX=memory
MCP_SERVERS_POSTFIX=.mcp_servers
```
### Choosing Storage Type
Use JSON storage (default) if:

- You want fast, lightweight storage
- Simple keyword search is sufficient
- You don't want to download ML models
- You have limited resources
Use Model-based storage if:

- You need semantic search (understands meaning)
- You have large amounts of memories
- You want AI-powered relevance ranking
- You have resources for ML models
To enable model-based storage:

1. Set `MEM_AGENT_STORAGE_TYPE: vector` in config
2. Install additional dependencies (see the command below)
3. The model will be downloaded automatically on first use
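A likely install command for the dependencies named above (versions unpinned):

```bash
pip install transformers sentence-transformers
```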
### Verification
Test that mem-agent is installed correctly:
```bash
# Check if the embedding model is downloaded
huggingface-cli scan-cache | grep bge-m3

# Verify MCP server configuration exists
cat data/mcp_servers/mem-agent.json

# Check memory directory
ls -la knowledge_bases/default/memory/
```
## Advanced Installation
### Custom Model Location
```bash
python scripts/install_mem_agent.py \
    --model BAAI/bge-m3 \
    --precision 8bit \
    --workspace /path/to/workspace
```
### Skip Model Download
If you've already downloaded the model:
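A plausible invocation is sketched below; the flag name is hypothetical, so check the script's `--help` output for the actual option:

```bash
# Hypothetical flag; verify with: python scripts/install_mem_agent.py --help
python scripts/install_mem_agent.py --skip-model-download
```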
### Platform-Specific Backends
#### Linux with GPU (vLLM)
```bash
# Install vLLM
pip install vllm

# Configure to use vLLM
export MEM_AGENT_BACKEND=vllm
export MEM_AGENT_VLLM_HOST=127.0.0.1
export MEM_AGENT_VLLM_PORT=8001
```
#### macOS with Apple Silicon (MLX)
```bash
# Install MLX
pip install mlx mlx-lm

# Configure to use MLX
export MEM_AGENT_BACKEND=mlx
export MEM_AGENT_MODEL_PRECISION=4bit
```
#### CPU Fallback (Transformers)
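No extra packages are needed; a minimal sketch forcing the CPU-capable backend:

```bash
# Select the Transformers backend explicitly (works on any platform)
export MEM_AGENT_BACKEND=transformers
```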
## Memory Structure
The memory agent uses an Obsidian-like file structure stored per-user within each knowledge base:
```
knowledge_bases/
└── {user_kb_name}/              # Each user has their own KB
    ├── .mcp_servers/            # Per-user MCP server configs
    │   └── mem-agent.json
    ├── memory/                  # Per-user memory (postfix configurable)
    │   ├── user.md              # Personal information
    │   └── entities/            # Entity files
    │       ├── person_name.md
    │       ├── company_name.md
    │       └── place_name.md
    └── topics/                  # User's notes
```
Key Points:

- Memory path is constructed as: `{kb_path}/{MEM_AGENT_MEMORY_POSTFIX}`
- MCP servers are stored at: `{kb_path}/{MCP_SERVERS_POSTFIX}`
- Each user gets their own isolated memory and MCP configuration
### user.md Structure
```markdown
# User Information
- user_name: John Doe
- user_age: 30
- user_location: San Francisco, CA

## User Relationships
- employer: [[entities/acme_corp.md]]
- spouse: [[entities/jane_doe.md]]

## Preferences
- favorite_color: blue
- favorite_food: pizza
```
### Entity File Structure
```markdown
# Acme Corporation
- entity_type: Company
- industry: Technology
- location: San Francisco, CA
- founded: 2010

## Employees
- ceo: [[entities/john_smith.md]]
```
## Usage
### Through Agent
When enabled, the agent automatically uses mem-agent to record notes and search them:
```python
from src.agents import AgentFactory
from config.settings import settings

# Create agent with mem-agent enabled
agent = AgentFactory.from_settings(settings)

# The agent can record notes during task execution.
# For example, during a complex task:
result = await agent.process({
    "text": "Analyze this codebase and suggest improvements"
})

# The agent internally records findings like:
# - "Found authentication vulnerability in login.py"
# - "Database queries missing indexes in user_service.py"
# - "Found 15 TODO comments that need attention"

# Later in the same session, the agent can search its notes.
# When asked about specific findings, the agent searches:
#   "What security issues did I find?"
# and retrieves the authentication vulnerability note.
```
### Direct API (Advanced)
```python
from pathlib import Path

from config.settings import settings

# Memory agent settings are now part of the main settings module.
# Construct paths based on the user's KB:
kb_path = Path("./knowledge_bases/user_kb_name")  # Get from user settings

print(f"Model: {settings.MEM_AGENT_MODEL}")
print(f"Memory postfix: {settings.MEM_AGENT_MEMORY_POSTFIX}")
print(f"Full memory path: {settings.get_mem_agent_memory_path(kb_path)}")
print(f"MCP servers dir: {settings.get_mcp_servers_dir(kb_path)}")
print(f"Backend: {settings.get_mem_agent_backend()}")

# The MemoryAgent and MemoryAgentMCPServer classes are planned for future development.
```
## Model Selection
### Available Models
- BAAI/bge-m3 (default) - High-quality multilingual embedding model
- Any sentence-transformers compatible model can be used
### Changing Models
1. Update the configuration (see the sketch below)
2. Download the new model (see the sketch below)
3. Restart the application
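A sketch of steps 1 and 2; the alternative model ID is hypothetical and only for illustration:

```bash
# 1. Point the configuration at the new model (.env form; config.yaml uses the same key)
MEM_AGENT_MODEL=sentence-transformers/all-MiniLM-L6-v2  # hypothetical alternative

# 2. Pre-download the new model into the HuggingFace cache
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2
```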
### Model Download Management
Models are cached in the HuggingFace cache directory:
```bash
# Check downloaded models
huggingface-cli scan-cache

# Delete specific model
huggingface-cli delete-cache --repo BAAI/bge-m3

# Clear entire cache
huggingface-cli delete-cache
```
## Configuration Options
### Settings Reference
| Setting | Default | Description |
|---|---|---|
| `MEM_AGENT_STORAGE_TYPE` | `json` | Storage type: `json` (simple) or `vector` (AI-powered) |
| `MEM_AGENT_MODEL` | `BAAI/bge-m3` | HuggingFace model ID (for `vector` storage type) |
| `MEM_AGENT_MODEL_PRECISION` | `4bit` | Model precision (`4bit`, `8bit`, `fp16`) |
| `MEM_AGENT_BACKEND` | `auto` | Backend (`auto`, `vllm`, `mlx`, `transformers`) |
| `MEM_AGENT_MEMORY_POSTFIX` | `memory` | Memory directory postfix within KB |
| `MEM_AGENT_MAX_TOOL_TURNS` | `20` | Max tool execution iterations |
| `MEM_AGENT_TIMEOUT` | `20` | Timeout for code execution (seconds) |
| `MEM_AGENT_VLLM_HOST` | `127.0.0.1` | vLLM server host |
| `MEM_AGENT_VLLM_PORT` | `8001` | vLLM server port |
| `MEM_AGENT_FILE_SIZE_LIMIT` | `1048576` | Max file size (1MB) |
| `MEM_AGENT_DIR_SIZE_LIMIT` | `10485760` | Max directory size (10MB) |
| `MEM_AGENT_MEMORY_SIZE_LIMIT` | `104857600` | Max total memory (100MB) |
| `MCP_SERVERS_POSTFIX` | `.mcp_servers` | MCP servers directory postfix within KB |
### Storage Type Comparison
| Feature | JSON Storage | Model-Based Storage |
|---|---|---|
| Search Type | Substring match | Semantic similarity |
| Speed | Very fast | Moderate (first query slower) |
| Memory Usage | Minimal | Higher (model in memory) |
| Dependencies | None | transformers, sentence-transformers |
| Model Download | Not required | Required (~400MB) |
| Best Use Case | Simple searches | Complex semantic queries |
| Example Query | "vulnerability" finds "vulnerability" | "security issue" finds "vulnerability" |
### Backend Selection Logic
The `auto` backend automatically selects the best available option:
- macOS: MLX if available, else Transformers
- Linux: vLLM if available, else Transformers
- Windows: Transformers
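A rough shell approximation of this order, assuming "available" means the backend's Python package imports cleanly (a sketch, not the actual implementation):

```bash
# Probe for optional backends in the documented order
case "$(uname -s)" in
  Darwin) python -c "import mlx" 2>/dev/null && backend=mlx || backend=transformers ;;
  Linux)  python -c "import vllm" 2>/dev/null && backend=vllm || backend=transformers ;;
  *)      backend=transformers ;;  # Windows and everything else
esac
export MEM_AGENT_BACKEND="$backend"
```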
## Performance Tuning
### GPU Acceleration (vLLM)
For best performance on Linux with a GPU, enable the vLLM backend:
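This mirrors the setup shown under Platform-Specific Backends above:

```bash
# Install vLLM and select it as the backend
pip install vllm
export MEM_AGENT_BACKEND=vllm
```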
Adjust vLLM parameters:

```bash
# Note: vLLM is for LLM inference, not for embeddings.
# For embeddings, the model is loaded directly via sentence-transformers.
# vllm serve BAAI/bge-m3 \
#     --host 127.0.0.1 \
#     --port 8001 \
#     --tensor-parallel-size 1
```
### Memory Limits
Adjust memory size limits based on your use case:
```yaml
# For power users with lots of memories
MEM_AGENT_FILE_SIZE_LIMIT: 5242880      # 5MB per file
MEM_AGENT_DIR_SIZE_LIMIT: 52428800      # 50MB per directory
MEM_AGENT_MEMORY_SIZE_LIMIT: 524288000  # 500MB total
```
### Response Time
Reduce max tool turns for faster responses:
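A minimal `.env` sketch; the value `5` is illustrative (the default is 20):

```bash
MEM_AGENT_MAX_TOOL_TURNS=5  # lower values trade thoroughness for speed
```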
## Troubleshooting
### Model Download Issues
Problem: Model download fails or is very slow
Solutions:

1. Check internet connection
2. Try using a HuggingFace mirror (see the sketch below)
3. Download manually (see the sketch below)
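Both options sketched, using the standard `HF_ENDPOINT` variable and `huggingface-cli download` (the mirror URL is one public example):

```bash
# Use a HuggingFace mirror for downloads
export HF_ENDPOINT=https://hf-mirror.com

# Or download the model manually into the local cache
huggingface-cli download BAAI/bge-m3
```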
### Backend Issues
Problem: Backend initialization fails
Solutions (commands sketched below):

- For vLLM errors
- For MLX errors
- Fallback to transformers
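A sketch of the corresponding fixes, assuming pip-managed backends:

```bash
# For vLLM errors: reinstall/upgrade the backend
pip install --upgrade vllm

# For MLX errors: reinstall/upgrade the backend (Apple Silicon only)
pip install --upgrade mlx mlx-lm

# Fallback to transformers (works on any platform)
export MEM_AGENT_BACKEND=transformers
```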
### Memory Path Issues
Problem: Memory files not being created
Solutions (checks sketched below):

- Check permissions
- Verify path in configuration
- Create manually
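A sketch of these checks, using the default KB path from the Verification section (adjust for your KB name):

```bash
# Check permissions on the knowledge base directory
ls -la knowledge_bases/default/

# Verify the configured memory postfix
grep MEM_AGENT_MEMORY_POSTFIX config.yaml

# Create the memory directory manually
mkdir -p knowledge_bases/default/memory/entities
```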
### MCP Server Connection Issues
Problem: Agent can't connect to mem-agent MCP server
Solutions:

- Verify the server configuration follows the standard MCP format. It should contain:
```json
{
  "mcpServers": {
    "mem-agent": {
      "url": "http://127.0.0.1:8765/sse",
      "timeout": 10000,
      "trust": true,
      "description": "..."
    }
  }
}
```
See MCP Configuration Format for details.
- Verify the HTTP server is running (see the checks below)
- Test the server manually
- Test the SSE endpoint
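A sketch of these checks using the URL from the configuration above; the exact endpoints beyond `/sse` are assumptions:

```bash
# Verify something is listening on the configured port
lsof -i :8765

# Test the server manually with a plain HTTP request
curl -i http://127.0.0.1:8765/

# Test the SSE endpoint (should stay open and stream events; Ctrl+C to stop)
curl -N http://127.0.0.1:8765/sse
```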
## Best Practices
### Memory Organization
- Use descriptive entity names: `jane_doe.md`, not `person1.md`
- Link entities: Use `[[entities/name.md]]` for relationships
- Keep files focused: One entity per file
- Update regularly: The memory agent will update files as it learns
### Model Selection
- Start with 4-bit: Good balance of size and performance
- Upgrade to 8-bit: If you have more RAM and want better quality
- Use fp16: Only on GPU with plenty of VRAM
### Security
- Review memories: Periodically check `knowledge_bases/{user_kb}/memory/` for sensitive info
- Set size limits: Prevent memory from growing too large
- Backup regularly: Memory files are plain text, easy to back up
- Per-user isolation: Each user has isolated memory and MCP configs in their KB
- Knowledge base integration: Memory is stored within your knowledge base structure
## See Also
- MCP Server Registry - Managing MCP servers
- MCP Tools - Using MCP tools in agents
- Configuration - Complete configuration reference