Vector Search Quick Start¶
A quick guide to setting up semantic vector search for your knowledge base.
What This Gives You¶
- 🔍 Semantic Search: Search by meaning, not keywords
- 🚀 Automatic Indexing: Knowledge base is automatically vectorized
- 🎯 Accurate Results: Finds relevant documents even without exact word matches
- 🌐 Multi-language: Russian and English support
Architecture¶
Quick Start¶
1. Prepare Files¶
# Copy configuration examples
cp .env.example .env
cp config.example.yaml config.yaml
# Create data directories
mkdir -p data/qdrant_storage data/infinity_cache
2. Configure .env¶
Edit `.env`; at minimum, set:
# Your Telegram bot token
TELEGRAM_BOT_TOKEN=your_bot_token_here
# Allowed user IDs
ALLOWED_USER_IDS=123456789
# Embedding model (for Russian + English)
INFINITY_MODEL=BAAI/bge-m3
3. Configure config.yaml¶
Enable vector search in config.yaml:
# Enable vector search
VECTOR_SEARCH_ENABLED: true
# Embedding settings
VECTOR_EMBEDDING_PROVIDER: infinity
VECTOR_EMBEDDING_MODEL: BAAI/bge-m3
VECTOR_INFINITY_API_URL: http://infinity:7997
# Qdrant settings
VECTOR_STORE_PROVIDER: qdrant
VECTOR_QDRANT_URL: http://qdrant:6333
VECTOR_QDRANT_COLLECTION: knowledge_base
4. Start Services¶
# Start all services (includes Qdrant and Infinity)
# IMPORTANT: vLLM and SGLang both use port 8001 - comment out one of them in docker-compose.yml!
docker-compose up -d
# Watch logs (wait for model loading ~1-2 minutes)
docker-compose logs -f infinity
5. Verify Operation¶
# Check status of all services
docker-compose ps
# Should be running:
# - tg-note-qdrant (healthy)
# - tg-note-infinity (healthy)
# - tg-note-hub (healthy)
# - tg-note-bot (running)
Usage¶
Through Telegram Bot¶
- Send documents to the bot (text, PDF, DOCX, etc.)
- The bot automatically indexes them
- Ask questions; the bot will answer using semantic search
Example¶
👤 User: How do transformers work in NLP?
🤖 Bot: [finds relevant sections in knowledge base using vector search]
Model Selection¶
For Russian + English (recommended)¶
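This heading has no snippet in the extracted page; assuming the same `.env` key shown in step 2, selecting the multilingual model would look like:

```
INFINITY_MODEL=BAAI/bge-m3
```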
English only (faster)¶
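Likewise, assuming the same `.env` key from step 2, an English-only setup with the smaller, faster model would look like:

```
INFINITY_MODEL=BAAI/bge-small-en-v1.5
```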
Other Options¶
| Model | Languages | Quality | Speed |
|---|---|---|---|
| `BAAI/bge-m3` | Multilingual | ⭐⭐⭐⭐⭐ | ⚡⚡ |
| `BAAI/bge-small-en-v1.5` | English | ⭐⭐⭐⭐ | ⚡⚡⚡ |
| `BAAI/bge-base-en-v1.5` | English | ⭐⭐⭐⭐⭐ | ⚡⚡ |
Management¶
View Logs¶
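The original snippet is missing here; a typical invocation, assuming the `infinity` and `qdrant` service names used elsewhere in this guide:

```shell
# Follow logs for the vector search services
docker-compose logs -f infinity qdrant
```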
Restart¶
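The original snippet is missing here; assuming the same service names, a restart would look like:

```shell
# Restart the vector search services
docker-compose restart infinity qdrant
```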
Stop¶
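The original snippet is missing here; a likely sketch, assuming the data directories created in step 1 hold the persisted index:

```shell
# Stop all services; indexed data remains in ./data
docker-compose down
```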
Resource Requirements¶
Minimum Requirements¶
- RAM: 4 GB (with the `bge-small` model)
- Disk: 5 GB (for models and data)
- CPU: 2 cores
Recommended Requirements¶
- RAM: 8 GB (with the `bge-m3` model)
- Disk: 10 GB
- CPU: 4 cores
- GPU (optional): Speeds up processing 5-10x
GPU Acceleration (Optional)¶
To speed up embedding generation, uncomment the GPU settings in the `infinity` section of `docker-compose.yml`:
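The commented-out block is not shown in this page; it typically follows the standard Compose GPU reservation (the exact contents depend on your `docker-compose.yml`):

```yaml
# docker-compose.yml, under the infinity service (sketch)
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```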
Requirements:

- NVIDIA GPU
- nvidia-docker (NVIDIA Container Toolkit) installed
Troubleshooting¶
Infinity Won't Start¶
Problem: Infinity container keeps restarting
Solution: Check the logs and allow 1-2 minutes for the model to download and load
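A quick way to inspect the failure, assuming the `infinity` service name used elsewhere in this guide:

```shell
# Show the most recent output from the Infinity container
docker-compose logs --tail=100 infinity
```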
Out of Memory¶
Problem: Out of memory
Solution: Switch to a smaller embedding model
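For example, using the keys shown earlier in this guide, switch both `.env` and `config.yaml` to the smaller English model (both values must match):

```
# .env
INFINITY_MODEL=BAAI/bge-small-en-v1.5

# config.yaml
VECTOR_EMBEDDING_MODEL: BAAI/bge-small-en-v1.5
```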
Vector Search Not Available¶
Problem: "Vector search is not available"
Solution: Verify that vector search is enabled and correctly configured in `config.yaml`
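A quick sanity check on the settings from step 3 (assuming `config.yaml` is in the current directory):

```shell
# Confirm the vector search keys are present and enabled
grep -E "VECTOR_(SEARCH_ENABLED|QDRANT_URL|INFINITY_API_URL)" config.yaml
```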
How It Works¶
1. Bot receives a document from the user
2. Bot calls `reindex_vector` through MCP Hub
3. MCP Hub sends the text to Infinity to create embeddings
4. MCP Hub saves the vectors to Qdrant
5. When searching, Bot calls `vector_search`
6. MCP Hub gets the query embedding from Infinity
7. MCP Hub searches for similar vectors in Qdrant
8. Bot receives the relevant results
Each user's knowledge base → separate collection in Qdrant.
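The search half of this flow can be exercised manually for debugging. This is a sketch, assuming the Infinity and Qdrant ports (7997, 6333) are published to the host, the collection name from `config.yaml`, and `jq` installed; Infinity exposes an OpenAI-compatible `/embeddings` endpoint:

```shell
# 1. Get a query embedding from Infinity (what MCP Hub does internally)
curl -s http://localhost:7997/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "input": "How do transformers work?"}' \
  | jq '.data[0].embedding' > query_vector.json

# 2. Search for similar vectors in Qdrant
curl -s http://localhost:6333/collections/knowledge_base/points/search \
  -H "Content-Type: application/json" \
  -d "{\"vector\": $(cat query_vector.json), \"limit\": 5}"
```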
Additional Documentation¶
Complete documentation: Docker Vector Search Setup
AICODE-NOTE¶
- Bot manages vectorization logic (when to index, when to search)
- MCP Hub provides tools (vector_search, reindex_vector)
- Infinity generates embeddings (converts text to vectors)
- Qdrant stores vectors (separate collection for each knowledge base)