Autonomous Agent Guide

Python-based autonomous agent with OpenAI-compatible API support for intelligent knowledge base building.


Overview

The Autonomous Agent is a Python-based AI agent that uses OpenAI-compatible APIs to process messages and build your knowledge base. It features autonomous planning, decision-making, and tool execution capabilities.

Unlike the Qwen Code CLI agent (which uses Node.js), the autonomous agent is pure Python and works with any OpenAI-compatible API endpoint, making it flexible for custom LLM providers.


Key Features

🤖 Autonomous Decision Making

  • Self-Planning - Creates execution plans autonomously
  • Dynamic Tool Selection - Chooses appropriate tools based on context
  • Iterative Refinement - Adapts strategy based on results
  • Error Recovery - Handles failures gracefully

🔧 Built-in Tools

  • Web Search - Search the internet for additional context
  • Git Operations - Version control for knowledge base
  • GitHub API - Repository management
  • File Management - Create, read, update files in KB
  • Folder Management - Organize KB structure
  • Shell Commands - Execute safe shell operations (optional)

🔌 LLM Flexibility

  • OpenAI - Native support for GPT models
  • Custom Endpoints - Use any OpenAI-compatible API
  • Qwen - Works with Qwen models via compatible API
  • Azure OpenAI - Enterprise deployment support
  • Local Models - Compatible with local LLM servers

🎯 Function Calling

  • Native function calling support
  • Automatic tool discovery
  • Structured output generation
  • Type-safe parameter passing
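Under the hood, each tool is described to the model with an OpenAI-style function schema. A minimal sketch for the web search tool (the exact schema the agent generates may differ):

```python
# OpenAI-style tool schema for the web_search tool shown later in this guide.
# The description and parameter names here are illustrative.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the internet for additional context.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}
```

The model returns a structured call against this schema, and the agent validates the parameters before executing the tool.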

Installation

Prerequisites

  • Python 3.11+
  • Poetry (recommended) or pip
  • OpenAI API key or compatible API endpoint

Setup

  1. Install dependencies (the OpenAI library is included):
poetry install
  2. Configure environment variables:
# .env file
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional, for custom endpoints
  3. Update config.yaml:
AGENT_TYPE: "autonomous"
AGENT_MODEL: "gpt-3.5-turbo"
AGENT_TIMEOUT: 300
AGENT_ENABLE_WEB_SEARCH: true
AGENT_ENABLE_GIT: true
AGENT_ENABLE_FILE_MANAGEMENT: true

Configuration

Basic Configuration

# config.yaml
AGENT_TYPE: "autonomous"
AGENT_MODEL: "gpt-3.5-turbo"  # or gpt-4, qwen-max, etc.
AGENT_TIMEOUT: 300  # seconds

Tool Permissions

# Enable/disable specific tools
AGENT_ENABLE_WEB_SEARCH: true
AGENT_ENABLE_GIT: true
AGENT_ENABLE_GITHUB: true
AGENT_ENABLE_SHELL: false  # Disabled by default for security
AGENT_ENABLE_FILE_MANAGEMENT: true
AGENT_ENABLE_FOLDER_MANAGEMENT: true

Advanced Settings

# Custom instruction for the agent
AGENT_INSTRUCTION: |
  You are a knowledge base curator.
  Analyze content, categorize it, and create well-structured Markdown files.

# Maximum planning iterations
AGENT_MAX_ITERATIONS: 10

# Knowledge base path
KB_PATH: ./knowledge_base

Environment Variables

# OpenAI Configuration
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1

# Alternative: Use Qwen API
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1

# GitHub (optional, for GitHub tools)
GITHUB_TOKEN=ghp_...

Usage Examples

Using with OpenAI

# config.yaml
AGENT_TYPE: "autonomous"
AGENT_MODEL: "gpt-3.5-turbo"
# .env
OPENAI_API_KEY=sk-proj-...

Using with Qwen API

# config.yaml
AGENT_TYPE: "autonomous"
AGENT_MODEL: "qwen-max"
# .env
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1

Using with Azure OpenAI

# config.yaml
AGENT_TYPE: "autonomous"
AGENT_MODEL: "gpt-35-turbo"
# .env
OPENAI_API_KEY=your-azure-key
OPENAI_BASE_URL=https://your-resource.openai.azure.com/

Using with Local LLM

# config.yaml
AGENT_TYPE: "autonomous"
AGENT_MODEL: "local-model"
# .env
OPENAI_API_KEY=not-needed
OPENAI_BASE_URL=http://localhost:8080/v1
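To verify that a local server really speaks the OpenAI-compatible protocol before pointing the agent at it, you can query its /models endpoint. A hedged sketch using only the standard library (the endpoint path follows the OpenAI API convention; your server's model IDs will differ):

```python
# Connectivity check for an OpenAI-compatible endpoint (illustrative helper).
import json
import urllib.request


def models_endpoint(base_url: str) -> str:
    """Build the /models URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_models(base_url: str, api_key: str = "not-needed") -> list:
    """Return the model IDs advertised by the server."""
    req = urllib.request.Request(
        models_endpoint(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]


if __name__ == "__main__":
    print(list_models("http://localhost:8080/v1"))
```

If this call fails, the agent will fail too, so it is a quick first check before editing config.yaml.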

How It Works

Processing Pipeline

graph LR
    A[Message] --> B[Autonomous Agent]
    B --> C{Plan}
    C --> D[Select Tool]
    D --> E[Execute]
    E --> F{Success?}
    F -->|Yes| G[Next Step]
    F -->|No| H[Adapt]
    H --> C
    G --> I{Done?}
    I -->|No| C
    I -->|Yes| J[Generate KB Entry]
    J --> K[Commit to Git]

    style B fill:#fff3e0
    style J fill:#e8f5e9

Execution Flow

  1. Receive Content
    • Text, URLs, forwarded messages
    • Extract and parse information

  2. Autonomous Planning
    • LLM creates execution plan
    • Decides which tools to use
    • Plans iterative steps

  3. Tool Execution
    • Web search for context
    • Analyze content structure
    • Create folders/files
    • Git operations

  4. Iterative Refinement
    • Evaluate results
    • Adjust strategy if needed
    • Handle errors gracefully

  5. KB Generation
    • Create Markdown file
    • Add metadata
    • Proper categorization

  6. Version Control
    • Git commit
    • Push to remote (if enabled)
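The plan–execute–adapt loop above can be sketched in a few lines. This is a simplification of the real control flow; the function names and the `{"ok": ...}` result shape are illustrative, not the agent's actual API:

```python
# Illustrative plan -> execute -> adapt loop (names are hypothetical).
def run_agent(plan_fn, execute_fn, done_fn, max_iterations=10):
    """Iterate until done_fn says the task is complete or iterations run out."""
    history = []
    for _ in range(max_iterations):
        step = plan_fn(history)        # LLM proposes the next tool call
        result = execute_fn(step)      # run the selected tool
        history.append((step, result))
        if not result.get("ok"):
            continue                   # adapt: re-plan with the failure in history
        if done_fn(history):
            break                      # task complete: hand off to KB generation
    return history
```

The `max_iterations` bound corresponds to the AGENT_MAX_ITERATIONS setting: it caps how many planning rounds the agent may take before giving up.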

Available Tools

Web Search Tool

Search the internet for additional context.

# Automatically used when agent needs more information
tool: web_search
params:
  query: "machine learning basics"

File Management Tools

Create and manage files in the knowledge base.

# Create new file
tool: file_create
params:
  path: "topics/ai/neural-networks.md"
  content: "# Neural Networks\n\n..."

Folder Management Tools

Organize knowledge base structure.

# Create folder
tool: folder_create
params:
  path: "topics/new-category"

Git Tools

Version control operations.

# Safe git commands only
tool: git_command
params:
  command: "git status"

GitHub Tools

Repository management (requires GITHUB_TOKEN).

# GitHub operations
tool: github_api
params:
  endpoint: "/repos/:owner/:repo/issues"
  method: "GET"

Performance

Processing Time

| Content Type | Typical Time | With Web Search |
|---|---|---|
| Short text (< 500 chars) | 5-10s | 10-20s |
| Medium text (500-2000 chars) | 10-30s | 20-60s |
| Long text (> 2000 chars) | 30-90s | 60-180s |
| With URLs | 15-45s | 30-120s |

Factors Affecting Speed

  • Model Selection - GPT-4 slower but higher quality
  • Tool Usage - Web search adds latency
  • Content Complexity - More analysis = longer time
  • API Latency - Network and API response time
  • Iterations - Complex tasks need more planning steps

Best Practices

Model Selection

GPT-3.5-turbo
  • ✅ Fast response time
  • ✅ Cost-effective
  • ✅ Good for simple content
  • ❌ Less sophisticated analysis

GPT-4
  • ✅ Best quality
  • ✅ Complex reasoning
  • ✅ Better categorization
  • ❌ Slower and more expensive

Qwen-max
  • ✅ Free tier available
  • ✅ Good quality
  • ✅ Chinese language support
  • ❌ Requires Qwen API

Tool Configuration

Enable for Production:

AGENT_ENABLE_WEB_SEARCH: true
AGENT_ENABLE_GIT: true
AGENT_ENABLE_FILE_MANAGEMENT: true
AGENT_ENABLE_FOLDER_MANAGEMENT: true

Disable for Security:

AGENT_ENABLE_SHELL: false  # Keep disabled unless needed
AGENT_ENABLE_GITHUB: false  # If not using GitHub API

Custom Instructions

Tailor agent behavior with custom instructions:

AGENT_INSTRUCTION: |
  You are a scientific paper curator.

  For each paper:
  1. Extract: title, authors, abstract, key findings
  2. Categorize by research field
  3. Add relevant tags
  4. Create bibliography entry

  Use clear, academic language.

Troubleshooting

Agent Not Working

Problem: Agent fails to start or process messages

Solutions:
  1. Check the API key: echo $OPENAI_API_KEY
  2. Verify the base URL if using a custom endpoint
  3. Test the API connection: curl $OPENAI_BASE_URL/models
  4. Check the logs: tail -f logs/bot.log

Slow Processing

Problem: Agent takes too long to process

Solutions:
  1. Use a faster model (GPT-3.5 instead of GPT-4)
  2. Disable web search if it is not needed
  3. Reduce AGENT_MAX_ITERATIONS
  4. Check network latency to the API endpoint

Poor Quality Results

Problem: Agent produces low-quality KB entries

Solutions:
  1. Upgrade to GPT-4 or a stronger model
  2. Enable web search for more context
  3. Customize AGENT_INSTRUCTION for your use case
  4. Increase AGENT_TIMEOUT for complex content

API Errors

Problem: OpenAI API errors (rate limits, etc.)

Solutions:
  1. Check API quota and billing
  2. Rely on the built-in retry logic
  3. Consider using a different endpoint
  4. Monitor rate limits in your provider's dashboard
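The standard pattern behind the built-in retry logic is exponential backoff with jitter. A minimal sketch (the agent's actual implementation may catch narrower exception types and tune the delays differently):

```python
# Retry a flaky API call with exponential backoff and jitter (illustrative).
import random
import time


def with_retries(call, max_attempts=5, base_delay=1.0):
    """Call `call()`, retrying on exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # sleep roughly base_delay * 2^attempt, with jitter to avoid
            # many clients retrying in lockstep
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

In production you would catch only transient errors (rate limits, timeouts) and let authentication or validation errors fail immediately.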


Comparison with Other Agents

vs. Qwen Code CLI

| Feature | Autonomous | Qwen CLI |
|---|---|---|
| Language | Python | Node.js |
| Setup | pip install | npm install |
| API | OpenAI-compatible | Qwen-specific |
| Flexibility | High | Medium |
| Free Tier | Depends on API | 2000/day |
| Vision | Depends on model | ✅ Yes |

Choose Autonomous when:
  • Python-only environment
  • Need a custom LLM provider
  • Already have an OpenAI API key
  • Want maximum flexibility

Choose Qwen CLI when:
  • Want the free tier (2000/day)
  • Node.js is available
  • Vision support is required
  • Official Qwen integration preferred

vs. Stub Agent

| Feature | Autonomous | Stub |
|---|---|---|
| AI | ✅ Yes | ❌ No |
| Tools | ✅ Yes | ❌ No |
| Quality | High | Basic |
| Speed | Medium | Fast |
| Cost | API cost | Free |

Use Autonomous for production, Stub for testing.


Advanced Features

Custom LLM Connectors

Implement custom connector for any LLM:

from src.agents.llm_connectors import BaseLLMConnector

class CustomConnector(BaseLLMConnector):
    async def generate(self, messages, tools=None):
        """Send messages (and optional tool schemas) to your provider
        and return its response in the format the agent expects."""
        ...

Tool Registration

Register custom tools programmatically:

agent.register_tool("my_tool", my_tool_function)
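A custom tool is just a callable plus a name. The registry below is an illustrative stand-in for the agent's internal bookkeeping (the real register_tool signature is assumed from the line above); the word_count tool is a hypothetical example:

```python
# Minimal sketch of tool registration; ToolRegistry stands in for the agent.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register_tool(self, name, fn):
        """Make `fn` callable by the agent under `name`."""
        self._tools[name] = fn

    def call(self, name, **params):
        """Dispatch a tool call by name with keyword parameters."""
        return self._tools[name](**params)


def word_count(text: str) -> dict:
    """Hypothetical tool: count words in content before filing it in the KB."""
    return {"words": len(text.split())}


registry = ToolRegistry()
registry.register_tool("word_count", word_count)
```

Once registered, the tool's name and parameters are exposed to the model through the same function-calling schema as the built-in tools.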

Context Management

Agent maintains execution context:

context = AgentContext(task="Process message")
context.add_execution(execution)
context.get_history()

Example Code

See examples/autonomous_agent_example.py for complete examples:

  • Basic usage
  • Custom instructions
  • Error handling
  • Tool registration

See Also