Overview
Matt Ganzac demonstrates how to optimize OpenClaw (an AI personal assistant) by implementing a multi-model approach and eliminating wasteful token usage. The core insight is that most AI systems waste money by loading the entire conversation history on every request and using expensive models for simple tasks; through strategic model routing and local processing, he reduced costs by 97%.
Key Takeaways
- Stop loading full conversation history on every request - OpenClaw was sending 2-3 million tokens every 30 minutes just for heartbeat checks by loading all context files and session history repeatedly
- Use model hierarchy based on task complexity - Route simple tasks (file organization, data entry) to cheaper models like Haiku, while reserving expensive models like Opus for complex reasoning tasks
- Implement local LLM for maintenance tasks - Use free local models (Ollama) for heartbeats and basic system checks instead of paying API costs for brainless operations
- Create session management commands - Build a "new session" command that clears conversation history while preserving important information in memory to prevent exponential context growth
- Monitor token usage in real-time - Conduct daily token audits and teach your AI to estimate costs before executing tasks to maintain cost awareness and optimization
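The model hierarchy and local-LLM fallback described above can be sketched as a simple dispatch function. This is a minimal illustration, not OpenClaw's actual configuration: the task categories, model identifiers (including the `ollama/llama3` name), and the `escalated` flag are hypothetical placeholders.

```python
def pick_model(task_kind: str, escalated: bool = False) -> str:
    """Route a task to the cheapest model that can plausibly handle it.

    Task names and model identifiers are illustrative placeholders.
    """
    if task_kind == "heartbeat":
        # Maintenance pings never need a paid API call: run them on a
        # free local model served by Ollama.
        return "ollama/llama3"
    if task_kind in ("file_organization", "data_entry") and not escalated:
        # Simple, mechanical work goes to a cheap hosted model.
        return "haiku"
    # Complex reasoning, or a retry after a cheap model failed,
    # escalates to the expensive model.
    return "opus"
```

A caller would implement the escalation path by retrying one tier up when the cheap model's answer fails validation, e.g. `pick_model("data_entry", escalated=True)` returns `"opus"`.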
Topics Covered
- 0:00 - Introduction and Safety Warnings: Overview of OpenClaw AI assistant with important disclaimers about deployment risks and the need for developer experience
- 2:00 - Initial Cost Problems: How costs escalated to $2-3 daily just sitting idle, with monthly projections of $90+ for minimal usage
- 5:30 - Root Cause: Context Loading Issue: Discovery that OpenClaw loads entire conversation history and context files on every heartbeat and message
- 8:00 - Multi-Model Configuration: Setting up multiple AI models (Haiku, Sonnet, Opus) with task-based routing and escalation paths
- 11:30 - Local LLM Integration: Installing and configuring Ollama for free local processing of heartbeats and simple tasks
- 15:30 - Session History Management: Creating commands to dump session history and avoid uploading massive conversation logs on every API call
- 18:00 - Optimization Metrics and Monitoring: Adding token optimization as a success metric and implementing real-time cost tracking
- 21:00 - Real-World Results: Case study of overnight research task that cost $6 for 6 hours of work using optimized multi-agent approach
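The "estimate costs before executing" habit from the optimization-metrics section reduces to simple arithmetic over token counts. A minimal sketch, with the caveat that the per-million-token rates passed in below are hypothetical placeholders, not actual API pricing:

```python
def estimate_cost_usd(prompt_tokens: int, output_tokens: int,
                      input_rate: float, output_rate: float) -> float:
    """Estimate the dollar cost of one API call.

    Rates are USD per million tokens; callers supply their provider's
    current pricing (the values used here are illustrative only).
    """
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 2-million-token heartbeat payload, at an assumed $3 per million
# input tokens, would cost $6.00 per cycle before any optimization.
heartbeat_cost = estimate_cost_usd(2_000_000, 0, input_rate=3.0, output_rate=15.0)
```

Running this kind of estimate before each task, and logging the totals in a daily audit, is what makes the cost regressions described in the video visible before the bill arrives.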