Overview

Matt Ganzac demonstrates how to optimize OpenClaw (an AI personal assistant) by adopting a multi-model routing approach and eliminating wasteful token usage. The core insight is that most AI systems waste money by loading the entire conversation history on every request and by using expensive models for simple tasks; he reduced costs by 97% through strategic model routing and local processing.

Key Takeaways

  • Stop loading full conversation history on every request - OpenClaw was sending 2-3 million tokens every 30 minutes just for heartbeat checks, because it reloaded all context files and session history on each check
  • Use model hierarchy based on task complexity - Route simple tasks (file organization, data entry) to cheaper models like Haiku, while reserving expensive models like Opus for complex reasoning tasks
  • Implement local LLM for maintenance tasks - Use free local models (via Ollama) for heartbeats and basic system checks instead of paying API costs for trivial operations
  • Create session management commands - Build a "new session" command that clears conversation history while preserving important information in memory to prevent exponential context growth
  • Monitor token usage in real-time - Conduct daily token audits and teach your AI to estimate costs before executing tasks, so cost awareness stays built into the workflow
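The model-hierarchy idea can be sketched as a small routing function. The task categories and model names below are illustrative (taken from the model tiers mentioned above), not OpenClaw's actual configuration or API:

```python
# Tasks cheap enough for a small model; the set membership is illustrative.
SIMPLE_TASKS = {"file_organization", "data_entry", "heartbeat"}

def pick_model(task_type: str) -> str:
    """Route mechanical tasks to a cheap model (Haiku tier) and reserve
    the expensive model (Opus tier) for complex reasoning."""
    if task_type in SIMPLE_TASKS:
        return "haiku"  # cheap tier
    return "opus"       # expensive tier, only for hard tasks
```

In practice the router would sit in front of every model call, so a misclassified task degrades gracefully to a cheaper (or pricier) tier rather than failing.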
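For the local-LLM heartbeat, one minimal sketch is to target Ollama's local HTTP endpoint (`/api/generate` on port 11434 by default). The `build_heartbeat_request` helper and the `llama3` model name are assumptions for illustration; any locally pulled model works:

```python
import json
import urllib.request

def build_heartbeat_request(model: str = "llama3", prompt: str = "ok?") -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON body."""
    return {"model": model, "prompt": prompt, "stream": False}

def local_heartbeat() -> str:
    """Run a heartbeat against a local Ollama instance - zero API cost."""
    payload = json.dumps(build_heartbeat_request()).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Since the heartbeat answer is never consumed for reasoning, quality barely matters, which is exactly why paying per-token for it is wasteful.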
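The "new session" idea separates two stores: a full turn log that gets cleared, and a compact memory that survives the reset. This is a minimal sketch; the class and method names are hypothetical, not OpenClaw's:

```python
class Session:
    """Chat state with a reset that keeps distilled memory."""

    def __init__(self):
        self.history = []  # full turn-by-turn log; this is what grows
        self.memory = {}   # small key/value facts worth carrying forward

    def add_turn(self, role: str, text: str) -> None:
        self.history.append((role, text))

    def remember(self, key: str, value: str) -> None:
        self.memory[key] = value

    def new_session(self) -> dict:
        """Clear the expensive context but return the preserved memory,
        which seeds the next session's context."""
        self.history.clear()
        return dict(self.memory)
```

Because only `memory` is re-sent after a reset, context size is bounded by what was explicitly deemed worth remembering instead of growing with every turn.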
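Pre-execution cost estimates can be approximated without a tokenizer using the common rough heuristic of ~4 characters per token. The prices in the table are placeholders, not current published rates:

```python
# Illustrative $ per 1M input tokens - placeholder values, not real pricing.
PRICE_PER_MTOK = {"haiku": 0.25, "opus": 15.00}

def estimate_cost(text: str, model: str) -> float:
    """Rough pre-flight cost estimate: ~4 chars per token heuristic."""
    tokens = len(text) / 4
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]
```

Logging these estimates alongside actual billed usage makes the daily token audit a simple diff: any task whose real cost far exceeds its estimate is a candidate for rerouting to a cheaper tier.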

Topics Covered