
AgentMail + Paperclip: The Stack I Chose and Why
When I started building the OpsForEnergy agent system, I evaluated half a dozen frameworks and platforms: LangChain, CrewAI, AutoGen, n8n, custom FastAPI services, and a few no-code tools. Each had strengths. Each had deal-breakers. The stack I ended up with (AgentMail for email, Paperclip for orchestration, Supabase for state, Telegram for delivery, and Claude Sonnet 4 for reasoning) was chosen through elimination, not hype.
AgentMail. The first requirement was that the agents had to work through email. Energy and construction run on email. AHJs (authorities having jurisdiction) do not use Slack. Subcontractors do not log into dashboards. If the agent system required anyone to change their communication channel, it would fail.
AgentMail is built specifically for email-native AI agents. It provides dedicated inboxes, parsing pipelines, and sending APIs. The integration took a few hours. The alternative, building a custom email parser on top of the Gmail or Outlook APIs, would have taken days and added ongoing maintenance.
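To give a rough sense of the parsing AgentMail takes off your plate, here is a minimal sketch of the do-it-yourself version using only Python's standard `email` package. It is not AgentMail's API; the function name `parse_inbound` and the returned field names are hypothetical, and a production parser would also have to cope with forwarded chains, quoted replies, and unusual encodings.

```python
from email import policy
from email.parser import BytesParser


def parse_inbound(raw: bytes) -> dict:
    """Hypothetical helper: pull the fields an agent needs out of a raw RFC 5322 message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Only text/plain is accepted as the body; get_body returns None if there is none
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    # iter_attachments yields the sub-parts that are not candidate body parts
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "body": body.strip(),
        "attachments": attachments,
    }
```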
Paperclip. I needed an orchestration layer that was lightweight, transparent, and easy to debug. Many agent frameworks hide too much magic. When something goes wrong, you are debugging a black box. Paperclip takes a more explicit approach: you define agents, tools, and workflows in code, and you can trace every step.
Paperclip also handles the MCP (Model Context Protocol) tool layer cleanly. I can define a Supabase query tool, a Telegram send tool, and an email read tool, then compose them into agent workflows without fighting the framework.
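Paperclip's own API is not shown in this post, so the following is a framework-agnostic sketch of the explicit style described above, not Paperclip code: each tool is a named plain function, a workflow is an ordered list of tools, and every step's input and output is recorded so a bad result can be traced to the step that produced it. The `Workflow` class and the three stub tools are hypothetical stand-ins for the email, Supabase, and Telegram tools.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Workflow:
    """Hypothetical, framework-agnostic workflow (NOT Paperclip's API)."""
    name: str
    steps: list[tuple[str, Callable[[Any], Any]]] = field(default_factory=list)
    trace: list[dict] = field(default_factory=list)

    def step(self, tool_name: str, fn: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((tool_name, fn))
        return self  # allow chaining .step(...).step(...)

    def run(self, payload: Any) -> Any:
        self.trace.clear()
        for tool_name, fn in self.steps:
            result = fn(payload)
            # Record every input/output pair so a failure can be pinned to one step
            self.trace.append({"tool": tool_name, "input": payload, "output": result})
            payload = result
        return payload


# Hypothetical stub tools; real ones would call the email, Supabase and Telegram APIs
def read_email(msg_id: str) -> dict:
    return {"msg_id": msg_id, "permit": "24-0117"}


def lookup_permit(email: dict) -> dict:
    return {**email, "status": "active"}


def format_alert(record: dict) -> str:
    return f"Permit {record['permit']} is {record['status']}"


flow = (Workflow("permit_update")
        .step("email.read", read_email)
        .step("supabase.query", lookup_permit)
        .step("telegram.format", format_alert))
```

Running `flow.run("msg-1")` returns `"Permit 24-0117 is active"`, and `flow.trace` holds one entry per tool call, which is the kind of step-by-step visibility that makes debugging tractable.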
Supabase. Every agent needs persistent state. The Permit Agent needs to know which permits are active. The Field Agent needs to match documents to projects. The Ops Supervisor needs a week's worth of activity to compile a digest. Supabase provides a Postgres database with an auto-generated REST API, and Postgres functions can be called as RPC endpoints. It is fast and reliable, and I do not have to run a backend server of my own.
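As an illustration of how an agent can read state straight from Supabase without a custom backend, the sketch below builds requests against Supabase's REST endpoint (served by PostgREST) using only the Python standard library. The `permits` table, its columns, and the `weekly_activity` function are hypothetical; the URL shapes (`/rest/v1/<table>`, `/rest/v1/rpc/<function>`), the `eq.` filter syntax, and the `apikey`/`Authorization` headers follow Supabase's documented REST conventions. In a real agent you would more likely use the official `supabase` client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def _headers(key: str) -> dict:
    # Supabase's REST examples send the project key in both headers
    return {"apikey": key, "Authorization": f"Bearer {key}", "Accept": "application/json"}


def active_permits_request(base_url: str, key: str) -> Request:
    # Hypothetical "permits" table; filters use PostgREST's column=operator.value syntax
    query = urlencode({"select": "id,project_id,status", "status": "eq.active"}, safe=",")
    return Request(f"{base_url}/rest/v1/permits?{query}", headers=_headers(key))


def rpc_request(base_url: str, key: str, fn: str, args: dict) -> Request:
    # Postgres functions are exposed as POST /rest/v1/rpc/<name> with JSON arguments
    headers = {**_headers(key), "Content-Type": "application/json"}
    return Request(f"{base_url}/rest/v1/rpc/{fn}", data=json.dumps(args).encode(),
                   headers=headers, method="POST")


def fetch_json(req: Request):
    # The only function that touches the network
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

`fetch_json(active_permits_request(url, key))` would return the matching rows as a list of dicts. Keeping the request builders separate from the network call means they can be checked offline.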
Telegram. For PM notifications, I wanted something immediate, lightweight, and easy to reach from any phone. Telegram bots are free and easy to set up, and PMs who already use Telegram do not have to install anything new. The messages are plain text, which forces the agents to be concise.
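Sending a plain-text notification is a single HTTPS call to the Telegram Bot API's `sendMessage` method. Here is a minimal standard-library sketch; the bot token and chat ID are placeholders (the token comes from BotFather, the chat ID from the chat with the PM), and truncating at 4,096 characters reflects the Bot API's per-message text limit. Leaving out `parse_mode` keeps the message as plain text.

```python
import json
from urllib.request import Request, urlopen

TELEGRAM_MAX_CHARS = 4096  # Bot API limit on the text of a single message


def telegram_request(token: str, chat_id: int, text: str) -> Request:
    # No parse_mode is set, so Telegram treats the text as plain text
    body = {"chat_id": chat_id, "text": text[:TELEGRAM_MAX_CHARS]}
    return Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: Request) -> bool:
    # urlopen raises HTTPError on non-2xx; successful replies look like {"ok": true, ...}
    with urlopen(req, timeout=10) as resp:
        return json.load(resp).get("ok", False)
```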
Claude Sonnet 4. The LLM choice came down to reliability on structured extraction and tool use. In my testing, Claude Sonnet 4 was the most consistent at returning valid JSON, following complex prompts, and handling long context windows. It is not the cheapest option, but for an operations system, accuracy matters more than cost.
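Even with a model that usually returns valid JSON, it is worth checking every extraction before it reaches the database. The sketch below shows one way to do that with the standard library. It does not call Claude, and the schema (`permit_number`, `status`, `due_date`) and the allowed status values are hypothetical examples for a permit email.

```python
import json
from datetime import date
from typing import Optional

# Hypothetical extraction schema for a permit email; field names are illustrative
REQUIRED = {"permit_number": str, "status": str, "due_date": str}
ALLOWED_STATUS = {"submitted", "in_review", "corrections", "approved"}


def validate_extraction(raw: str) -> tuple[Optional[dict], list[str]]:
    """Return (record, errors); record is None whenever any check fails."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]
    errors = []
    # Every required field must be present with the expected type
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            errors.append(f"{key}: expected {typ.__name__}")
    if isinstance(data.get("status"), str) and data["status"] not in ALLOWED_STATUS:
        errors.append(f"status: unknown value {data['status']!r}")
    if isinstance(data.get("due_date"), str):
        try:
            date.fromisoformat(data["due_date"])  # expects YYYY-MM-DD
        except ValueError:
            errors.append("due_date: not an ISO date")
    return (None, errors) if errors else (data, [])
```

When validation fails, one reasonable option is to send the error list back to the model in a retry prompt rather than writing a partial record.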
What I did not choose and why: LangChain was too abstraction-heavy for my taste. CrewAI was promising but still early. n8n is great for linear automations but struggles with agentic reasoning. Custom FastAPI would have been flexible, but I wanted to spend my time on agent behavior, not infrastructure.
An honest limitation: This stack is optimized for email-native, text-heavy workflows. If your operations center on voice calls, video, or complex visual documents, you would need different tools. The stack is not universal — it is specific to the coordination problem I am solving.