I'm building a managed AI agent service for small businesses. The agent itself is open-source (Hermes Agent by Nous Research), and my business layer sits on top: client manifests, safety templates, deployment scripts, pricing. The challenge is customizing the upstream code without creating maintenance nightmares when they ship updates.
Here's what I built and how I tested it.
The fork problem
The obvious move is to fork the repo and make your changes. Except:
- A public fork exposes your business logic to everyone
- Maintaining a divergent fork of a fast-moving project creates merge conflicts on every upstream update
- You need version pinning so all your clients run the same tested code
I wanted something that gave me fork-level stability without the merge pain. My changes are small: config defaults, a couple of gateway hooks, maybe a custom tool down the line. A full divergent fork is overkill.
Private fork with git patches
The solution is boring and that's why it works.
I maintain a private fork on GitHub and store my modifications as .patch files in my business repo, applied on top with a script. The fork gives me version pinning: I tag stable points, deploy clients from tags, and merge upstream on my own schedule. The whole thing is two shell scripts.
```text
NousResearch/hermes-agent (public)
        |
        v  git fetch upstream / merge
bilawalriaz/hermes-agent (private fork)
        |
        v  git clone
~/hermes-agent/ (local working copy)
        |
        +-- patches applied from hermes-business/patches/
```
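The fetch-and-merge step in the diagram boils down to a handful of git commands. A sketch, wrapped in a function (the function name, arguments, and tag naming are mine, not part of Hermes):

```shell
#!/usr/bin/env bash
# Sketch of the upstream sync step: pull new commits from the public
# repo into the private fork's working copy, then tag a stable point.
# Function name and arguments are illustrative.
sync_upstream() {
  local upstream_url="$1" branch="$2" tag="$3"
  # Register the public repo as a second remote (idempotent).
  git remote add upstream "$upstream_url" 2>/dev/null || true
  # Fetch just the branch we track and merge it into the current branch.
  git fetch -q upstream "$branch"
  git merge -q --no-edit FETCH_HEAD
  # Tag the merge point so client deployments can pin to it.
  git tag "$tag"
}

# Example (run inside the local working copy):
# sync_upstream https://github.com/NousResearch/hermes-agent main v0.3-stable
```

Because clients deploy from tags, the merge can sit unreleased until it has been tested.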
The patches directory has numbered categories: 001-099 for config, 100-199 for gateway, 200-299 for tools. Each patch is one focused change with a clear commit message. If upstream incorporates your change, you delete the patch. Zero maintenance overhead when nothing changes.
The fork stays private, so there's no risk of accidentally publishing business logic.
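The apply step is equally boring. A minimal sketch of that script (the numbering scheme matches the categories above; the function name and paths are illustrative):

```shell
#!/usr/bin/env bash
# Sketch of apply-patches.sh: apply every .patch file from the business
# repo onto the working copy, in numeric order. Paths are illustrative.
apply_patches() {
  local patch_dir="$1" work_tree="$2"
  local p
  # Globs sort lexically, so 001-*, 100-*, 200-* apply in category order.
  for p in "$patch_dir"/*.patch; do
    [ -e "$p" ] || continue           # no patches -> nothing to do
    echo "applying $(basename "$p")"
    git -C "$work_tree" apply "$p"    # fails loudly on conflict
  done
}

# Example:
# apply_patches ~/hermes-business/patches ~/hermes-agent
```

When an upstream change breaks a patch, `git apply` refuses to apply it, which is exactly the signal to re-derive that one patch against the new code.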
Testing with The Office
I have a YAML manifest system where each client deployment is defined by a single file. It specifies which profiles to create, which safety templates to use, which cron jobs to schedule, which LLM provider to wire up. A Python script reads the manifest and generates everything.
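I won't reproduce a real client file, but a manifest in this style might look like the following. Every field name here is illustrative, not the actual schema:

```yaml
# Hypothetical client manifest -- all field names are illustrative.
client: acme-plumbing
template: solo-tradesperson
profiles:
  - name: ops
    soul: souls/ops.md
    safety: templates/base-safety.yaml
providers:
  text: env:TEXT_PROVIDER      # resolved from environment variables
  vision: env:VISION_PROVIDER
cron:
  - name: morning-digest
    schedule: "0 7 * * 1-5"
```

The point of the manifest is that onboarding a client is a file, not a procedure: the generator script does the same thing every time.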
To test this, I created three Telegram bots and gave them Office characters:
| Bot | Character | What it tests |
|---|---|---|
| Jim Halpert | Dry, practical, cuts through noise | Solo tradesperson template (1 profile, simple) |
| Michael Scott | High-energy, leadership-focused | Consultancy template (3 profiles, multi-channel) |
| Dwight Schrute | Intense, procedural, security-first | Small agency template (2 profiles, ops + research) |
Each bot gets its own .hermes directory, its own .env file, and a separate memory store. They all point at the same LLM provider for text and a separate one for vision, configured entirely through environment variables.
The character personalities come from soul templates. A SOUL file is basically the agent's system prompt. Jim gets "calm, concise, lightly sardonic" while Dwight gets "blunt, structured, uncompromising." But underneath the personality, every bot has the same base rules bolted on: no sending messages without approval, no modifying its own config, no fabricating outputs.
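To give a flavor of the split between personality and base rules, here's a toy SOUL file in that spirit. The wording is mine, not a real template:

```text
# SOUL: jim (illustrative example, not the production template)
You are a calm, concise assistant with a lightly sardonic streak.
Prefer short, practical answers; cut through the noise.

## Base rules (identical across all bots)
- Never send an outbound message without operator approval.
- Never modify your own configuration.
- Never fabricate tool outputs or data.
```

Swapping the personality block while keeping the base rules fixed is what makes the three bots useful as a test matrix.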
Running all three simultaneously
On macOS, each bot is just a Hermes gateway process with different HOME and HERMES_HOME environment variables. I wrote a launcher script:
```shell
./scripts/launch-bot.sh jim    # foreground
./scripts/launch-bot.sh all    # all 3 in background
./scripts/launch-bot.sh stop   # kill all
```
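The launcher itself is little more than environment bookkeeping. A stripped-down sketch (the bot names are real; the paths and the exact gateway invocation are illustrative):

```shell
#!/usr/bin/env bash
# Stripped-down sketch of launch-bot.sh. The isolation trick is giving
# each bot its own HOME and HERMES_HOME so state never overlaps.
BOTS="jim michael dwight"

bot_home() {
  # One directory per bot: separate .hermes, .env, memory store.
  echo "$HOME/bots/$1"
}

launch_bg() {
  local name="$1"
  HOME="$(bot_home "$name")" HERMES_HOME="$(bot_home "$name")/.hermes" \
    hermes gateway &                  # assumed entrypoint, illustrative
  echo $! > "/tmp/hermes-$name.pid"
}

stop_all() {
  local name
  for name in $BOTS; do
    [ -f "/tmp/hermes-$name.pid" ] && \
      kill "$(cat "/tmp/hermes-$name.pid")" 2>/dev/null || true
  done
}

case "${1:-}" in
  all)  for b in $BOTS; do launch_bg "$b"; done ;;
  stop) stop_all ;;
  "")   : ;;                          # no args: nothing to do
  *)    # single bot, foreground
        HOME="$(bot_home "$1")" HERMES_HOME="$(bot_home "$1")/.hermes" \
          hermes gateway ;;
esac
```

Setting `HOME` per process is blunt but effective: anything the gateway writes relative to the home directory is automatically isolated.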
All three connected to Telegram within seconds. Separate polling connections, separate session storage, no interference. The total resource usage is modest because inference happens remotely via API.
What the safety layer looks like
Every bot has approvals.mode: manual, meaning all terminal commands need human approval. The command_allowlist starts empty and gets audited monthly. Tirith (the policy engine) is enabled for additional guardrails. PII redaction is on.
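In template form, that safety block looks something like this. The `approvals.mode` and `command_allowlist` keys come from the setup just described; the surrounding layout is illustrative:

```yaml
# Base safety template (illustrative layout).
approvals:
  mode: manual            # every terminal command needs human sign-off
command_allowlist: []     # starts empty; audited monthly
tirith:
  enabled: true           # policy engine for extra guardrails
privacy:
  pii_redaction: true
```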
For customer-facing profiles (not tested yet with these bots, but designed in the template system), there's an outbound-hold skill that intercepts any attempt to send a message to a customer. The message gets queued, the operator gets notified, and nothing goes out without explicit approval.
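Mechanically, the hold step is just a queue plus a notification. A toy shell sketch of that flow (the real version is a Hermes skill; every name here is illustrative):

```shell
#!/usr/bin/env bash
# Toy sketch of the outbound-hold flow: intercepted messages land in a
# queue directory, and nothing goes out until an operator approves.
QUEUE="${QUEUE:-/tmp/outbound-queue}"

hold_message() {
  # Called instead of sending: persist the draft, notify the operator.
  local recipient="$1" body="$2"
  mkdir -p "$QUEUE"
  local id
  id="$(date +%s)-$$"
  printf 'to: %s\n%s\n' "$recipient" "$body" > "$QUEUE/$id"
  echo "held message $id for operator review"  # stand-in for a real alert
  echo "$id"
}

approve_message() {
  # Operator action: only now does the message actually go out.
  local id="$1"
  [ -f "$QUEUE/$id" ] || { echo "no such message" >&2; return 1; }
  cat "$QUEUE/$id"      # stand-in for the real send
  rm "$QUEUE/$id"
}
```

The important property is that the send path and the approval path are separate functions: the agent can only ever reach the first one.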
The whole thing is designed so that a non-technical business owner can receive useful AI assistance while a human stays in the loop for anything that matters.
What's next
The bots are running. The manifest system works. The private overlay keeps my customizations clean.
Remaining work: test cron bundles in production (morning digests, health checks), validate the outbound-hold workflow end-to-end, get real client feedback on the personality templates, and benchmark model quality against cost at different provider tiers.
If you run a small business and spend hours on repetitive admin, get in touch. First three setups are priced at cost while I dial in the process.