GoodLeads.Club Platform
BRIEF: System architecture · DATE: 2026-05-01 · STATUS: Live in production
Agent Organization, Not An Agent

A team
of agents.
Working today.

GoodLeads ships product through an organization of AI agents the same way a software company ships through an organization of people: with managers, peers, rituals, escalation paths, and a culture documented in a wiki. The system is not an agent. It is an agent org — vertical roles, a horizontal role, a cross-model peer review function, and three self-improving loops.

The Org Chart

Vertical agents.
One horizontal agent.
Peer review functions.

Each box below is an agent identity defined by its system prompt, its skill library, its tool permissions, and its memory scope — not by which LLM happens to back it. Three of the four agent roles are Claude Sonnet 4.6; one (Code Judgment) is GPT-5 Codex. Models can be swapped by changing one config line. Roles are stable.
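A minimal sketch of how a role could be kept separate from its backing model. The field names, file paths, skill names, and model identifier strings are hypothetical; only the idea that the model is one swappable line comes from the text above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentRole:
    """An agent identity. Everything except `model` defines the role."""
    name: str
    system_prompt: str              # hypothetical path to the prompt file
    skills: tuple[str, ...]         # hypothetical skill library entries
    tool_permissions: frozenset[str]
    memory_scope: str               # hypothetical scope label
    model: str                      # the single line that changes on a swap


code_judgment = AgentRole(
    name="code_judgment",
    system_prompt="prompts/code_judgment.md",
    skills=("code_review_checklist",),
    tool_permissions=frozenset({"read_skill"}),
    memory_scope="cross-state",
    model="gpt-5-codex",            # hypothetical identifier string
)

# Swapping the model leaves the role's identity untouched.
swapped = replace(code_judgment, model="another-model")
```

Because the dataclass is frozen, a swap produces a new value rather than mutating the role in place, which keeps the "roles are stable" property easy to check.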

Figure: org chart. The y-axis encodes scope, with cross-state scope above per-state scope; Platform EM spans the full width of the canvas.
  • Platform EM (horizontal agent, operates the tools below): reviews every PR · daily org sweep · Friday recap · auto-fixes mechanical gaps
  • Mechanical DoD: deterministic · ~70% of gates · no LLM · <50ms
  • Code Judgment: Codex · GPT-5 · different model family
  • Aggregator: tier-aware merge gate · auto · advisory · ack · block
  • CO State GM, FL State GM, VA State GM (vertical, state-scoped): morning + evening standup · wake on demand · file PRs, subject to review
  • Founder · CPO (Gib): surface is Google Docs · comments → next session

Platform EM is a peer of the State GMs in role, but its scope crosses all of them, which the chart encodes by drawing its bar across the whole canvas. State GMs file PRs upward into the EM unit, which operates three peer-review tools.

System Architecture

Where the org
actually lives.

The agent organization is gated behind a single feature flag. Set ENABLE_AGENT_LAYER=true and /api/v1/gm/* mounts onto the FastAPI app. Set it to false (the Forsyth deploy configuration) and the entire agent runtime disappears while the data plane keeps running. This is the deliverable boundary in code.
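A sketch of the flag check, using only the standard library so it runs anywhere. The data-plane prefix and both function names are hypothetical. In a FastAPI app the same condition would presumably wrap a call such as `app.include_router(gm_router, prefix="/api/v1/gm")`, where `gm_router` is also a hypothetical name.

```python
import os

AGENT_PREFIX = "/api/v1/gm"


def agent_layer_enabled(env=os.environ) -> bool:
    """Only an explicit 'true' (any case) enables the layer; default is off."""
    return env.get("ENABLE_AGENT_LAYER", "false").strip().lower() == "true"


def mounted_prefixes(env=os.environ) -> list[str]:
    """Return the route prefixes the app would expose for this environment."""
    prefixes = ["/api/v1/data"]     # hypothetical data-plane prefix, always on
    if agent_layer_enabled(env):
        prefixes.append(AGENT_PREFIX)
    return prefixes
```

Defaulting to off means a build that never sets the flag behaves like the stripped configuration.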

Figure: system architecture (AWS · us-east-1).
  • Triggers: GitHub Webhook (PR events, HMAC-validated) · EventBridge Cron (10 rules, standups + sweeps) · Wake URL (HMAC-signed, from email) · Doc Comments (read by next session)
  • Front door: ALB → FastAPI (formation-gtm) · /api/v1/gm/* mounted if ENABLE_AGENT_LAYER · launches sessions via ecs.run_task
  • Compute: ECS Fargate · one task per session · PIPELINE_CMD entrypoint · CloudWatch logs
  • External: Anthropic Managed Agents · cloud sandbox (bash + git) · custom tools (HMAC-signed) · Codex called for review
  • State: RDS PostgreSQL · findings · actions · wiki · experiments · vendor_config · doc_comments · budgets
  • Data plane inputs: SOS · vendors (Tracerfy · ZB · Enformion)

Four triggers. One control plane. The agent runtime sits behind a feature flag, separate from the data plane.

Three Self-Improvement Loops

Concentric loops.
Different cadences.
One direction.

The org has three loops, each operating at a different timescale. The inner loop runs on every PR. The middle loop runs daily and weekly. The outer loop runs over multi-day experiments. They compound: every comment Gib makes can become a future scorer; every shipped experiment becomes a config row that future experiments ratchet from.

Figure: the three loops on a log time axis (1 s · 1 min · 1 hr · 1 day · 1 week · 1 month).
  • Inner (PR review, every PR): webhook ~30s · verdict ~5 min
  • Middle (learning, daily / weekly): daily standup 8:30 AM CT · weekly recap Friday
  • Outer (experiment, multi-day): experiment ships ~5 days · verify / revert 14 days
  • Ratio between inner and outer loops: ~10,000 : 1
Loop 01 — Inner

PR Review

Cadence: every PR · 3-5 min

Webhook fires Platform EM. Mechanical DoD runs deterministic gates in parallel with Codex Code Judgment. Aggregator decides verdict, keyed to PR tier. Auto-fix authority lets EM repair mechanical gaps itself rather than block.

  • Webhook → HMAC validation
  • Mechanical DoD ∥ Code Judgment
  • Aggregator → tiered verdict
  • Auto-merge · advisory · ack · block
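The first step, HMAC validation, can be sketched as below. GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name is hypothetical, and how the production handler loads its secret is not described in this brief.

```python
import hashlib
import hmac
from typing import Optional


def valid_github_signature(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), header.encode())
```

The check must run against the exact bytes received, before any JSON parsing, since re-serializing the payload can change the digest.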
Loop 02 — Middle

Learning

Cadence: daily mining · weekly clusters

Founder commentary on Google Docs is mined into structured episodic memory. Weekly clustering produces scorer candidates. Promoted candidates ship as judge scorers in the experiment harness. Manager intuition becomes ML training signal.

  • Mine doc comments → corpus table
  • Cluster by recency + frequency
  • Cluster review artifact for founder
  • Promoted → judge scorers (14 shipped)
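One way to combine recency and frequency into a single ranking is an exponentially decayed count. This is an illustrative assumption, not the clustering pipeline itself: the half-life, the function names, and the cluster labels in the usage note are all hypothetical.

```python
from datetime import date


def cluster_score(comment_dates, today, half_life_days=14.0):
    """Decayed frequency: a comment counts 1.0 on the day it is written
    and half as much after every `half_life_days`."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in comment_dates)


def rank_clusters(clusters, today, half_life_days=14.0):
    """Order cluster names from strongest to weakest scorer candidate."""
    return sorted(
        clusters,
        key=lambda name: cluster_score(clusters[name], today, half_life_days),
        reverse=True,
    )
```

With this scoring, two comments from this week outrank three comments from two months ago, so a theme the founder has stopped raising fades without being deleted.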
Loop 03 — Outer

Experiment

Cadence: multi-day · agent-driven

Agents propose experiments, gated by a per-agent monthly budget envelope. Variants forward-replay through the production context-builder. Wins write append-only vendor_config rows. A daily verifier compares post-shipment metrics and auto-reverts on regression.

  • propose_experiment → budget gate
  • Forward-replay through live harness
  • Ship rules → vendor_config row
  • verify-recent-shipments → auto-revert
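The budget gate in the first step could look roughly like this. The class, its fields, and the use of US dollars as the unit are assumptions; the brief only says each agent has a monthly envelope that caps spend.

```python
from dataclasses import dataclass


@dataclass
class BudgetEnvelope:
    """Per-agent monthly spend cap (hypothetical shape)."""
    agent: str
    monthly_limit_usd: float
    spent_usd: float = 0.0

    def try_reserve(self, estimated_cost_usd: float) -> bool:
        """Reserve budget for a proposed experiment, or refuse it."""
        if estimated_cost_usd < 0:
            raise ValueError("estimated cost must be non-negative")
        if self.spent_usd + estimated_cost_usd > self.monthly_limit_usd:
            return False            # proposal is rejected, nothing is spent
        self.spent_usd += estimated_cost_usd
        return True
```

Reserving at proposal time, rather than billing after the run, means an agent cannot overshoot the envelope by launching several experiments at once.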
Figure: the PR review cycle, drawn as a closed loop with clock positions.
  • Entry (12:00): PR opens · webhook · HMAC validated
  • Parallel: Mechanical DoD (deterministic · <50ms) ∥ Code Judgment (Codex · GPT-5 · different family)
  • Decide (6:00): Aggregator · tier-aware merge gate
  • Verdict (9:00): fan-out · 5 possible outcomes keyed to PR tier
  • Auto-fix retry: EM commits the fix · re-reviews
  • Tier S peels off: Google Doc → Gib · plain English
  • Auto-merge: Tier C/B/A · 95% of PRs
  • Closed-loop topology: the auto-fix retry edge is the system's defining authority

Most PRs exit the wheel at AUTO-MERGE. A fixable mechanical gap loops back via the prominent AUTO-FIX edge. Tier S — product-level risk — peels off to a plain-English Google Doc for Gib.
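A sketch of the aggregator's decision, under stated assumptions. The brief names the outcomes but not the rules, so the ordering here (auto-fix before tier routing) is a guess, and the advisory and ack outcomes are left out because their semantics are not defined in this document.

```python
from enum import Enum


class Verdict(Enum):
    AUTO_MERGE = "auto-merge"
    AUTO_FIX_RETRY = "auto-fix-retry"
    FOUNDER_DOC = "founder-doc"
    BLOCK = "block"


def aggregate(tier, mechanical_ok, judgment_ok,
              mechanical_fixable=False, codeowners_ok=True):
    """Combine both reviewers' results into one tier-aware verdict."""
    if tier not in ("C", "B", "A", "S"):
        raise ValueError(f"unknown tier: {tier!r}")
    if not mechanical_ok and mechanical_fixable:
        return Verdict.AUTO_FIX_RETRY      # EM commits the fix, then re-reviews
    if not (mechanical_ok and judgment_ok):
        return Verdict.BLOCK               # either reviewer can stop the merge
    if tier == "S":
        return Verdict.FOUNDER_DOC         # plain-English summary for the founder
    if tier == "A" and not codeowners_ok:
        return Verdict.BLOCK
    return Verdict.AUTO_MERGE
```

Keeping this function pure (no I/O, no model calls) makes the merge gate itself deterministic even though one of its inputs comes from an LLM.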

Four-Layer Memory

What's stored.
Where. How long.

Agent memory isn't one thing. It's four distinct layers, each with different persistence, update cadence, and access pattern. The split mirrors the memory taxonomy from cognitive science (working, episodic, semantic, procedural), the way human organizations manage knowledge, and the Karpathy "LLM Knowledge Base" pattern. The bar below encodes persistence as length, borrowing the visual idiom CS textbooks use for the memory hierarchy (registers → L1 → L2 → RAM → disk).

Figure: persistence bar, from volatile (seconds) to persistent (indefinite): Working → Episodic → Semantic → Procedural. Length = time-to-decay · color saturation = update cadence.
Layer 01

Working

Current task

Storage: Injected briefing binder + session events
Updated: Per session
Used by: The agent in the moment
Layer 02

Episodic

What happened

Storage: agent_finding · agent_doc_comment · experiment_result
Updated: Every session
Used by: Briefing binder
Layer 03

Semantic

What we know

Storage: wiki_page (state × domain)
Updated: Weekly synthesis skill
Used by: GMs and Code Judgment
Layer 04

Procedural

How we do things

Storage: skills/*.md · JTBD-shaped
Updated: PR-level evolution
Used by: read_skill tool, on demand
The Trust Ladder

Autonomy is a gradient.
Not a switch.

Tiers are computed mechanically from PR file paths. The merge verdict depends on tier — Tier C/B/A auto-merge when both reviewers agree; Tier S routes to a Google Doc summary in plain English. Two architectural decisions sit underneath this: ADR-005 removes the founder from the PR approval loop; ADR-006 forbids single-model review.

C

Low Risk

docs
tests
README
Verdict: Auto-merge if both reviewers green. Founder never sees these unless asked.
~50% of PRs
B

Internal

internal services
support utilities
non-hot-path
Verdict: Auto-merge if both reviewers green. Logged in daily sweep.
~27%
A

Hot Path

scoring helpers
data services
codeowners-gated
Verdict: Auto-merge if both reviewers green and CODEOWNERS satisfied.
~15%
S

Product Risk

scoring formula
vendor cost path
deliverable schema
Verdict: Plain-English Google Doc to founder. Comment to approve. Next session merges.
~8%
Column width = approximate PR volume · Sprints 10–12
Tier S is rare by design.
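Computing a tier mechanically from file paths might look like the sketch below. Every path pattern is hypothetical, as are the rules that unmatched files default to Tier B and that the riskiest file in a PR sets the tier for the whole PR.

```python
from fnmatch import fnmatchcase

TIER_ORDER = "CBAS"                # least to most restrictive
DEFAULT_TIER = "B"                 # assumption: unmatched paths are "internal"

# First matching rule wins, so the most specific patterns come first.
PATH_RULES = [
    ("S", "backend/services/scoring/formula.py"),
    ("S", "backend/schemas/deliverable/*"),
    ("A", "backend/services/scoring/*"),
    ("C", "docs/*"),
    ("C", "tests/*"),
    ("C", "README*"),
]


def file_tier(path: str) -> str:
    for tier, pattern in PATH_RULES:
        if fnmatchcase(path, pattern):
            return tier
    return DEFAULT_TIER


def pr_tier(paths) -> str:
    """A PR takes the tier of its riskiest file."""
    paths = list(paths)
    if not paths:
        raise ValueError("a PR must touch at least one file")
    return max((file_tier(p) for p in paths), key=TIER_ORDER.index)
```

Because the mapping is a pure function of paths, the same PR always lands in the same tier, and changing the ladder is itself a reviewable diff to the rule list.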
The Forward-Thinking Pieces

Wiki-onboarded agents.
An experiment harness
that runs on the harness.

Two pieces of the system are unusual relative to most agent stacks. The wiki collapses agent onboarding into the same surface humans already use. The experiment harness eliminates the drift between test and production by sharing the production context-builder.

Wiki Layer

Edit the agent in seconds.

Most agent harnesses bake "what we know" into the system prompt. Changing what the agent knows requires changing code, running tests, reviewing the diff, and deploying. The cycle takes hours to days.

Our wiki is a database table keyed by (state, domain). The Code Judgment agent's onboarding — engineering tenets, code-review checklist — is three wiki pages. Open the wiki UI, edit a tenet, hit save. The next PR review uses the new tenet.

  • Onboarding loop is editable in seconds, not days
  • Different agents read different views of the same table
  • Wiki edits are versioned and auditable
  • Semantic memory updates itself via weekly synthesis skill
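A sketch of a versioned wiki table keyed by (state, domain). SQLite stands in for the production PostgreSQL database so the example is self-contained; the column names, the version scheme, and the `"*"` value for a cross-state page are assumptions.

```python
import sqlite3


def open_wiki() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE wiki_page (
               state   TEXT    NOT NULL,
               domain  TEXT    NOT NULL,
               version INTEGER NOT NULL,
               body    TEXT    NOT NULL,
               PRIMARY KEY (state, domain, version))"""
    )
    return db


def save_page(db, state, domain, body) -> int:
    """Every save inserts a new version; earlier versions stay auditable."""
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM wiki_page WHERE state=? AND domain=?",
        (state, domain),
    ).fetchone()
    db.execute(
        "INSERT INTO wiki_page VALUES (?, ?, ?, ?)",
        (state, domain, latest + 1, body),
    )
    db.commit()
    return latest + 1


def read_page(db, state, domain):
    """What an agent sees at session start: the newest version only."""
    row = db.execute(
        "SELECT body FROM wiki_page WHERE state=? AND domain=? "
        "ORDER BY version DESC LIMIT 1",
        (state, domain),
    ).fetchone()
    return row[0] if row else None
```

Since agents read at session start, an edit takes effect on the next review with no deploy in between.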
Experiment Harness

Variants run on the live context.

Most A/B harnesses build a separate, simplified context for variants, and then experiments don't predict production. We solved this by making the variant runner a thin wrapper around the live build_matrix_context(), with snapshot data passed as an override.
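The thin-wrapper idea can be sketched as follows. The signature of build_matrix_context() is not shown in this brief, so the `data_source` parameter, the returned shape, and the helper names are all hypothetical stand-ins.

```python
def build_matrix_context(state, data_source):
    """Stand-in for the production context-builder (real signature unknown)."""
    rows = data_source(state)
    return {"state": state, "row_count": len(rows), "rows": rows}


def live_data(state):
    """Stand-in for the production data fetch."""
    return [{"state": state, "lead": "live-1"}, {"state": state, "lead": "live-2"}]


def run_variant(state, snapshot_rows):
    """Variant runner: the same builder, with only the data source overridden."""
    return build_matrix_context(state, data_source=lambda _state: snapshot_rows)


production = build_matrix_context("CO", live_data)
variant = run_variant("CO", [{"state": "CO", "lead": "snapshot-1"}])
```

The wrapper adds no context logic of its own, so any change to the builder reaches variants and production in the same commit.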

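Figure: production and variant paths through one shared node.
  • Production: Live agent (production data) → build_matrix_context() → Standup output (ships if scored win)
  • Variant: Variant runner (snapshot override) → build_matrix_context() → Variant output (scored, judged)
  • Shared node · zero drift: build_matrix_context() · production code path · one source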

Variants and production share one code path. There is no drift. Agents are first-class consumers of the harness through five custom tools. Per-agent monthly budget envelopes cap spend. Daily verifier auto-reverts shipped configs whose production metrics drop.

  • Forward-replay = no drift between test and production
  • Agents propose, run, ship within budget
  • Append-only vendor_config carries experiment_id
  • Auto-revert closes the boy-scout-rules loop
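The append-only config and the auto-revert step could be modeled like this. The row shape, the "higher is better" metric, the tolerance parameter, and the vendor name are assumptions; the brief states only that rows are append-only, carry an experiment_id, and are reverted when production metrics drop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigRow:
    vendor: str
    settings: dict
    experiment_id: Optional[str] = None   # experiment that shipped this row
    reverts: Optional[str] = None         # experiment this row undoes, if any


def active_settings(log, vendor):
    """The newest row for a vendor is live; older rows are history."""
    for row in reversed(log):
        if row.vendor == vendor:
            return row.settings
    raise KeyError(vendor)


def verify_and_maybe_revert(log, vendor, baseline, current, tolerance=0.0):
    """Append a revert row if the metric regressed (assumes higher is better)."""
    rows = [r for r in log if r.vendor == vendor]
    if len(rows) < 2 or rows[-1].experiment_id is None:
        return False                       # nothing experiment-shipped to undo
    if current >= baseline - tolerance:
        return False                       # no regression
    latest, previous = rows[-1], rows[-2]
    log.append(ConfigRow(vendor, previous.settings, reverts=latest.experiment_id))
    return True
```

A revert is itself a new row rather than a delete, so the log still shows that the experiment shipped and when it was undone.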
Commercial Architecture

The boundary that
makes the IP licensable.

The Forsyth Phase 1 Agreement perpetually licenses anything embedded in the deliverable. By keeping the agent organization in a sibling directory the deliverable never imports, the org stays GoodLeads-only retained IP. A CI test fails the build on any new import that crosses the line. Mike's portability methodology becomes automatable: a Forsyth-targeted build literally strips agents/ from the image.

GoodLeads Build · Full

What we ship to ourselves

formation-gtm/
backend/ · data plane
pipeline/ · data plane
classification-service/ · data plane
agents/ · retained IP
backend/services/agents/ · retained IP
All directories present · agent layer LIVE
Forsyth Build · Stripped

What ships to a licensee

formation-gtm/
backend/ · data plane
pipeline/ · data plane
classification-service/ · data plane
agents/ · removed
backend/services/agents/ · removed
Agent layer stripped · CI test enforces import boundary

Same source tree. Two states. The agent organization is a literal subtraction — not a wall, not a separate repo. Every customer after Forsyth is the same shape: data platform shipped, agent org retained.
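A CI check for the import boundary can be built on Python's `ast` module. This is a sketch, not the test the build actually runs: the banned module roots are inferred from the directory names above, and relative imports are skipped for brevity.

```python
import ast

# Assumed module roots for the retained-IP directories.
BANNED_ROOTS = ("agents", "backend.services.agents")


def _crosses(name, banned_roots):
    return any(name == root or name.startswith(root + ".") for root in banned_roots)


def boundary_violations(source, banned_roots=BANNED_ROOTS):
    """Return every imported name in `source` that reaches into the agent layer."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Join module and name so `from backend.services import agents`
            # is caught as well as `from agents import x`.
            candidates = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue                       # relative imports are not checked here
        violations.extend(c for c in candidates if _crosses(c, banned_roots))
    return violations
```

In CI, a test would run this over every data-plane source file and fail if the returned list is non-empty. Parsing rather than grepping avoids false positives from comments, strings, and similarly named modules.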

Operational Evidence

Today, on a normal Friday.

All numbers below are 2026-05-01 production reality, not aspirations.

  • 7: Scheduled agent sessions fired between 7:53 and 9:31 CT today · all exit 0
  • 14: Corpus-derived judge scorers in production · each tied to a founder comment
  • 130+: Merged PRs across Sprints 10–12 · majority agent-authored, auto-merged
  • 0: Pages or engineer interventions required this week · org runs itself
Figure: session timeline · today · 2026-05-01 · captured at 9:51 CT (axis: 6 AM to 12 PM).
  • 7:53–8:54: 5 PR reviews · Platform EM
  • 8:31: CO standup → email 8:55
  • 9:01: Platform EM × 2 · daily sweep + weekly
  • 9:31: FL standup → email 9:47
7 sessions · all exit 0 · 9 emails delivered · 0 engineer interventions · rest of day quiet by design
For The Technical Review Board

What's hard to copy.
And why.

A competitor copying the surface ships in a sprint. Copying the architectural choices behind it takes a year. Each "high-difficulty" row below presupposes commitments made months earlier — they're each one or two commits in size, but they require having decided correctly back then.

Org chart (vertical + horizontal agents)
Difficulty: Medium
Why: Pattern is now public; the discipline of not collapsing roles into one mega-agent is the hard part.

Cross-model review
Difficulty: Low–med
Why: Two API integrations + an aggregator. Defensible only because we have it shipped.

Trust ladder + tier-keyed autonomy
Difficulty: Med–high
Why: The policy file is small, but routing Tier S to plain-English Google Docs (not Slack DMs to engineers) is a cultural commitment most teams won't make.

Wiki-onboarded agents
Difficulty: Medium
Why: Schema + 1 tool. Easy to copy, but only valuable if you also have a manager who actually edits the wiki.

Comment-corpus → judge scorer
Difficulty: High
Why: Requires (a) a manager who comments in writing on a stable surface, (b) a clustering pipeline, (c) the trust-ladder gate. Competitors try to replace this with feedback forms; manager corpus is the differentiator.

Forward-replay experiment harness
Difficulty: High
Why: Variants share the production context-builder, not a separate one. Most teams compromise this; once compromised, experiments stop predicting production.

Boy-scout-rules auto-revert
Difficulty: Med–high
Why: Per-vendor verifier registry + daily cron + finding system. Most teams stop at "agent ships"; we kept going to "agent ships, agent verifies, agent reverts."

Deliverable Boundary as commercial architecture
Difficulty: High
Why: Legal and commercial design choice enforced by code. Most teams don't realize it's a choice until they try to license their work and find their IP fused with their deliverable.