Date: 2026-06-19 · Author: Claude (autonomous overnight session) · Status: for review
This is the running record of every non-obvious decision made while building the
/testing harness, the notifications system, and walking the reservation
lifecycle end-to-end. Each entry states the problem, the options, and the choice
— so you can veto or redirect anything in the morning. Anything marked
⚠ CONTROVERSIAL is where I'd most value a second opinion.
0. Auth for the autonomous walkthrough — ADMIN_DEV_BYPASS=1
Problem. The walkthrough has to drive the office UI in Chrome overnight with
no human to complete an OAuth round-trip. Seed-only is a true clean slate (0
leads, 0 reservations, only the System + Agent service accounts — no human staff
account). ADMIN_DEV_BYPASS was 0 (forcing real Google sign-in).
Options. (a) Real Google sign-in via the connected Chrome studio.chat profile. (b) Dev bypass.
Choice: (b). Local real Google OAuth almost certainly fails — the prod OAuth
client's callback allowlist has no 127.0.0.1 entry (per the prod-redirect
memory: custom domain, exact callback, no wildcards). Dev bypass is the
intended local-dev mode, grants full admin, and has zero prod impact
(VERCEL_ENV=production hard-locks bypass off regardless). I flipped .env
ADMIN_DEV_BYPASS 0 → 1 with a comment, and will note it for you to flip back
to re-test the real sign-in flow.
Consequence. The dev-bypass user has accountId: null, so it can't be a
notification recipient on its own. See §3 (recipient identity).
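The gate can be sketched as a pure function. This is a minimal sketch, assuming both settings arrive as plain environment strings; isDevBypassEnabled, devBypassUser and the DevBypassUser shape are hypothetical names for illustration, not the actual code.

```typescript
// Hypothetical helper: decides whether the dev bypass may be honoured.
// Assumption: both values are read as plain environment strings.
type Env = Record<string, string | undefined>;

function isDevBypassEnabled(env: Env): boolean {
  // Production hard-locks the bypass off, whatever the flag says.
  if (env.VERCEL_ENV === "production") return false;
  return env.ADMIN_DEV_BYPASS === "1";
}

// The bypass operator has full admin rights but no account row,
// which is why it cannot be a notification recipient on its own.
interface DevBypassUser {
  role: "admin";
  accountId: null;
}

function devBypassUser(env: Env): DevBypassUser | null {
  return isDevBypassEnabled(env) ? { role: "admin", accountId: null } : null;
}
```

Checking VERCEL_ENV first means a stray ADMIN_DEV_BYPASS=1 in a production environment has no effect.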
1. Notification event spine — hook createAuditEntry, don't scatter emits
Problem. "Create hooks for notifications on the important stuff" across a large lifecycle (lead created, quote sent, deposit cleared, confirmed, returned, settled, closed, claim filed, verification approved, inspection signed, hold expired, overdue flagged…).
Options. (a) Add explicit notify(...) calls at each of ~20 event sites.
(b) Hook the one choke point nearly every business event already flows through:
createAuditEntry (src/lib/office/audit.ts).
Choice: (b). Every meaningful mutation already writes an audit row with
(entity_type, action, from_state, to_state, payload, source). After a
successful audit insert, createAuditEntry calls a single guarded hook
onAuditEntry(entry). A pure registry maps (entity_type, action, to_state)
→ an event descriptor (title, category, severity, default channels); non-
notifiable rows (e.g. email_sent, tag_changed) are simply absent from the
map and ignored. This gives near-complete coverage from one wiring point and
keeps the taxonomy in a single gated, tested module.
Guarantees. The hook is fully wrapped in try/catch and never throws — an audit write (and the transition that triggered it) can never fail because of notifications, matching the existing "emails never throw into the caller" contract. The hook does only fast DB inserts (in-app rows + outbox enqueue); the actual Slack HTTP call is deferred to the queue worker.
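The registry-plus-guarded-hook shape can be sketched as follows. This is an illustration under assumptions, not the module itself: the key format, the status_changed action name, the descriptor field types, and the injected sink (standing in for the in-app insert + outbox enqueue) are all hypothetical, and the real hook presumably does async DB work.

```typescript
// Illustrative shapes only; the real audit row and descriptor types may differ.
interface AuditEntry {
  entity_type: string;
  action: string;
  from_state: string | null;
  to_state: string | null;
  payload: Record<string, unknown>;
  source: string;
}

interface EventDescriptor {
  title: string;
  category: string;
  severity: "info" | "warning" | "critical";
  channels: Array<"in_app" | "slack">;
}

// Keys are "entity_type:action:to_state"; "*" means any target state.
// The two entries are hypothetical examples, not the real taxonomy.
const REGISTRY: Record<string, EventDescriptor | undefined> = {
  "leads:created:*": {
    title: "New lead", category: "leads", severity: "info", channels: ["in_app", "slack"],
  },
  "reservations:status_changed:confirmed": {
    title: "Reservation confirmed", category: "reservations", severity: "info", channels: ["in_app", "slack"],
  },
};

// Pure lookup: rows absent from the registry are not notifiable.
function describeEvent(entry: AuditEntry): EventDescriptor | undefined {
  const base = `${entry.entity_type}:${entry.action}`;
  return REGISTRY[`${base}:${entry.to_state ?? "*"}`] ?? REGISTRY[`${base}:*`];
}

type Sink = (entry: AuditEntry, event: EventDescriptor) => void;

// Guarded hook: any failure is logged and swallowed, so the audit write
// that triggered it is never affected.
function onAuditEntry(entry: AuditEntry, sink: Sink): void {
  try {
    const event = describeEvent(entry);
    if (event) sink(entry, event);
  } catch (err) {
    console.error("notification hook failed", err);
  }
}
```

A row such as email_sent has no registry key, so describeEvent returns undefined and the sink is never called.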
2. Delivery is queue-driven — durable notification_outbox table
Problem. "Slack notifications should be driven by a queue (kafka or sqs)… if you cannot build that without my input, prepare for it in code and build what you can that works."
Choice. A durable outbox table (notification_outbox) is the queue:
rows are pending → processing → delivered | failed with attempts,
run_after (backoff), and a JSON payload. A NotificationQueue interface
(enqueue, claimBatch, markDelivered, markFailed) has one implementation
today — DbOutboxQueue — which needs no external infra and works locally and
on Vercel. A worker (drainOutbox) claims a batch, delivers each, and acks.
Why an outbox and not Kafka/SQS now. A real broker needs infra + credentials
I can't provision autonomously. The transactional-outbox pattern is the
correct first step anyway (durable, at-least-once, survives restarts) and is
exactly what you'd later relay into Kafka/SQS. To go real: implement
KafkaQueue/SqsQueue against the same interface, or keep the outbox and add a
relay that ships pending rows to the broker. Nothing else changes.
Worker triggering. Two paths: (a) a cron route /api/cron/notifications
(added to vercel.json) drains on a schedule — the reliable path; (b) a
best-effort fire-and-forget kick right after enqueue for low dev latency. Both
are idempotent (claim-by-status).
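The outbox lifecycle and claim-by-status draining can be sketched with an in-memory stand-in for DbOutboxQueue. Everything here is an assumption for illustration: the synchronous signatures (a DB-backed queue would return Promises), the retry limit, the backoff constants, and the rule that a failed attempt returns the row to pending until the limit is reached.

```typescript
type OutboxStatus = "pending" | "processing" | "delivered" | "failed";

interface OutboxRow {
  id: number;
  status: OutboxStatus;
  attempts: number;
  runAfter: number; // epoch ms; the row is not claimable before this time
  payload: string;  // JSON
}

// Assumption: synchronous methods keep the sketch short.
interface NotificationQueue {
  enqueue(payload: object, now: number): OutboxRow;
  claimBatch(limit: number, now: number): OutboxRow[];
  markDelivered(id: number): void;
  markFailed(id: number, now: number): void;
}

const MAX_ATTEMPTS = 5;       // hypothetical retry limit
const BASE_BACKOFF_MS = 1000; // hypothetical backoff base

class InMemoryQueue implements NotificationQueue {
  private rows: OutboxRow[] = [];
  private nextId = 1;

  enqueue(payload: object, now: number): OutboxRow {
    const row: OutboxRow = {
      id: this.nextId++, status: "pending", attempts: 0,
      runAfter: now, payload: JSON.stringify(payload),
    };
    this.rows.push(row);
    return row;
  }

  // Claim-by-status: only due pending rows are taken, and they leave the
  // pending state immediately, so a second drain cannot pick them up again.
  claimBatch(limit: number, now: number): OutboxRow[] {
    const due = this.rows
      .filter((r) => r.status === "pending" && r.runAfter <= now)
      .slice(0, limit);
    for (const r of due) {
      r.status = "processing";
      r.attempts += 1;
    }
    return due;
  }

  markDelivered(id: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.status = "delivered";
  }

  // Retry with exponential backoff until MAX_ATTEMPTS, then give up.
  markFailed(id: number, now: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (!row) return;
    if (row.attempts >= MAX_ATTEMPTS) {
      row.status = "failed";
      return;
    }
    row.status = "pending";
    row.runAfter = now + BASE_BACKOFF_MS * Math.pow(2, row.attempts - 1);
  }
}

// Worker: claim a batch, deliver each row, ack or fail it.
function drainOutbox(
  queue: NotificationQueue,
  deliver: (row: OutboxRow) => void,
  now: number,
  limit = 10,
): number {
  let delivered = 0;
  for (const row of queue.claimBatch(limit, now)) {
    try {
      deliver(row);
      queue.markDelivered(row.id);
      delivered++;
    } catch {
      queue.markFailed(row.id, now);
    }
  }
  return delivered;
}
```

Because the worker only sees the interface, swapping in a broker-backed queue later would leave drainOutbox untouched.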
Slack transport. Mirrors the email deliveryMode pattern exactly: if a
channel has a configured incoming-webhook URL it POSTs; otherwise it dry-runs
(logs + marks delivered with dryRun: true) so the whole pipeline is testable
with no Slack app. To go real: create a Slack app, add Incoming Webhooks (or
a bot token + chat.postMessage), and paste the webhook URL per channel in the
admin routing UI (or set SLACK_BOT_TOKEN). Documented in
docs/architecture/notifications.md.
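The post-or-dry-run decision can be sketched as a pure planning step, with the HTTP call left to the caller. The SlackRoute and SlackDelivery shapes and the planSlackDelivery name are hypothetical; the only external fact relied on is that Slack incoming webhooks accept a JSON POST body with a text field.

```typescript
interface SlackRoute {
  channel: string;           // e.g. "#sales"
  webhookUrl: string | null; // null or empty means "not configured"
}

type SlackDelivery =
  | { dryRun: true; channel: string; text: string }
  | {
      dryRun: false;
      url: string;
      init: { method: "POST"; headers: Record<string, string>; body: string };
    };

// Decide how a message would be delivered, without doing any I/O.
function planSlackDelivery(route: SlackRoute, text: string): SlackDelivery {
  if (!route.webhookUrl) {
    // No webhook configured: the caller logs the message and marks the
    // outbox row delivered with dryRun: true.
    return { dryRun: true, channel: route.channel, text };
  }
  return {
    dryRun: false,
    url: route.webhookUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    },
  };
}
```

For a non-dry-run plan the worker would pass url and init to fetch; keeping the decision separate from the call is what lets the pipeline be tested with no Slack app.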
3. In-app recipient identity under dev bypass — map to the super-admin
Problem. In-app notifications target staff accounts (real account_id).
The dev-bypass operator has no account, so its inbox would always be empty.
Choice. A helper currentStaffAccountId() resolves: real signed-in
accountId → else, under dev bypass, the super-admin account id
(fetchSuperAdminAccountId). The dev-bypass user is "the administrator", so
showing the sole operator's inbox is the sensible mapping. Clearly commented.
Because seed-only has no human admin, I insert a local-only brandon@studio.chat
administrator account (idempotent SQL, not in seed.sql — prod is handled by
the sign-in bootstrap, and the no-brandon-in-seed memory stands). This is also
exposed as a /testing tool ("seed demo admin") so a human can recreate it.
Default subscriptions for that admin are seeded so notifications land out of the
box.
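The resolution order can be sketched like this. The Session shape is hypothetical, and the super-admin lookup is injected as a plain function so the sketch stays self-contained; the real helper presumably queries the database and is async.

```typescript
// Hypothetical session shape for illustration.
interface Session {
  accountId: string | null;
  devBypass: boolean;
}

// Resolution order: real signed-in account first, then (only under dev
// bypass) the super-admin account, otherwise no recipient.
function currentStaffAccountId(
  session: Session,
  fetchSuperAdminAccountId: () => string | null,
): string | null {
  if (session.accountId) return session.accountId;
  if (session.devBypass) return fetchSuperAdminAccountId();
  return null;
}
```

On a fresh seed the lookup returns null until the demo admin exists, which matches the empty-inbox behaviour noted in the gaps list.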
4. Subscriptions model — entity + category, per-channel prefs
notification_subscriptions: (account_id, scope_type, scope_key, in_app, slack).
- entity scope: a specific reservation/account/lead
  (scope_type='entity', scope_key='<entity_type>:<entity_id>'), i.e. "watch this thing".
- category scope: all events of a kind
  (scope_type='category', scope_key='leads'|'reservations'|'payments'|…), i.e. "tell me about all leads".
Staff manage their own from /notifications; a reusable bell affordance on entity
detail pages toggles an entity subscription. Soft-deletable (deleted_at) per the
house pattern.
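Matching an event against these rows can be sketched as below. The NotifiableEvent shape and the function names are hypothetical; the column names follow the table above, and deleted_at is assumed to be a nullable timestamp string.

```typescript
interface Subscription {
  account_id: string;
  scope_type: "entity" | "category";
  scope_key: string;
  in_app: boolean;
  slack: boolean;
  deleted_at: string | null; // soft delete marker
}

// Hypothetical event shape for illustration.
interface NotifiableEvent {
  entity_type: string;
  entity_id: string;
  category: string;
}

function entityScopeKey(entityType: string, entityId: string): string {
  return `${entityType}:${entityId}`;
}

// A live subscription matches either the exact entity or the event's category.
function matches(sub: Subscription, event: NotifiableEvent): boolean {
  if (sub.deleted_at !== null) return false;
  return sub.scope_type === "entity"
    ? sub.scope_key === entityScopeKey(event.entity_type, event.entity_id)
    : sub.scope_key === event.category;
}

// Accounts that should get an in-app row, de-duplicated so someone who both
// watches the entity and follows the category is notified once.
function inAppRecipients(subs: Subscription[], event: NotifiableEvent): string[] {
  const ids = subs
    .filter((s) => s.in_app && matches(s, event))
    .map((s) => s.account_id);
  return ids.filter((id, i) => ids.indexOf(id) === i);
}
```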
5. Admin Slack routing — dedicated table, not settings-KV
Problem. Admins map event categories/keys → Slack channels (new lead → #sales, reservation confirmed → #operations).
Choice. A dedicated notification_routes table
(match_type, match_key, slack_channel, webhook_url, enabled) rather than a
settings-KV blob — routing is queried per-event on the hot path and benefits
from real rows/indexes and an audit trail, and the admin UI maps cleanly to
rows. Admin-only CRUD lives on a /notifications "Slack routing" tab.
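Per-event route resolution can be sketched as a filter over those rows. The two match_type values ('category' and 'key'), the dotted event key format, and the rule that every enabled matching route receives the message are assumptions for illustration.

```typescript
interface NotificationRoute {
  match_type: "category" | "key"; // assumed value set
  match_key: string;
  slack_channel: string;
  webhook_url: string | null;
  enabled: boolean;
}

// Hypothetical event identity: a broad category plus a specific key.
interface RoutableEvent {
  category: string; // e.g. "leads"
  key: string;      // e.g. "leads.created"
}

// Return every enabled route whose match applies to the event.
function resolveRoutes(routes: NotificationRoute[], event: RoutableEvent): NotificationRoute[] {
  return routes.filter((r) => {
    if (!r.enabled) return false;
    return r.match_type === "category"
      ? r.match_key === event.category
      : r.match_key === event.key;
  });
}
```

In the database this would be an indexed lookup on (match_type, match_key) rather than an in-memory filter, which is the hot-path argument for real rows over a KV blob.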
6. ⚠ CONTROVERSIAL — the public contact form now CREATES A LEAD
Problem. Your own example is "new lead (contact form) → #sales", and the
walkthrough starts "from 0 (i.e. lead)". But today the public /contact form
(submitContact) only emails/WhatsApps the studio — it does not create a
lead. So there is no lead entity to notify on, and "from 0" has no real entry
point in the product.
Choice. Wire submitContact to also create a lead via the existing reusable
createLead (find-or-create account from the form's name/email/phone/locale,
referrer = "contact form", message = first lead comment, source audit =
api/system). That leads.created audit row then drives the notification →
#sales, exactly matching your example. The email/WhatsApp notification stays
(belt-and-suspenders) but can be retired later.
Why controversial. (a) The contact form is public + unauthenticated, so it
now writes rows — mitigated by the existing IP rate-limit and find_or_create_account
idempotency, but it's a spam surface. (b) It changes lead provenance (some
leads now arrive without a staffer triaging). (c) Possible duplicate leads if a
staffer also creates one. Mitigations applied: reuse the idempotent account
RPC; tag/referrer the lead as "contact form" so triage can tell origin; keep the
existing studio email so nothing is lost if lead-creation is later reverted.
If you object, revert is one call site — the notification system itself is
agnostic to where the lead comes from.
7. /testing page — non-prod only, lifecycle driver
Gated to VERCEL_ENV !== "production" (server-side; the nav item is hidden in
prod). It hosts the tools a human needs to repeat the demo: seed a demo
admin/lead, jump a reservation to any stage (via the existing super-admin
forceReservationStatus + payment/inspection shortcuts), simulate pickup/return
scans (the web office has no gear-scan UI — that's the staff app), run the
crons on demand, fire a test notification, and seed a Slack route. Tools are
added as the walkthrough surfaces the need (the page is explicitly a
"discovered-as-I-go" toolbox).
8. ⚠ The visual Chrome walkthrough is BLOCKED on a browser-profile choice
Two Chrome profiles are connected to the extension ("Browser 1", "Browser 2") and the names don't say which is the studio.chat Google account vs. the bioscope one. My standing instruction is to never act in the bioscope account, and the tool requires you to pick the profile — which I can't resolve while you're asleep without risking the wrong identity. So I did the walkthrough functionally instead, which is equivalent for finding broken/missing things:
- Render-checked every page a demo touches (leads list/detail, reservations list/detail/create, account detail, verifications, /notifications, /testing) — all return 200 and render the new affordances (watch toggle, inbox, routing).
- Walked the whole lifecycle through the real code paths as an e2e test (notifications-lifecycle.e2e.test.ts): lead → reservation → quoted → accepted → confirmed → returned → settled → closed, asserting the in-app + Slack notifications fire at each stage. Plus the pre-existing rental-lifecycle e2e proves the event-gated path (deposits, scans, inspections) end-to-end.
To do the visual pass in the morning: tell me which connected browser is the studio.chat profile (or open the confirmation screen and Connect the right one), and I'll click through it and record a GIF. Everything is already wired and verified to render; this is the one thing I couldn't safely self-serve.
Running list of gaps found during the walkthrough
- Contact form didn't create a lead (FIXED — see §6). The single biggest "missing" thing vs. the brief's mental model. Now wired.
- No gear pickup/return UI in the web office (by design; staff app only). Initially filled by the /testing "force stage" tool. RESOLVED (you asked for it): built a real staff scan tool with per-unit check out / check in on the reservation's "units & scanning" panel, backed by pickupAsset/returnAsset (the same calls the iOS app makes), correctly gated on the confirmed status + signed inspections, with the last check-in auto-advancing to returned. The /testing scan-out/scan-in shortcuts remain for one-click demos.
- No super-admin in seed-only (expected). Dev bypass + a fresh seed has no human admin, so the inbox is empty until you click "seed demo admin" on /testing (or sign in). The notifications page degrades gracefully (it still shows Slack routing for admins) rather than looking broken. Decision §3.