feat: PR preview environments (proposal + prototype) #2407
nick-inkeep wants to merge 11 commits into main
- Fly.io multi-container Machine for DBs (Doltgres, Postgres, SpiceDB)
- agents-api + manage-ui on Vercel with branch-scoped env vars
- TCP routing via Machines API (pg_tls, raw gRPC, TLS)
- Custom preview domains (pr-N-api/ui.preview.inkeep.com)
- Auto seed via inkeep push (activities-planner)
- Teardown on PR close + weekly orphan cleanup cron
Co-Authored-By: Claude Opus 4.6 <[email protected]>
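The naming scheme in this commit can be sketched as a few shell helpers. This is a minimal illustration, not the workflow's actual code: the `pr-N-agents` app pattern and the `preview.inkeep.com` domains come from this PR, while every function name below is hypothetical.

```shell
# Derive per-PR resource names from a PR number (function names are hypothetical).
preview_app_name() { printf 'pr-%s-agents\n' "$1"; }
preview_api_host() { printf 'pr-%s-api.preview.inkeep.com\n' "$1"; }
preview_ui_host()  { printf 'pr-%s-ui.preview.inkeep.com\n' "$1"; }

# Print the PR number for a pr-N-agents app name; return 1 for any other name.
pr_number_from_app() {
  case "$1" in
    pr-*-agents)
      n=${1#pr-}
      n=${n%-agents}
      case "$n" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric middle parts
      esac
      printf '%s\n' "$n"
      ;;
    *) return 1 ;;
  esac
}

# Read app names on stdin, take open PR numbers as arguments, and print the
# preview apps whose PR is not in the open list (candidates for cleanup).
orphaned_apps() {
  while IFS= read -r app; do
    n=$(pr_number_from_app "$app") || continue   # skip non-preview apps
    keep=0
    for open in "$@"; do
      if [ "$open" = "$n" ]; then keep=1; fi
    done
    if [ "$keep" -eq 0 ]; then printf '%s\n' "$app"; fi
  done
  return 0
}
```

A cleanup job could feed `orphaned_apps` the list of Fly app names and the open PR numbers, then destroy whatever it prints. Keeping the selection logic free of network calls makes it easy to check before pointing it at real apps.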
- Mask all generated credentials with ::add-mask:: to prevent log exposure
- Randomize bypass secret (was predictable pattern)
- Fix Dockerfile path in fly.toml (Fly resolves relative to config dir)
- Add explicit permissions block and timeout-minutes to all jobs
- Pin flyctl action to @1.6 (was @master)
- Change sslmode=no-verify to sslmode=require for consistency
- Add failure exit to machine state wait loop
- Add checkout step to cleanup workflow for gh CLI context
- Surface Vercel deploy errors instead of swallowing
Co-Authored-By: Claude Opus 4.6 <[email protected]>
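A minimal sketch of the mask-then-export pattern from this commit, assuming a step that generates its own secret. `::add-mask::` and `$GITHUB_OUTPUT` are GitHub Actions features; the function name and the output key are hypothetical.

```shell
# Generate a random secret, mask it in the job log, then expose it as a step
# output. The function name and output key are hypothetical.
emit_masked_secret() {
  key=$1
  # 32 random bytes, hex-encoded (64 characters)
  value=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
  # Mask first, so the runner redacts the value in all later log lines
  echo "::add-mask::$value"
  # Only then write it where later steps can read it
  echo "$key=$value" >> "$GITHUB_OUTPUT"
}
```

Calling `emit_masked_secret bypass_secret` inside a step would make the value available to later steps as that step's `bypass_secret` output. The order matters: a value written to the log before it is masked stays readable.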
The flyctl action tag 1.6 doesn't exist yet. Latest available tag is 1.5.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
flyctl resolves dockerfile relative to config dir but machine_config relative to deploy context. Changing to `cd .fly && flyctl deploy .` so both resolve within .fly/.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
fly deploy was creating 2 machines by default. The TCP services configuration only updated one, causing health checks to fail when traffic was routed to the other (unconfigured) machine.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Verbose output on attempts 1, 10, 20, 30... and 60 so we can see why psql/curl fails from the GitHub runner.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
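The wait loop described in these commits (fail after a fixed budget, print probe output only on selected attempts) could look roughly like the sketch below. It is an illustration under stated assumptions, not the workflow's code: `wait_for` and the `WAIT_INTERVAL` variable are hypothetical names.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Probe output is shown only on attempts 1, 10, 20, ... and the final attempt;
# all other attempts run silently. WAIT_INTERVAL is a hypothetical knob
# (seconds between attempts, default 5).
wait_for() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if [ "$attempt" -eq 1 ] || [ $((attempt % 10)) -eq 0 ] || [ "$attempt" -eq "$max" ]; then
      echo "attempt $attempt/$max: $*" >&2
      if "$@"; then return 0; fi               # verbose attempt
    else
      if "$@" >/dev/null 2>&1; then return 0; fi   # quiet attempt
    fi
    attempt=$((attempt + 1))
    sleep "${WAIT_INTERVAL:-5}"
  done
  echo "gave up after $max attempts: $*" >&2
  return 1   # explicit failure so the job stops instead of continuing
}
```

For example, `wait_for 60 pg_isready -h localhost -p 5432` would probe a local Postgres port up to 60 times and fail the step if it never comes up.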
Ubuntu's psql on GH Actions runners doesn't support the ALPN negotiation
required by Fly's pg_tls handler ("SSL error: no application protocol").
Switch all GH runner DB access (health checks, migrations, auth init)
to use flyctl proxy tunnels (WireGuard), which bypass the edge proxy.
Vercel env vars correctly keep public Fly URLs — Node.js pg handles ALPN.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
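The two connectivity paths in this commit differ only in host and `sslmode`, which a small helper can make explicit. This is a sketch under assumptions: the function name, argument order, and all example values are hypothetical; only the `sslmode` choice per path comes from this PR.

```shell
# Build a Postgres connection URL for one of the two connectivity paths.
#   vercel: public Fly hostname through the pg_tls handler, sslmode=require
#   runner: localhost end of a flyctl proxy tunnel, sslmode=disable
# Function name, argument order and example values are hypothetical.
db_url() {
  route=$1 user=$2 password=$3 host=$4 port=$5 db=$6
  case "$route" in
    vercel)
      printf 'postgresql://%s:%s@%s:%s/%s?sslmode=require\n' \
        "$user" "$password" "$host" "$port" "$db"
      ;;
    runner)
      # The tunnel terminates on localhost, so the host argument is ignored
      printf 'postgresql://%s:%s@localhost:%s/%s?sslmode=disable\n' \
        "$user" "$password" "$port" "$db"
      ;;
    *)
      echo "unknown route: $route" >&2
      return 1
      ;;
  esac
}
```

The helper does not URL-encode its inputs, so it assumes credentials without reserved characters (hex-encoded secrets satisfy this).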
10 hard-won lessons from 7 commits / 5+ deploy cycles covering:
- Fly.io pg_tls ALPN issues with Ubuntu psql
- Dockerfile path resolution inconsistencies
- HA machine duplication, TCP service configuration
- GH Actions secret masking, flyctl version pinning
- Architecture notes on dual connectivity paths
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Corrections based on confirmed findings from 8 deploy/debug iterations:
- D15 revised: sslmode=require for Vercel (Node.js), sslmode=disable for proxy tunnels. Original spec said sslmode=no-verify everywhere.
- D17 revised: Migrations MUST use flyctl proxy tunnels on GH runners. Ubuntu psql lacks ALPN support for Fly's pg_tls handler.
- New decisions D20-D25: --ha=false, deploy from .fly/ dir, pin the flyctl action to 1.5, ::add-mask:: for all secrets, random bypass secret, explicit permissions block.
- Updated Fly services table with ALPN issue documentation.
- Added dual connectivity path explanation (Vercel vs GH runner).
- Added Stage 2 implementation status section with validated/unvalidated items.
- Replaced stale workflow YAML with diff table vs actual implementation.
- Corrected TLS deferral (pg_tls DOES encrypt Postgres connections).
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Include SPEC.md and PROGRESS.md covering:
- Architecture: Fly.io multi-container Machines for DBs + Vercel for API/UI
- Validated: TCP routing, flyctl proxy tunnels, migrations, auth init, seeding
- GH Actions workflow (20-step deploy + teardown)
- 25 decisions logged with evidence from 8 deploy/debug iterations
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Ito Test Report ✅
24 test cases ran. 24 passed.
This verification run validated PR #2407's preview environment implementation through comprehensive code review and infrastructure testing. All 24 included test cases passed successfully. The preview environment workflow code demonstrates correct security practices (credential masking, least-privilege permissions), proper idempotency patterns, and robust error handling. Production isolation is confirmed through branch-scoped Vercel env vars and PR-prefixed Fly app naming.
Note: The actual preview environment infrastructure for PR #2407 was not deployed (PR is closed), so tests requiring live preview URLs were verified through code review and local environment validation where applicable.
✅ Passed (24)
Summary
Adds automated per-PR preview environments so every PR gets an isolated backend stack (Doltgres, Postgres, SpiceDB) on Fly.io, with Vercel preview deployments for agents-api and manage-ui connected to it. Zero manual setup — lifecycle fully managed by GitHub Actions.
This is a proposal + working prototype. The spec and implementation have been validated through 8 deploy/debug iterations against a test repo (inkeep/inkeep-agents-test, PR #1). E2E validation is blocked on main branch CI issues (Dolt-related), but the core infrastructure is proven.
Architectural decisions
Fly.io multi-container Machines for databases, Vercel for application code. Each PR gets a single Fly Machine running 4 containers (sidecar, Doltgres, Postgres, SpiceDB) sharing localhost networking. agents-api and manage-ui stay on Vercel with branch-scoped env vars pointing to the Fly databases. Custom `pr-{n}-api.preview.inkeep.com` / `pr-{n}-ui.preview.inkeep.com` domains enable cross-service cookie auth.
Key decisions (25 logged in SPEC.md):
- `pg_tls` handler: `pg` driver handles ALPN correctly. `sslmode=require`.
- `flyctl proxy` WireGuard tunnels: `psql` lacks ALPN support for `pg_tls` ("SSL error: no application protocol"). Connect via localhost, `sslmode=disable`.
- `fly deploy` doesn't reliably apply TCP services to multi-container Machines.
- `--ha=false`
- `?upsert=true`: `vercel env add` fails on duplicates during `synchronize` events.
- `::add-mask::` before `$GITHUB_OUTPUT`
- `cd .fly && flyctl deploy .`: `[build].dockerfile` (relative to config) and `[experimental].machine_config` (relative to deploy context).
Gray areas:
Changes
Proposal (documentation)
- `proposals/pr-preview-environments/SPEC.md`: Full spec (architecture, 25 decisions with evidence, requirements, phases, risks, validation results)
- `proposals/pr-preview-environments/PROGRESS.md`: Implementation tracker with 10 hard-won learnings from iteration cycles
Fly.io infrastructure
- `.fly/Dockerfile`: Minimal sidecar image (`alpine:3.21`, `sleep infinity`)
- `.fly/fly.toml`: App config with `[experimental]` multi-container support
- `.fly/machine-config.json`: 4-container definition (sidecar, doltgres, postgres, spicedb) with health checks
- `.fly/.dockerignore`: Build context exclusions
GitHub Actions workflows
- `.github/workflows/preview-env.yml`: 20-step deploy job + teardown job
- `.github/workflows/preview-cleanup.yml`: Weekly cron to destroy orphaned `pr-*-agents` apps whose PRs are closed
How to verify
- `proposals/pr-preview-environments/SPEC.md` for the full architecture and decision log
- `inkeep/inkeep-agents-test` (PR #1), 8 deploy cycles validated:
  - `flyctl proxy` tunnels from local machine
Test plan
- `flyctl proxy` tunnels provide localhost DB access
- `--ha=false` prevents duplicate machines
- (`flyctl apps create ... 2>/dev/null || true`)
- `pnpm db:migrate` via proxy tunnels in CI (main branch Dolt issue)
Future considerations
- `inkeep push` CLI. May need a checked-in template JSON if CLI isn't available in CI
Generated with Claude Code