Deployment Runbook

Overview

Agentix deploys as two services:
| Service | Platform | Trigger | Build |
|---|---|---|---|
| Web (Next.js frontend) | Vercel | Push to main | next build |
| API (Express + BullMQ workers) | Railway | Push to main | Docker build from Dockerfile |
Both services auto-deploy on push to main. This runbook covers the full deployment lifecycle: pre-deploy checks, deployment process, database migrations, rollback procedures, secret rotation, and post-deploy verification. Related runbooks: Staging Runbook and Database Backup Runbook (both are referenced in the sections below).

1. Pre-Deploy Checklist

Before deploying to production, verify each item:
  • CI passes on main: Check GitHub Actions — lint, type-check, test, and build must all pass.
  • If database schema changed: Migration has been tested on staging first (see Staging Runbook, Section 4).
  • If new environment variables added: Variables are set in both Vercel (for web) and Railway (for API) dashboards before deploy.
  • Staging verification passed: The change has been deployed and tested on staging (see Staging Runbook, Section 5).
  • For breaking API changes: Coordinate deploy order. If the API change would break the currently deployed web build, deploy the web first, with code that works against both the old and the new API. If the web depends on a new API endpoint, deploy API first.
  • Database backup taken (if migration involved): Run a manual backup before applying schema changes (see Database Backup Runbook, Section 2).
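The CI item in the checklist above can also be checked from a terminal. The following is a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated, which this runbook does not otherwise require. The function names are hypothetical. It inspects only the most recent workflow run on the branch, so a repository with several separate workflows may need a wider query.

```shell
# Print the conclusion of the most recent GitHub Actions run on a branch,
# for example "success" or "failure".
# Assumption: gh is installed and authenticated for this repository.
latest_ci_conclusion() {
  gh run list --branch "$1" --limit 1 --json conclusion --jq '.[0].conclusion'
}

# Succeed only when the most recent run on the branch concluded successfully.
ci_is_green() {
  [ "$(latest_ci_conclusion "$1")" = "success" ]
}
```

Usage: `ci_is_green main || echo "CI is not green on main; do not deploy"`.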

2. Vercel Deployment (Frontend)

Auto-Deploy (Default)

Pushing to main triggers a Vercel build and deployment automatically.
  1. Build process:
    cd apps/web → npm install → npx next build
    
    Vercel handles this automatically based on the project’s root directory configuration.
  2. Production URL: https://app.agentix.app (or the configured custom domain).
  3. Build logs: Vercel Dashboard > Project > Deployments tab > click the latest deployment to view build output.
  4. Build failure behavior: If the build fails, Vercel keeps the previous deployment active. There is no downtime. Fix the build error and push again.

Manual Deploy

If auto-deploy is disabled or you need to deploy without pushing:
# Install Vercel CLI
npm install -g vercel

# Deploy to production
vercel --prod
Requires Vercel CLI authentication (vercel login).

Build Configuration

The web app builds using the Next.js build pipeline:
| Setting | Value |
|---|---|
| Framework | Next.js |
| Root directory | apps/web |
| Build command | cd ../.. && npm run build (Turborepo) |
| Output directory | .next |
| Node.js version | 22 |

3. Railway Deployment (Backend)

Auto-Deploy (Default)

Pushing to main triggers a Railway build from the Dockerfile at the repository root.
  1. Build process:
    FROM node:22-slim
    # Install dependencies, copy source, generate Prisma client
    # See: Dockerfile in repo root
    
  2. Health check: Railway pings /health after deployment (configured in railway.toml):
    [deploy]
    healthcheckPath = "/health"
    restartPolicyType = "on_failure"
    restartPolicyMaxRetries = 3
    
    The deployment is only promoted to active if the health check passes. If it fails, Railway keeps the previous deployment running.
  3. Build logs: Railway Dashboard > Project > API service > Deployments tab.
  4. Build failure behavior: Railway does not promote failed deployments. The previous healthy deployment continues serving traffic.

Manual Deploy

# Install Railway CLI
npm install -g @railway/cli

# Authenticate
railway login

# Link to project
railway link

# Deploy
railway up

Build Configuration

| Setting | Value |
|---|---|
| Builder | Dockerfile |
| Dockerfile path | Dockerfile (repo root) |
| Health check | GET /health |
| Restart policy | On failure, max 3 retries |
| Exposed port | 3001 |

4. Database Migrations

When to Run

Run migrations when any PR adds or modifies files in apps/api/prisma/migrations/. Check with:
git diff --name-only HEAD~1 | grep "prisma/migrations"
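The check above only looks at the last commit. When a push contains several commits, comparing a range may be more reliable. A minimal sketch, with hypothetical function names; the two revisions are whatever base and head you want to compare (for example the last deployed commit and HEAD):

```shell
# Succeed when any path read from stdin lies under a prisma/migrations directory.
touches_migrations() {
  grep -q 'prisma/migrations/'
}

# Succeed when the diff between two revisions touches a migration file.
# range_has_migrations <base-revision> <head-revision>
range_has_migrations() {
  git diff --name-only "$1" "$2" | touches_migrations
}
```

Usage: `range_has_migrations "$LAST_DEPLOYED_SHA" HEAD && echo "migration required"`, where LAST_DEPLOYED_SHA is a placeholder for a commit you look up yourself.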

Migration Procedure

Step 1: Take a backup
Before any migration, create a manual backup (see Database Backup Runbook, Section 2):
pg_dump -Fc -d "$PRODUCTION_DATABASE_URL" > agentix_backup_$(date +%Y%m%d_%H%M%S).dump

Step 2: Run migration on staging FIRST
DATABASE_URL="$STAGING_DATABASE_URL" npx prisma migrate deploy
Verify staging works correctly after the migration. Check the health endpoint, test affected features, and review logs.

Step 3: Run migration on production
DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma migrate deploy

Step 4: Verify
DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma migrate status
Expected: Database schema is up to date!
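Steps 2 to 4 can be chained so that production is never migrated before staging. The following is a sketch only: prisma_on and migrate_in_order are hypothetical helper names, the backup from Step 1 is assumed to be taken already, and the staging verification remains a manual confirmation. Setting DRY_RUN=1 prints the planned commands without touching any database, and connection strings are never echoed.

```shell
# Run a Prisma CLI command against a named target.
# prisma_on <staging|production> <prisma arguments...>
prisma_on() {
  target="$1"
  shift
  case "$target" in
    staging)    url="${STAGING_DATABASE_URL:-}" ;;
    production) url="${PRODUCTION_DATABASE_URL:-}" ;;
    *) echo "unknown target: $target" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the plan only; the connection string is deliberately not shown.
    echo "[$target] npx prisma $*"
    return 0
  fi
  DATABASE_URL="$url" npx prisma "$@"
}

# Staging first, manual confirmation, then production and a status check.
migrate_in_order() {
  prisma_on staging migrate deploy || return 1
  if [ "${DRY_RUN:-0}" != "1" ]; then
    printf 'Staging verified? Type "yes" to continue: '
    read -r answer
    [ "$answer" = "yes" ] || { echo "aborted before production" >&2; return 1; }
  fi
  prisma_on production migrate deploy || return 1
  prisma_on production migrate status
}
```

Usage: run `DRY_RUN=1 migrate_in_order` to preview, then `migrate_in_order` to execute.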

Important Notes

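  • prisma migrate deploy never resets the database: it only applies pending migrations. The migrations themselves can still be destructive (for example, dropping a column), so review the migration SQL before applying it.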
  • Migrations are applied in order based on the migration directory timestamps.
  • If a migration fails midway, check the _prisma_migrations table for a failed entry:
    SELECT * FROM _prisma_migrations WHERE finished_at IS NULL OR rolled_back_at IS NOT NULL;
    
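    Fix the underlying issue, then mark the failed migration as rolled back with prisma migrate resolve --rolled-back MIGRATION_NAME (or as applied with --applied MIGRATION_NAME if you completed its SQL by hand), and re-run prisma migrate deploy. Prisma refuses to apply further migrations while a failed entry is unresolved.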
WARNING: Never use prisma migrate reset or prisma db push in production. These commands can drop tables and delete data. Use only prisma migrate deploy for production databases.

5. Rollback Procedures

Vercel Rollback (Frontend)

Option A: Dashboard (Instant)
  1. Open Vercel Dashboard > Project > Deployments.
  2. Find the previous successful deployment.
  3. Click the “…” menu > “Promote to Production”.
  4. The rollback takes effect immediately (no rebuild required).
Option B: CLI
vercel rollback
Rolls back to the previous production deployment instantly.
Option C: Git revert
git revert HEAD
git push origin main
Triggers a new Vercel build with the reverted code.

Railway Rollback (Backend)

Option A: Dashboard (Redeploy Previous)
  1. Open Railway Dashboard > Project > API service > Deployments.
  2. Find the previous successful deployment.
  3. Click the deployment > “Redeploy”.
  4. Railway rebuilds from the previous commit’s Docker image.
Option B: Git revert
git revert HEAD
git push origin main
Triggers a new Railway build with the reverted code. This is the safest approach as it creates an auditable commit.
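One caveat for the git revert option: if HEAD is a merge commit, git revert refuses to run unless a mainline parent is given with -m. A minimal sketch with hypothetical helper names, assuming the first parent is the previous state of main (the usual case for merged pull requests):

```shell
# Print the extra flags git revert needs, given one line of
# `git rev-list --parents -n 1 <commit>` output (the commit hash followed by its parents).
revert_flags() {
  # Word splitting is intended: one positional parameter per hash.
  set -- $1
  if [ "$#" -gt 2 ]; then
    echo "-m 1"   # merge commit: revert relative to the first parent
  fi
}

# Revert HEAD, adding -m 1 automatically when HEAD is a merge commit.
revert_head() {
  flags="$(revert_flags "$(git rev-list --parents -n 1 HEAD)")"
  # $flags is left unquoted so that "-m 1" becomes two arguments (or none).
  git revert --no-edit $flags HEAD
}
```

Usage: `revert_head && git push origin main`.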

Database Rollback

Database rollbacks are the most complex because Prisma does not support down migrations.

Scenario 1: Additive migration (new columns, new tables)
Rollback is usually not needed. Old application code ignores new columns and tables. Simply roll back the application code; the unused schema objects stay in place and do no harm.

Scenario 2: Destructive migration (dropped columns, renamed tables)
Restore from backup (see Database Backup Runbook):
  1. Do not roll back the application code yet — it may depend on the new schema.
  2. Restore the database from the pre-migration backup:
    pg_restore -d "$NEW_DATABASE_URL" --clean --if-exists agentix_backup_YYYYMMDD_HHMMSS.dump
    
  3. Update DATABASE_URL to point to the restored instance.
  4. Roll back the application code.
  5. Redeploy.
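Before step 2 of the restore above, it may be worth confirming that the dump file is usable. A minimal sketch with a hypothetical helper name; it checks that the file is non-empty and that pg_restore can read its table of contents, which does not prove the data is complete.

```shell
# Succeed only if the dump file is non-empty and pg_restore can list its contents.
# backup_is_readable <dump-file>
backup_is_readable() {
  [ -s "$1" ] && pg_restore --list "$1" >/dev/null 2>&1
}
```

Usage: `backup_is_readable agentix_backup_YYYYMMDD_HHMMSS.dump || echo "backup unreadable; do not proceed"`.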
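Scenario 3: Create a corrective migration
Instead of restoring, create a new migration that undoes the changes. Edit schema.prisma to reverse the change, generate the migration against a development database, commit it, and apply it to production with prisma migrate deploy:
# Create a new migration that reverses the problematic changes
# (run against a development database, never against production)
npx prisma migrate dev --name revert_problematic_change
This is preferred over backup restore when the data changes are small or reversible.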

6. Secret Rotation

Rotation Procedures by Secret

| Secret | Location | Rotation Steps |
|---|---|---|
| BETTER_AUTH_SECRET | Railway | 1. Generate new secret: openssl rand -hex 32. 2. Update in Railway env vars. 3. Redeploy API. 4. Impact: All existing sessions are invalidated; users must re-login. |
| OPENAI_API_KEY | Railway | 1. Create new key in OpenAI Dashboard. 2. Update in Railway env vars. 3. Redeploy API. 4. Revoke old key in OpenAI Dashboard. |
| CREDENTIAL_ENCRYPTION_KEY | Railway | WARNING: Rotating this key makes all existing encrypted tool credentials unreadable. 1. Decrypt all credentials with old key. 2. Update env var in Railway. 3. Re-encrypt all credentials with new key. 4. Redeploy API. |
| RESEND_API_KEY | Railway | 1. Create new key in Resend Dashboard. 2. Update in Railway env vars. 3. Redeploy API. 4. Revoke old key in Resend. |
| SENTRY_DSN | Railway + Vercel | DSN rarely changes. If needed: 1. Update in both Railway and Vercel env vars. 2. Redeploy both services. |
| SENTRY_AUTH_TOKEN | Vercel | 1. Create new token in Sentry. 2. Update in Vercel env vars. 3. Redeploy web. 4. Revoke old token. |
| POSTHOG_API_KEY | Railway + Vercel | 1. Rotate in PostHog Project Settings. 2. Update in both Railway and Vercel env vars. 3. Redeploy both services. |

General Rotation Procedure

For any secret:
  1. Generate a new credential in the service’s dashboard.
  2. Update the env var in the deployment platform (Railway and/or Vercel).
  3. Redeploy the affected service(s) to pick up the new value.
  4. Verify the service works with the new credential (health check, test request).
  5. Revoke the old credential in the service’s dashboard.
Important: Always generate the new credential before revoking the old one. There will be a brief window where both credentials are valid — this is expected and prevents downtime.

Key Generation Commands

# Generate a new BETTER_AUTH_SECRET
openssl rand -hex 32

# Generate a new CREDENTIAL_ENCRYPTION_KEY
openssl rand -hex 32

# Generate a random webhook verify token
openssl rand -hex 16
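A secret pasted into a dashboard can lose characters or pick up whitespace. The following sketch checks that a value has the shape openssl rand -hex produces (two lowercase hex characters per byte). The function name is hypothetical, and the check says nothing about whether the value is the one actually deployed.

```shell
# Succeed when VALUE is exactly BYTES*2 lowercase hex characters,
# which is the output format of `openssl rand -hex BYTES`.
# is_hex_secret <value> <bytes>
is_hex_secret() {
  value="$1"
  bytes="$2"
  [ "${#value}" -eq $((bytes * 2)) ] || return 1
  case "$value" in
    *[!0-9a-f]*) return 1 ;;   # reject anything outside 0-9 and a-f
  esac
  return 0
}
```

Usage: `NEW_SECRET="$(openssl rand -hex 32)"; is_hex_secret "$NEW_SECRET" 32 || echo "unexpected secret format"`.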

7. Post-Deploy Verification

After every production deployment, verify:
  • API health check passes:
    curl https://api.agentix.app/health
    
    Expected response:
    {"status":"ok","timestamp":"...","checks":{"db":"ok","redis":"ok"}}
    
  • Web loads correctly: Visit https://app.agentix.app and confirm the login page renders without errors.
  • Sentry: Check the Sentry Dashboard for new errors in the first 15 minutes post-deploy. A spike in errors indicates a regression.
  • BetterStack: Confirm the uptime monitor shows green for both web and API endpoints.
  • If migration was run: Spot-check affected data via API requests or Prisma Studio:
    DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma studio
    
  • If webhook changes: Send a test WhatsApp message and verify it is processed correctly. Check Railway API logs for the webhook event and worker processing.
  • BullMQ workers: Check Railway API logs for worker startup messages confirming all 3 workers are running (message-processing, broadcast-sending, audit-processing).
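The API health check in the list above can be scripted so that it retries while the new deployment warms up. A minimal sketch with hypothetical function names; it assumes the response has the shape shown above and matches on text rather than parsing JSON, so a differently structured response would need a real JSON parser.

```shell
# Succeed when the JSON on stdin reports status, db and redis all "ok".
# Whitespace is stripped first so formatting differences do not matter.
health_ok() {
  body="$(tr -d '[:space:]')"
  case "$body" in *'"status":"ok"'*) ;; *) return 1 ;; esac
  case "$body" in *'"db":"ok"'*) ;; *) return 1 ;; esac
  case "$body" in *'"redis":"ok"'*) ;; *) return 1 ;; esac
}

# Poll a health endpoint until it passes or the attempts run out.
# wait_for_health <url> <attempts> <delay-seconds>
wait_for_health() {
  url="$1"
  attempts="$2"
  delay="$3"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 10 "$url" | health_ok; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage: `wait_for_health https://api.agentix.app/health 5 10 || echo "API unhealthy after deploy"`.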

8. Emergency Procedures

Site Down After Deploy

  1. Immediately roll back using the fastest method:
    • Vercel: Dashboard > Promote previous deployment (instant).
    • Railway: Dashboard > Redeploy previous deployment.
  2. Verify the rollback resolved the issue (health check, site load).
  3. Investigate the root cause on the rolled-back commit.
  4. Fix, test on staging, then re-deploy.

Database Migration Failed

  1. Do NOT roll back application code yet: the code may depend on the new schema.
  2. Check the _prisma_migrations table:
    SELECT migration_name, started_at, finished_at, rolled_back_at
    FROM _prisma_migrations
    ORDER BY started_at DESC
    LIMIT 5;
    
  3. If migration partially applied (started but not finished): Restore from the pre-migration backup (see Database Backup Runbook).
  4. If migration failed cleanly (error before any changes): Fix the migration SQL, mark the failed entry as rolled back with prisma migrate resolve --rolled-back MIGRATION_NAME, and re-run prisma migrate deploy.
  5. After fixing: redeploy and verify.
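The rows returned by the query in step 2 can be read mechanically. The sketch below classifies each row from its finished_at and rolled_back_at values. The function names are hypothetical, and the interpretation (no finished_at and no rolled_back_at means a failed or interrupted migration) is an assumption based on how Prisma records migration state. It cannot tell a partial application from a clean failure, so steps 3 and 4 still need a manual look at the schema.

```shell
# Classify one migration row. Pass an empty string for NULL.
# classify_migration <finished_at> <rolled_back_at>
classify_migration() {
  finished_at="$1"
  rolled_back_at="$2"
  if [ -n "$rolled_back_at" ]; then
    echo "rolled_back"   # already marked as rolled back
  elif [ -n "$finished_at" ]; then
    echo "applied"
  else
    echo "failed"        # started but never finished: needs manual attention
  fi
}

# Read "name|finished_at|rolled_back_at" lines from stdin and label each one.
report_migrations() {
  while IFS='|' read -r name finished rolled; do
    echo "$name $(classify_migration "$finished" "$rolled")"
  done
}
```

Usage, assuming psql is available: `psql "$PRODUCTION_DATABASE_URL" -At -F '|' -c "SELECT migration_name, finished_at, rolled_back_at FROM _prisma_migrations ORDER BY started_at DESC LIMIT 5" | report_migrations`.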

Secret Leaked

  1. Immediately rotate the leaked secret (see Section 6).
  2. Check audit logs and access logs for unauthorized access during the exposure window.
  3. If the secret was a BETTER_AUTH_SECRET: all sessions are invalidated on rotation. Users re-login.
  4. If the secret was an OPENAI_API_KEY: check OpenAI usage dashboard for unexpected charges.
  5. If the secret was a CREDENTIAL_ENCRYPTION_KEY: all tool credentials need re-encryption after rotation.
  6. Notify affected users if customer data may have been exposed.
  7. Document the incident: what leaked, exposure window, impact, remediation.

High Error Rate After Deploy (No Outage)

  1. Check Sentry for the new error pattern.
  2. If errors are isolated to a specific feature: consider a targeted fix instead of full rollback.
  3. If errors affect core functionality (auth, webhooks, workflows): roll back immediately.
  4. If errors are transient (connection timeouts, cold starts): monitor for 5 minutes. Railway and Vercel may need time to stabilize after deploy.

9. Deploy Checklist Summary

Quick reference for routine deployments:
PRE-DEPLOY
  [ ] CI green on main
  [ ] Staging tested (if applicable)
  [ ] Backup taken (if migration)
  [ ] Env vars set (if new ones added)

DEPLOY
  [ ] Push to main (auto-deploys both services)
  [ ] If migration: run prisma migrate deploy on production

POST-DEPLOY (within 15 minutes)
  [ ] curl health check
  [ ] Load web in browser
  [ ] Check Sentry for errors
  [ ] Check BetterStack uptime
  [ ] Spot-check affected features