Deployment Runbook

Overview

Agentix deploys as two services:

| Service | Platform | Trigger | Build |
| --- | --- | --- | --- |
| Web (Next.js frontend) | Vercel | Push to `main` | `next build` |
| API (Express + BullMQ workers) | Railway | Push to `main` | Docker build from `Dockerfile` |

Both services auto-deploy on push to main. This runbook covers the full deployment lifecycle: pre-deploy checks, deployment process, database migrations, rollback procedures, secret rotation, and post-deploy verification. Related runbooks:

Staging Environment Setup — set up and verify staging before production deploy
Database Backup & Restore — backup procedures before migrations
Redis Persistence — verify Redis state after deploys

1. Pre-Deploy Checklist

Before deploying to production, verify each item:

CI passes on main: Check GitHub Actions — lint, type-check, test, and build must all pass.
If database schema changed: Migration has been tested on staging first (see Staging Runbook, Section 4).
If new environment variables added: Variables are set in both Vercel (for web) and Railway (for API) dashboards before deploy.
Staging verification passed: The change has been deployed and tested on staging (see Staging Runbook, Section 5).
For breaking API changes: Coordinate the deploy order. Deploy the API first in either case: when the API change breaks the current web build, or when the web depends on a new API endpoint.
Database backup taken (if migration involved): Run a manual backup before applying schema changes (see Database Backup Runbook, Section 2).

2. Vercel Deployment (Frontend)

Auto-Deploy (Default)

Pushing to main triggers a Vercel build and deployment automatically.

Build process:
```
cd apps/web → npm install → npx next build
```
Vercel handles this automatically based on the project’s root directory configuration.
Production URL: https://app.agentix.app (or the configured custom domain).
Build logs: Vercel Dashboard > Project > Deployments tab > click the latest deployment to view build output.
Build failure behavior: If the build fails, Vercel keeps the previous deployment active. There is no downtime. Fix the build error and push again.

Manual Deploy

If auto-deploy is disabled or you need to deploy without pushing:

```
# Install Vercel CLI
npm install -g vercel

# Deploy to production
vercel --prod
```

Requires Vercel CLI authentication (vercel login).

Build Configuration

The web app builds using the Next.js build pipeline:

| Setting | Value |
| --- | --- |
| Framework | Next.js |
| Root directory | `apps/web` |
| Build command | `cd ../.. && npm run build` (Turborepo) |
| Output directory | `.next` |
| Node.js version | 22 |

3. Railway Deployment (Backend)

Auto-Deploy (Default)

Pushing to main triggers a Railway build from the Dockerfile at the repository root.

Build process:

```
FROM node:22-slim
# Install dependencies, copy source, generate Prisma client
# See: Dockerfile in repo root
```

Health check: Railway pings /health after deployment (configured in railway.toml):
```
[deploy]
healthcheckPath = "/health"
restartPolicyType = "on_failure"
restartPolicyMaxRetries = 3
```
The deployment is only promoted to active if the health check passes. If it fails, Railway keeps the previous deployment running.
Build logs: Railway Dashboard > Project > API service > Deployments tab.
Build failure behavior: Railway does not promote failed deployments. The previous healthy deployment continues serving traffic.

Manual Deploy

```
# Install Railway CLI
npm install -g @railway/cli

# Authenticate
railway login

# Link to project
railway link

# Deploy
railway up
```

Build Configuration

| Setting | Value |
| --- | --- |
| Builder | Dockerfile |
| Dockerfile path | `Dockerfile` (repo root) |
| Health check | `GET /health` |
| Restart policy | On failure, max 3 retries |
| Exposed port | 3001 |

4. Database Migrations

When to Run

Run migrations when any PR adds or modifies files in apps/api/prisma/migrations/. Check with:

```
git diff --name-only HEAD~1 | grep "prisma/migrations"
```
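The check above only inspects the most recent commit. As a minimal sketch, the same test can be wrapped in a small helper that reads any list of changed paths, so it can be fed a wider diff range. The function name `needs_migration` and the sample paths are hypothetical, chosen for illustration; only the `apps/api/prisma/migrations/` prefix comes from this runbook.

```shell
# Reads changed file paths on stdin (one per line) and succeeds (exit 0)
# when at least one of them lives under apps/api/prisma/migrations/.
# Hypothetical helper, not part of the existing tooling.
needs_migration() {
  grep -q '^apps/api/prisma/migrations/'
}

# Demo with a hard-coded file list (illustrative paths only).
if printf '%s\n' \
    'apps/web/app/page.tsx' \
    'apps/api/prisma/migrations/20240101000000_add_table/migration.sql' \
    | needs_migration; then
  echo "migration required"
else
  echo "no migration"
fi
```

In practice the input would come from git, for example `git diff --name-only "$LAST_DEPLOYED_SHA" HEAD | needs_migration`, where `LAST_DEPLOYED_SHA` is an assumed variable holding the commit currently running in production.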

Migration Procedure

Step 1: Take a backup

Before any migration, create a manual backup (see Database Backup Runbook, Section 2):

```
pg_dump -Fc -d "$PRODUCTION_DATABASE_URL" > agentix_backup_$(date +%Y%m%d_%H%M%S).dump
```

Step 2: Run migration on staging FIRST

```
DATABASE_URL="$STAGING_DATABASE_URL" npx prisma migrate deploy
```

Verify staging works correctly after the migration. Check the health endpoint, test affected features, and review logs.

Step 3: Run migration on production

```
DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma migrate deploy
```

Step 4: Verify

```
DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma migrate status
```

Expected: Database schema is up to date!
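Step 1 is easy to skip under time pressure. As a hedged sketch, a small guard can refuse to continue to the production migration unless a recent, non-empty dump file is present. The name `backup_is_fresh` and the 60-minute default are assumptions for illustration, not part of the existing tooling.

```shell
# Succeeds only if the given dump file exists, is non-empty, and was
# modified within the last N minutes (second argument, default 60).
# Hypothetical helper; the freshness window is an assumed policy.
backup_is_fresh() {
  local file="$1"
  local max_age_min="${2:-60}"

  # -s is true only for an existing file with size greater than zero.
  [ -s "$file" ] || return 1

  # find prints the path only when the file is newer than the cutoff.
  [ -n "$(find "$file" -mmin "-$max_age_min" -print 2>/dev/null)" ]
}
```

It could gate Step 3, for example `backup_is_fresh agentix_backup_YYYYMMDD_HHMMSS.dump && DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma migrate deploy`. Running `pg_restore --list` on the dump is a further way to confirm that the archive is readable, since a non-empty file can still be truncated.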

Important Notes

prisma migrate deploy is safe: it only applies pending migrations and never resets data.
Migrations are applied in order based on the migration directory timestamps.
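If a migration fails midway, check the _prisma_migrations table for a failed entry:
```
SELECT * FROM _prisma_migrations WHERE finished_at IS NULL OR rolled_back_at IS NOT NULL;
```
Fix the issue, mark the failed migration as resolved with prisma migrate resolve (--rolled-back or --applied, as appropriate), then re-run prisma migrate deploy. Prisma refuses to apply further migrations while a failed entry is unresolved.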

WARNING: Never use prisma migrate reset or prisma db push in production. These commands can drop tables and delete data. Use only prisma migrate deploy for production databases.

5. Rollback Procedures

Vercel Rollback (Frontend)

Option A: Dashboard (Instant)

Open Vercel Dashboard > Project > Deployments.
Find the previous successful deployment.
Click the “…” menu > “Promote to Production”.
The rollback takes effect immediately (no rebuild required).

Option B: CLI

```
vercel rollback
```

Rolls back to the previous production deployment instantly.

Option C: Git revert

```
git revert HEAD
git push origin main
```

Triggers a new Vercel build with the reverted code.

Railway Rollback (Backend)

Option A: Dashboard (Redeploy Previous)

Open Railway Dashboard > Project > API service > Deployments.
Find the previous successful deployment.
Click the deployment > “Redeploy”.
Railway rebuilds from the previous commit’s Docker image.

Option B: Git revert

```
git revert HEAD
git push origin main
```

Triggers a new Railway build with the reverted code. This is the safest approach as it creates an auditable commit.
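One caveat with the git revert option, for both Vercel and Railway: `git revert HEAD` fails when HEAD is a merge commit unless a mainline parent is given with `-m`. A possible sketch of a helper that decides whether the flag is needed follows. The name `revert_flags` is hypothetical, and choosing parent 1 assumes that `main` is the first parent of the merge, which is the usual case for pull-request merges.

```shell
# Takes the output of `git rev-list --parents -n 1 <commit>` (the commit
# hash followed by its parent hashes) and prints the extra flags that
# git revert needs. Prints nothing for an ordinary single-parent commit.
# Hypothetical helper for illustration.
revert_flags() {
  # Split the argument into words: one commit hash plus N parent hashes.
  set -- $1
  if [ "$#" -gt 2 ]; then
    # More than one parent means a merge commit; assume parent 1 is main.
    echo "-m 1"
  fi
}
```

A rollback could then run `git revert $(revert_flags "$(git rev-list --parents -n 1 HEAD)") HEAD` before `git push origin main`.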

Database Rollback

Database rollbacks are the most complex because Prisma does not support down migrations.

Scenario 1: Additive migration (new columns, new tables)

Rollback is usually not needed. Old application code ignores new columns and tables. Simply roll back the application code and the unused schema remains harmlessly.

Scenario 2: Destructive migration (dropped columns, renamed tables)

Restore from backup (see Database Backup Runbook):

Do not roll back the application code yet — it may depend on the new schema.

Restore the database from the pre-migration backup:

```
pg_restore -d "$NEW_DATABASE_URL" --clean --if-exists agentix_backup_YYYYMMDD_HHMMSS.dump
```

Update DATABASE_URL to point to the restored instance.
Roll back the application code.
Redeploy.

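Scenario 3: Create a corrective migration

Instead of restoring, create a new migration that undoes the changes:

```
# Create a new migration that reverses the problematic changes
# (run against a development database, never production)
npx prisma migrate dev --name revert_problematic_change
```

This is preferred over backup restore when the data changes are small or reversible. Run prisma migrate dev only against a development database, then apply the resulting migration to production with prisma migrate deploy.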

6. Secret Rotation

Rotation Procedures by Secret

| Secret | Location | Rotation Steps |
| --- | --- | --- |
| `BETTER_AUTH_SECRET` | Railway | 1. Generate new secret: `openssl rand -hex 32`. 2. Update in Railway env vars. 3. Redeploy API. 4. Impact: All existing sessions are invalidated; users must re-login. |
| `OPENAI_API_KEY` | Railway | 1. Create new key in OpenAI Dashboard. 2. Update in Railway env vars. 3. Redeploy API. 4. Revoke old key in OpenAI Dashboard. |
| `CREDENTIAL_ENCRYPTION_KEY` | Railway | WARNING: Rotating this key makes all existing encrypted tool credentials unreadable. 1. Decrypt all credentials with old key. 2. Update env var in Railway. 3. Re-encrypt all credentials with new key. 4. Redeploy API. |
| `RESEND_API_KEY` | Railway | 1. Create new key in Resend Dashboard. 2. Update in Railway env vars. 3. Redeploy API. 4. Revoke old key in Resend. |
| `SENTRY_DSN` | Railway + Vercel | DSN rarely changes. If needed: 1. Update in both Railway and Vercel env vars. 2. Redeploy both services. |
| `SENTRY_AUTH_TOKEN` | Vercel | 1. Create new token in Sentry. 2. Update in Vercel env vars. 3. Redeploy web. 4. Revoke old token. |
| `POSTHOG_API_KEY` | Railway + Vercel | 1. Rotate in PostHog Project Settings. 2. Update in both Railway and Vercel env vars. 3. Redeploy both services. |

General Rotation Procedure

For any secret:

Generate a new credential in the service’s dashboard.
Update the env var in the deployment platform (Railway and/or Vercel).
Redeploy the affected service(s) to pick up the new value.
Verify the service works with the new credential (health check, test request).
Revoke the old credential in the service’s dashboard.

Important: Always generate the new credential before revoking the old one. There will be a brief window where both credentials are valid — this is expected and prevents downtime.

Key Generation Commands

```
# Generate a new BETTER_AUTH_SECRET
openssl rand -hex 32

# Generate a new CREDENTIAL_ENCRYPTION_KEY
openssl rand -hex 32

# Generate a random webhook verify token
openssl rand -hex 16
```
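A pasted secret that is truncated or padded with whitespace is a common source of failed rotations. Since `openssl rand -hex N` emits two lowercase hex characters per byte, the expected length is known in advance. The following is a minimal sketch of a format check; `is_hex_key` is a hypothetical helper name, and the check says nothing about whether the application accepts the value.

```shell
# Succeeds when $1 consists of exactly $2 lowercase hex characters.
# The default of 64 matches the output of `openssl rand -hex 32`.
# Hypothetical helper for illustration.
is_hex_key() {
  local key="$1"
  local len="${2:-64}"

  # Reject anything of the wrong length (truncated or padded values).
  [ "${#key}" -eq "$len" ] || return 1

  # Reject any character outside 0-9 and a-f.
  case "$key" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}
```

For example, `is_hex_key "$NEW_SECRET"` could run before the value is pasted into Railway or Vercel, and `is_hex_key "$TOKEN" 32` would cover the 16-byte webhook token; both variable names are placeholders.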

7. Post-Deploy Verification

After every production deployment, verify:

API health check passes:

```
curl https://api.agentix.app/health
```

Expected response:

```
{"status":"ok","timestamp":"...","checks":{"db":"ok","redis":"ok"}}
```

Web loads correctly: Visit https://app.agentix.app and confirm the login page renders without errors.
Sentry: Check the Sentry Dashboard for new errors in the first 15 minutes post-deploy. A spike in errors indicates a regression.
BetterStack: Confirm the uptime monitor shows green for both web and API endpoints.
If migration was run: Spot-check affected data via API requests or Prisma Studio:
```
DATABASE_URL="$PRODUCTION_DATABASE_URL" npx prisma studio
```
If webhook changes: Send a test WhatsApp message and verify it is processed correctly. Check Railway API logs for the webhook event and worker processing.
BullMQ workers: Check Railway API logs for worker startup messages confirming all 3 workers are running (message-processing, broadcast-sending, audit-processing).
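The API health check at the top of this list can be scripted so that a degraded dependency is not missed when reading the output by eye. The following is a minimal sketch, assuming the response keeps the shape shown in the expected response. `health_ok` is a hypothetical helper name, and plain string matching is used only to avoid extra dependencies; a real JSON parser would be more robust.

```shell
# Reads a /health JSON body on stdin and succeeds only when the overall
# status and both dependency checks (db, redis) report "ok".
# Hypothetical helper; assumes the response shape shown in this runbook.
health_ok() {
  local body
  # Strip whitespace so the match does not depend on JSON formatting.
  body="$(tr -d ' \n\t\r')"

  case "$body" in *'"status":"ok"'*) ;; *) return 1 ;; esac
  case "$body" in *'"db":"ok"'*) ;; *) return 1 ;; esac
  case "$body" in *'"redis":"ok"'*) ;; *) return 1 ;; esac
  return 0
}
```

A typical invocation would be `curl -fsS https://api.agentix.app/health | health_ok && echo healthy`, where `-f` makes curl fail on HTTP error responses.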

8. Emergency Procedures

Site Down After Deploy

Immediately roll back using the fastest method:
- Vercel: Dashboard > Promote previous deployment (instant).
- Railway: Dashboard > Redeploy previous deployment.
Verify the rollback resolved the issue (health check, site load).
Investigate the root cause on the rolled-back commit.
Fix, test on staging, then re-deploy.

Database Migration Failed

Do NOT roll back application code yet: the code may depend on the new schema.

Check the _prisma_migrations table:

```
SELECT migration_name, started_at, finished_at, rolled_back_at
FROM _prisma_migrations
ORDER BY started_at DESC
LIMIT 5;
```

If migration partially applied (started but not finished): Restore from the pre-migration backup (see Database Backup Runbook).
If migration failed cleanly (error before any changes): Fix the migration SQL and re-run prisma migrate deploy.
After fixing: redeploy and verify.

Secret Leaked

Immediately rotate the leaked secret (see Section 6).
Check audit logs and access logs for unauthorized access during the exposure window.
If the secret was a BETTER_AUTH_SECRET: all sessions are invalidated on rotation. Users re-login.
If the secret was an OPENAI_API_KEY: check OpenAI usage dashboard for unexpected charges.
If the secret was a CREDENTIAL_ENCRYPTION_KEY: all tool credentials need re-encryption after rotation.
Notify affected users if customer data may have been exposed.
Document the incident: what leaked, exposure window, impact, remediation.

High Error Rate After Deploy (No Outage)

Check Sentry for the new error pattern.
If errors are isolated to a specific feature: consider a targeted fix instead of full rollback.
If errors affect core functionality (auth, webhooks, workflows): rollback immediately.
If errors are transient (connection timeouts, cold starts): monitor for 5 minutes. Railway and Vercel may need time to stabilize after deploy.

9. Deploy Checklist Summary

Quick reference for routine deployments:

```
PRE-DEPLOY
  [ ] CI green on main
  [ ] Staging tested (if applicable)
  [ ] Backup taken (if migration)
  [ ] Env vars set (if new ones added)

DEPLOY
  [ ] Push to main (auto-deploys both services)
  [ ] If migration: run prisma migrate deploy on production

POST-DEPLOY (within 15 minutes)
  [ ] curl health check
  [ ] Load web in browser
  [ ] Check Sentry for errors
  [ ] Check BetterStack uptime
  [ ] Spot-check affected features
```
