Uptime Monitoring Setup — BetterStack External Health Checks

1. Overview

Current setup: The Agentix API exposes a /health endpoint that verifies PostgreSQL and Redis connectivity. It returns HTTP 200 when healthy and HTTP 503 when either dependency is down.

Target: External uptime monitoring via BetterStack that requests /health every 60 seconds and alerts the team by email (and optionally Slack) when the service is degraded.

Why this matters:

An internal health check cannot report anything when the server itself is unreachable; external monitoring also catches network, DNS, and infrastructure failures
With 60-second checks and a 2-check confirmation period, an outage triggers an alert within roughly 1-2 minutes
Automated alerts cut mean time to detection (MTTD) from however long it takes someone to notice a problem manually to a few minutes
A public status page builds trust with tenants

What the /health endpoint checks:

PostgreSQL connectivity (runs a lightweight query)
Redis connectivity (sends a PING command)
Returns 200 OK with { "status": "healthy" } if both pass
Returns 503 Service Unavailable with { "status": "unhealthy", "details": {...} } if either fails
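The decision logic above can be sketched in TypeScript, the language of the API codebase. This is a hypothetical illustration, not the code in apps/api/src/routes/health.ts. `evaluateHealth`, `withTimeout`, the injected check functions, the 3-second per-check budget, and the `Record<string, string>` shape of `details` are all assumptions (this runbook elides the real shape). The per-check timeout is a design choice worth considering: a hung database connection then produces a fast 503 instead of a response slower than the monitor's 10-second request timeout.

```typescript
// Hypothetical sketch of the /health decision logic; the real handler lives in
// apps/api/src/routes/health.ts. Checks are injected, so no PostgreSQL or Redis client is needed here.
type DependencyCheck = () => Promise<unknown>;

// The shape of `details` is an assumption; this runbook elides it.
type HealthResponse =
  | { statusCode: 200; body: { status: "healthy" } }
  | { statusCode: 503; body: { status: "unhealthy"; details: Record<string, string> } };

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise: Promise<unknown>, ms: number, label: string): Promise<unknown> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise wins, so it never keeps the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function evaluateHealth(
  checks: Record<string, DependencyCheck>,
  perCheckTimeoutMs = 3_000, // hypothetical budget, well under the monitor's 10 s request timeout
): Promise<HealthResponse> {
  const names = Object.keys(checks);
  // allSettled runs every check concurrently and reports all failures, not just the first.
  const results = await Promise.allSettled(
    names.map((name) => withTimeout(checks[name](), perCheckTimeoutMs, name)),
  );

  if (results.every((r) => r.status === "fulfilled")) {
    return { statusCode: 200, body: { status: "healthy" } };
  }

  // Record "ok" or the failure message for each dependency.
  const details: Record<string, string> = {};
  results.forEach((result, i) => {
    details[names[i]] =
      result.status === "fulfilled"
        ? "ok"
        : result.reason instanceof Error
          ? result.reason.message
          : String(result.reason);
  });
  return { statusCode: 503, body: { status: "unhealthy", details } };
}
```

In a route handler you would pass, for example, a check that runs the lightweight PostgreSQL query and one that sends the Redis PING, then reply with `statusCode` and `body`.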

2. Prerequisites

BetterStack account — sign up at https://betterstack.com (free tier includes 5 monitors with 3-minute checks; upgrade for 60-second intervals)
Production API URL (e.g., https://api.agentix.app)
Team email addresses for alert recipients

3. Step 1 — Create a Monitor

Sign in to the BetterStack dashboard
Navigate to Monitors in the left sidebar
Click Create Monitor
Configure the monitor settings:

Setting	Value
Monitor type	HTTP(s)
URL	`https://api.agentix.app/health` (substitute your actual production URL)
Check frequency	Every 60 seconds
Request method	GET
Expected status code	200
Confirmation period	2 checks (waits for 2 consecutive failures before alerting — avoids false alarms on transient blips)
Request timeout	10 seconds
Monitor name	`Agentix API — Health` (or any descriptive name)

Click Save to create the monitor

Note: The free tier limits check frequency to 3 minutes. For 60-second checks, the Freelancer plan ($16.67/mo billed annually) or higher is required. The 3-minute free tier is still useful for basic coverage.
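Check frequency and confirmation period together bound how quickly an outage raises an alert. The TypeScript sketch below models that arithmetic. `MonitorSettings` and both functions are hypothetical and mirror the table above rather than BetterStack's API, and the model assumes confirmation checks run on the regular schedule (BetterStack may schedule them differently).

```typescript
// Hypothetical model of the monitor settings table; not BetterStack's API schema.
interface MonitorSettings {
  checkFrequencySeconds: number; // 60 on paid plans, 180 on the free tier
  confirmationChecks: number; // consecutive failures required before alerting
  requestTimeoutSeconds: number; // how long a single check waits for a response
}

// Worst case: the outage starts just after a successful check, and the final
// confirming check only fails once it hits the full request timeout.
function worstCaseDetectionSeconds(s: MonitorSettings): number {
  return s.confirmationChecks * s.checkFrequencySeconds + s.requestTimeoutSeconds;
}

// Best case: the outage starts just before a check that fails immediately
// (e.g. connection refused), so only the remaining confirmations add delay.
function bestCaseDetectionSeconds(s: MonitorSettings): number {
  return (s.confirmationChecks - 1) * s.checkFrequencySeconds;
}
```

With the recommended settings (60-second checks, 2 confirmations, 10-second timeout) this gives 60 to 130 seconds; on the 3-minute free tier it gives 180 to 370 seconds, so up to about 6 minutes.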

4. Step 2 — Configure Email Alerts

BetterStack sends alerts to people added to your escalation policy.

Navigate to On-call > People in the left sidebar
Click Invite team member and add each recipient’s email address
Navigate to On-call > Escalation policies
Edit the default escalation policy (or create a new one):
- Step 1: Notify the team immediately on incident creation
- Add all relevant team members
Return to your monitor and verify the escalation policy is assigned

Test email delivery:

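BetterStack sends a welcome email when you invite team members. Receiving it confirms the address is reachable; it does not confirm alert delivery, which the end-to-end test in section 7 covers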
If no welcome email arrives, check spam/junk folders and verify the email address

5. Step 3 — Configure Slack Alerts (Optional)

For faster response times, add Slack notifications alongside email.

Navigate to Integrations in the left sidebar
Find Slack and click Connect
Authorize BetterStack to post to your Slack workspace
Select the channel for alerts (e.g., #ops-alerts or #engineering)
Return to On-call > Escalation policies
Add a Slack notification step to your escalation policy:
- Step 1: Notify via Slack channel immediately
- Step 2: Notify team members via email (if not acknowledged within 5 minutes)
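With this two-step policy, email only goes out when the Slack alert goes unanswered. A hypothetical TypeScript model of that rule follows; `EscalationStep` and `notifiedChannels` are illustrative names, not BetterStack's data model, and BetterStack's exact timing semantics may differ.

```typescript
// Hypothetical model of the escalation policy above; not BetterStack's data model.
interface EscalationStep {
  afterMinutes: number; // delay from incident creation
  channel: string; // illustrative label, e.g. "slack:#ops-alerts"
}

// Channels notified for an incident, given when (if ever) someone acknowledged it.
// A step fires unless the incident was acknowledged before that step's delay elapsed.
function notifiedChannels(policy: EscalationStep[], ackAtMinutes: number | null): string[] {
  return [...policy]
    .sort((a, b) => a.afterMinutes - b.afterMinutes)
    .filter((step) => ackAtMinutes === null || ackAtMinutes >= step.afterMinutes)
    .map((step) => step.channel);
}

// The two-step policy from this section: Slack immediately, email after 5 minutes.
const policy: EscalationStep[] = [
  { afterMinutes: 0, channel: "slack:#ops-alerts" },
  { afterMinutes: 5, channel: "email:team" },
];
```

Under this model, acknowledging in Slack after 2 minutes (`notifiedChannels(policy, 2)`) suppresses the email step, while an unacknowledged incident (`notifiedChannels(policy, null)`) reaches both channels. Keep this in mind during the end-to-end alert test: no email arrives if someone acknowledges the Slack alert first.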

6. Step 4 — Create a Status Page (Optional)

A public status page communicates uptime to tenants without them needing to contact support.

Navigate to Status pages in the left sidebar
Click Create status page
Configure:
- Name: Agentix Status
- Subdomain: use BetterStack's default subdomain for now; serving the page from status.agentix.app requires the custom domain setup below
- Resources: Add the Agentix API — Health monitor
Click Save
Share the status page URL with tenants or link it from the product

Custom domain (optional):

Add a CNAME record in your DNS pointing status.agentix.app to BetterStack’s status page domain
Configure the custom domain in BetterStack’s status page settings
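After adding the record, you can confirm it has propagated before relying on the custom domain. The TypeScript sketch below uses Node's built-in `dns.promises.resolveCname`; `cnamePointsTo` is a hypothetical helper, and the expected target must be copied from BetterStack's status page settings (this runbook does not name it).

```typescript
import { promises as dnsPromises } from "node:dns";

// Resolver signature, so a stub can replace real DNS lookups in tests.
type CnameResolver = (hostname: string) => Promise<string[]>;

// Compare DNS names case-insensitively and ignore a trailing root dot.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/\.$/, "");
}

// True if `hostname` has a CNAME record pointing at `expectedTarget`.
async function cnamePointsTo(
  hostname: string,
  expectedTarget: string,
  resolve: CnameResolver = (h) => dnsPromises.resolveCname(h),
): Promise<boolean> {
  try {
    const targets = await resolve(hostname);
    return targets.map(normalizeName).includes(normalizeName(expectedTarget));
  } catch {
    // Lookup failed, e.g. no CNAME published yet (ENODATA) or the name does not exist (ENOTFOUND).
    return false;
  }
}
```

Usage: `cnamePointsTo("status.agentix.app", "<target from BetterStack's settings>").then(console.log)`. A `false` result right after editing DNS may just mean the record has not propagated yet.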

7. Verification

After creating the monitor, verify everything is working:

Wait for the first few checks to complete (2-3 minutes at 60-second intervals; allow 6-9 minutes on the 3-minute free tier)
In the BetterStack dashboard, confirm the monitor shows Up status with a green indicator
Check that the response time graph is populating

Test alerting end-to-end:

Temporarily change the monitor URL to a non-existent path (e.g., https://api.agentix.app/health-test-invalid)
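Wait for at least 2 check cycles (about 2-3 minutes at 60-second intervals, about 6 minutes on the 3-minute free tier)
Confirm an alert email arrives (check spam if not in inbox). If your escalation policy notifies Slack first, email is sent only if nobody acknowledges the incident within the escalation delay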
Confirm Slack notification arrives (if configured)
Immediately revert the monitor URL back to https://api.agentix.app/health
Confirm the monitor recovers and shows Up status
Confirm a recovery notification is sent

8. Verification Checklist

Monitor exists in BetterStack dashboard with Up status
Check frequency is set to 60 seconds (or 3 minutes on free tier)
Expected status code is 200
Confirmation period is 2 checks
Request timeout is 10 seconds
At least one team member is configured in the escalation policy
Test alert was received via email
(Optional) Slack integration is connected and test alert received
(Optional) Status page is created and accessible

9. Troubleshooting

Monitor shows “Down” but the app works in browser

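Authentication: The /health endpoint must not require authentication. CORS is not a factor here: BetterStack's checks are server-side HTTP requests, and CORS is enforced only by browsers. Verify from a machine outside your network by running:
```
curl -s -o /dev/null -w "%{http_code}\n" --max-time 10 https://api.agentix.app/health
```
Expected output: 200. A result of 000 means no HTTP response arrived within the 10-second limit (timeout, DNS, TLS, or connection failure).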
Firewall or WAF: If you use Cloudflare or another WAF, make sure BetterStack's monitoring IP ranges are not blocked or served bot challenges, which automated checks cannot pass. BetterStack publishes these ranges in its documentation.
DNS resolution: The monitor URL must be publicly resolvable. If the API is behind a private network, external monitoring cannot reach it.
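To approximate the monitor's pass/fail rule from a script (for example, a post-deploy smoke test run from outside your network), here is a hypothetical TypeScript sketch. `probe` and `FetchLike` are illustrative names, not a BetterStack API. The rule it applies (GET, 10-second timeout, expect 200) mirrors the monitor settings in Step 1, though BetterStack's own checker may differ in details such as redirects and retries.

```typescript
// Minimal fetch-like signature so any HTTP client (or a test stub) can be injected.
type FetchLike = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ status: number }>;

interface ProbeResult {
  up: boolean;
  status: number | null; // null when no HTTP response arrived
  error?: string;
}

async function probe(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs = 10_000, // mirrors the monitor's 10-second request timeout
  expectedStatus = 200,
): Promise<ProbeResult> {
  const controller = new AbortController();
  // Abort the request if it has not completed within the timeout.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { method: "GET", signal: controller.signal });
    return { up: res.status === expectedStatus, status: res.status };
  } catch (err) {
    // Timeouts, DNS failures, TLS errors and refused connections all end up here.
    return { up: false, status: null, error: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

With Node 18 or later you can pass the built-in `fetch`: `probe("https://api.agentix.app/health", fetch).then(console.log)`. A `status` of `null` corresponds to curl's 000.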

Alerts not arriving

Email: Check spam/junk folders. Verify the email address in On-call > People. Ensure the escalation policy is assigned to the monitor.
Slack: Verify the Slack integration is still authorized (tokens can expire). Reconnect if needed.
Escalation policy: Ensure the policy has at least one active step with team members assigned.

False alarms (intermittent “Down” alerts)

Increase the confirmation period from 2 to 3 checks
Increase the request timeout from 10 to 15 seconds
Check if the API has cold-start latency (Railway sleeps inactive services on some plans)
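Raising the confirmation period helps because an alert needs several consecutive failures. Under a deliberately simplified model that assumes each check fails independently with some probability p (a hypothetical number, not a measured one), the expected number of all-failing k-check windows among N checks is at most N × p^k, and that count upper-bounds the number of false incidents. The helper names below are hypothetical.

```typescript
// Number of checks in a period, assuming a fixed-length period (30 days by default).
function checksPerPeriod(checkFrequencySeconds: number, periodDays = 30): number {
  return (periodDays * 24 * 60 * 60) / checkFrequencySeconds;
}

// Rough upper estimate of k-check windows in which every check fails,
// assuming independent failures with probability p per check.
function expectedFalseAlarmWindows(checks: number, p: number, k: number): number {
  return checks * Math.pow(p, k);
}
```

For example, with 60-second checks (43,200 per 30-day month) and a hypothetical p of 0.01, k = 2 gives about 4.3 windows per month and k = 3 about 0.04. Correlated failures, such as a cold start that spans several checks, break the independence assumption, which is why the cold-start item above is worth checking even after raising the confirmation period.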

Health endpoint returns 503

The /health endpoint returns 503 when PostgreSQL or Redis is unreachable. This is a real issue that requires investigation:

Check Railway dashboard for database/Redis service status
Check PostgreSQL connection limits (max_connections)
Check Redis memory usage and eviction policy
Review API logs in Railway for connection errors

10. Ongoing Maintenance

Review monthly: Check the uptime percentage in BetterStack dashboard. Aim for 99.9%+ uptime.
Email authentication (SPF/DKIM/DMARC): No changes needed; BetterStack sends alerts from its own domain.
Team changes: When team members join or leave, update On-call > People and the affected policies in On-call > Escalation policies.
Monitor updates: If the API URL changes (e.g., domain migration), update the monitor URL immediately.
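To put the 99.9% target in concrete terms, the allowed downtime is (1 − target) × period length. A minimal TypeScript sketch, assuming a fixed 30-day month (`downtimeBudgetMinutes` is a hypothetical helper; BetterStack's monthly figure uses the actual calendar month):

```typescript
// Allowed downtime in minutes for an uptime target (0..1) over a fixed-length period.
function downtimeBudgetMinutes(uptimeTarget: number, periodDays = 30): number {
  if (uptimeTarget < 0 || uptimeTarget > 1) {
    throw new RangeError("uptimeTarget must be between 0 and 1");
  }
  const periodMinutes = periodDays * 24 * 60;
  return (1 - uptimeTarget) * periodMinutes;
}
```

At 99.9% this comes to 43.2 minutes per 30-day month; at 99.99%, about 4.3 minutes.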

References

BetterStack Uptime Documentation
BetterStack Monitoring IP Ranges
BetterStack Status Pages
Code reference: apps/api/src/routes/health.ts (health endpoint implementation)
Code reference: apps/api/src/index.ts (health route registration)
