PulseGuardPlus User Guide

Overview

Uptime monitoring with 6 check types, multi-region coverage, content matching, and automated incident management.

Subdomain: pulse.microgemlabs.ai

Getting Started

1. Enable PulseGuardPlus in the Products page at microgemlabs.ai

2. Navigate to pulse.microgemlabs.ai or click Open on the product card

3. Click + New Monitor to create your first check

Your first monitor runs immediately after creation. Results appear on the dashboard within 60 seconds.

Monitor Types

HTTP/HTTPS

Sends an HTTP request to your URL and validates the response. Defaults to a GET, but any method (POST / PUT / PATCH / DELETE / HEAD) works, which is useful for API smoke checks like "does our login endpoint return 200 with valid credentials?"

Configuration:
  • Target URL: Any HTTP or HTTPS endpoint (e.g., https://api.example.com/health)
  • Method: GET (default), POST, PUT, PATCH, DELETE, or HEAD
  • Request body *(POST / PUT / PATCH / DELETE only)*: Sent verbatim. JSON, form-encoded, or raw text; pair with the Content-Type field
  • Body Content-Type *(non-GET only)*: Defaults to application/json if you set a body but leave this blank. Override for form-encoded payloads, etc.
  • Expected Status: HTTP status codes to accept (default: 2xx). JSON array for multiple: [200, 201, 204]
  • Keywords + match mode: Strings that must (or must not) appear in the response body
  • Timeout: Milliseconds before the request is considered failed (default: 5000)
  • Custom Headers: JSON object for Authorization, custom Content-Type override, or any other header

What it checks: Method-aware request fires, status code matches expected, response body matches keyword rules, response completes within timeout, no connection errors.

Example (POST API smoke check):
URL:           https://api.example.com/auth/login
Method:        POST
Body:          {"email":"smoke-test@example.com","password":"s3cret"}
Content-Type:  application/json
Expected:      [200]
Keywords:      ["accessToken"]

GET and HEAD requests never carry a body, even if one is configured: HTTP defines no semantics for a body on these methods, and some servers reject such requests.
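The request rules above can be sketched in a few lines of Python. This is an illustration only, not PulseGuardPlus's implementation: the function names are hypothetical, and letting a Content-Type in Custom Headers take precedence over the Body Content-Type field is an assumption, since the guide does not state the precedence.

```python
from __future__ import annotations

BODYLESS_METHODS = {"GET", "HEAD"}  # these never carry a body


def prepare_request(method: str = "GET", body: str | None = None,
                    content_type: str | None = None,
                    headers: dict | None = None) -> dict:
    """Apply the documented body and Content-Type rules to a monitor config."""
    method = method.upper()
    final_headers = dict(headers or {})
    if method in BODYLESS_METHODS or not body:
        return {"method": method, "headers": final_headers, "body": None}
    # Default to application/json when a body is set but the type is blank.
    # Assumption: a Content-Type already present in custom headers wins.
    final_headers.setdefault("Content-Type", content_type or "application/json")
    return {"method": method, "headers": final_headers, "body": body}


def status_ok(status: int, expected: list[int] | None = None) -> bool:
    """Accept any 2xx by default, otherwise only the listed status codes."""
    if expected is None:
        return 200 <= status < 300
    return status in expected
```

For the login smoke check above, `prepare_request("POST", body)` keeps the body and adds `Content-Type: application/json`, and `status_ok(200, [200])` passes.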

OAuth 2.0 (client_credentials grant):

For APIs that require a fresh access token rather than a static bearer, expand the OAuth 2.0 section under HTTP options and set:

  • Token URL: the OAuth server's token endpoint (e.g., https://auth.example.com/oauth/token)
  • Client ID + Client Secret: service-account credentials (the secret is encrypted at rest with AES-256-GCM, never echoed back)
  • Scope *(optional)*: space-separated scopes if the API requires them

Before each check, the monitor mints a fresh access token via grant_type=client_credentials, caches it on the row until expiry, and sends it as Authorization: Bearer <token> on the actual request. Tokens are refreshed automatically when the cached one is within 30s of expiry. Editing any OAuth field invalidates the cached token so a new one is minted on the next check.

Token endpoint response handling:

The resolver is tolerant of the common variations real OAuth servers emit:

| Field | Accepts |
| --- | --- |
| Access token | access_token (RFC 6749) or accessToken (camelCase) |
| Relative expiry | expires_in or expiresIn (seconds from now, number or string) |
| Absolute expiry | expires_at, expiresAt, or exp (Unix epoch seconds, JWT-style) |

If the server provides both relative and absolute expiry, relative wins. If it provides neither, the monitor caches for 1 hour then re-mints. Already-expired absolute timestamps are rejected (treated as "no expiry given") so a misconfigured clock can't lock you into a bad cache.
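The expiry rules above are easier to follow as code. The sketch below is a hypothetical reconstruction from this guide's description, not the actual resolver; the function and constant names are invented for illustration.

```python
from __future__ import annotations

import time

DEFAULT_TTL_SECONDS = 3600   # cache for 1 hour when no expiry is given
REFRESH_MARGIN_SECONDS = 30  # refresh when within 30s of expiry


def _first(payload: dict, *keys: str):
    """Return the first non-null value found under any of the given keys."""
    for key in keys:
        if payload.get(key) is not None:
            return payload[key]
    return None


def resolve_token(payload: dict, now: float | None = None) -> tuple[str, float]:
    """Return (access_token, expires_at_epoch) from a token endpoint response."""
    now = time.time() if now is None else now
    token = _first(payload, "access_token", "accessToken")
    if not token:
        raise ValueError("token response has no access token")

    relative = _first(payload, "expires_in", "expiresIn")
    if relative is not None:
        # Relative expiry wins when both forms are present.
        return token, now + float(relative)

    absolute = _first(payload, "expires_at", "expiresAt", "exp")
    if absolute is not None and float(absolute) > now:
        return token, float(absolute)

    # Neither form given, or the absolute timestamp is already in the past.
    return token, now + DEFAULT_TTL_SECONDS


def needs_refresh(expires_at: float, now: float | None = None) -> bool:
    """True when the cached token is within the refresh margin of expiry."""
    now = time.time() if now is None else now
    return expires_at - now <= REFRESH_MARGIN_SECONDS
```

Passing `now` explicitly keeps the functions deterministic, which makes the fallback cases (string-valued `expires_in`, stale `exp`) easy to check in isolation.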

Only the client_credentials grant is supported, which is the right fit for "the monitor authenticates as a service account." Refresh-token / authorization-code flows require user-side steps that don't fit a background monitor.

TCP Port

Connects to a TCP port and optionally verifies the service banner.

Configuration:
  • Target: Hostname and port (e.g., db.example.com:5432)
  • Banner Match: Optional string that must appear in the service's banner response (e.g., PostgreSQL for a database port)

Use cases: Database ports, Redis, SMTP, FTP, custom TCP services. The banner match confirms the correct service is running, not just that the port is open.

ICMP Ping

Sends ICMP ping packets and measures round-trip latency.

Configuration:
  • Target: Hostname or IP address
  • Timeout: Milliseconds (default: 5000)

Use cases: Server reachability, network device monitoring, baseline latency tracking across regions.

DNS

Resolves DNS records and compares against expected values.

Configuration:
  • Target: Domain name to resolve (e.g., example.com)
  • Record Type: A, AAAA, CNAME, MX, TXT, or NS
  • Expected Value: The expected resolved value (e.g., 93.184.216.34 for an A record)

Use cases: Detect DNS hijacking, propagation failures, accidental record changes, verify MX records for email delivery.

SSL/TLS Certificate

Inspects the SSL certificate and tracks days until expiry.

Configuration:
  • Target: Hostname (e.g., example.com)

Alert thresholds: Alerts at 14, 7, 1, and 0 days before expiry. For more granular SSL monitoring, see CertGuard.

What it reports: Certificate validity, issuer, subject, valid-from/to dates, TLS protocol version, days remaining.
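As a rough illustration of the days-remaining calculation and the alert thresholds, the sketch below uses Python's standard `ssl.cert_time_to_seconds` to parse a certificate's notAfter string. The function names are hypothetical, and the rounding (whole days, rounded down) is an assumption; the guide does not say how PulseGuardPlus rounds.

```python
import ssl

ALERT_THRESHOLDS_DAYS = (14, 7, 1, 0)  # thresholds listed in this guide


def days_remaining(not_after: str, now_epoch: float) -> int:
    """Whole days until the certificate's notAfter time (negative if expired)."""
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return int((expiry_epoch - now_epoch) // 86400)


def threshold_reached(days: int):
    """Return the tightest threshold the certificate has crossed, or None."""
    crossed = [t for t in ALERT_THRESHOLDS_DAYS if days <= t]
    return min(crossed) if crossed else None
```

A certificate with 10 days left has crossed only the 14-day threshold, so `threshold_reached(10)` returns 14; one with 20 days left returns `None`.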

Heartbeat (Dead-Man Switch)

Generates a unique ping URL. If your service stops sending pings within the expected interval plus grace period, an alert fires.

Configuration:
  • Heartbeat Grace: Seconds after the expected interval before alerting (default: 3600)
  • Interval: How often your service should ping (set to match your cron schedule)

How to use it:

Add a curl call to the end of your cron job or background task:

curl -fsS https://pulse.microgemlabs.ai/api/heartbeat/YOUR_MONITOR_ID

Use cases: Cron job completion verification, ETL pipeline monitoring, backup job confirmation. For more advanced cron monitoring with start/complete tracking, see CronKeeper.
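For jobs written in Python, the same ping can be sent without shelling out to curl, and the dead-man rule itself is a one-line comparison. Both functions below are illustrative sketches with hypothetical names; the exact server-side timing logic is an assumption based on the "interval plus grace period" description.

```python
import urllib.request


def send_heartbeat(ping_url: str, timeout_s: float = 10.0) -> None:
    """Ping the monitor; urlopen raises on HTTP errors, much like curl -f."""
    with urllib.request.urlopen(ping_url, timeout=timeout_s) as response:
        response.read()


def heartbeat_overdue(last_ping: float, interval_s: float,
                      grace_s: float, now: float) -> bool:
    """True once the expected interval plus the grace period has elapsed."""
    return now - last_ping > interval_s + grace_s
```

With a daily job (interval 86400) and the default grace of 3600 seconds, the alert would fire once more than 90000 seconds pass without a ping.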

Content Matching

HTTP monitors support three keyword matching modes for response body validation:

Contains Any (OR logic)

Alert if *none* of the specified keywords appear in the response. Use when the page should contain at least one of several expected strings.

Example keywords: healthy, ok, running

Contains All (AND logic)

Alert if *any* specified keyword is missing. Use when all keywords must be present.

Example keywords: status, uptime, version

Not Contains

Alert if any forbidden keyword is found. Use for detecting error pages or maintenance mode.

Example keywords: error, maintenance, 503
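The three modes reduce to simple boolean logic over "is this keyword in the body?". The sketch below is illustrative: the mode identifiers are hypothetical, and case-sensitive substring matching is an assumption, since the guide does not specify case handling.

```python
from __future__ import annotations


def keyword_alert(body: str, keywords: list[str], mode: str) -> bool:
    """Return True when the response body should trigger an alert."""
    hits = [keyword in body for keyword in keywords]
    if mode == "contains_any":   # OR logic: alert if none are present
        return not any(hits)
    if mode == "contains_all":   # AND logic: alert if any is missing
        return not all(hits)
    if mode == "not_contains":   # alert if any forbidden keyword is found
        return any(hits)
    raise ValueError(f"unknown match mode: {mode}")
```

For example, a body of `{"status":"ok"}` passes Contains Any with the keywords healthy, ok, running, because one of the three is present.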

Content Hash (SHA-256)

Every HTTP check computes a SHA-256 hash of the response body. If the hash changes between checks, the dashboard flags it. This detects content changes (defacement, unexpected updates) without requiring specific keywords.
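A minimal sketch of hash-based change detection, using Python's standard `hashlib`. The function names are hypothetical, and treating the very first check (no stored hash yet) as "unchanged" is an assumption.

```python
import hashlib


def body_hash(body: bytes) -> str:
    """SHA-256 of the raw response body, as a hex string."""
    return hashlib.sha256(body).hexdigest()


def content_changed(previous_hash, body: bytes) -> bool:
    """True when a stored hash exists and differs from the current body's hash."""
    return previous_hash is not None and previous_hash != body_hash(body)
```

Note that any byte-level difference changes the hash, so pages that embed timestamps or request IDs will be flagged on every check.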

Browser Checks

Synthetic Playwright scenarios run on a schedule via [Browserless](https://www.browserless.io/). Use these for journeys the 6 simpler monitor types can't reach: multi-step flows like "log in, navigate to dashboard, confirm the user widget renders." Lives at pulse.microgemlabs.ai/pulse/browser-checks.

Plan limits

| Plan | Max scenarios | Min interval |
| --- | --- | --- |
| Free | 0 (upgrade required) | n/a |
| Pro | 5 | 15 min |
| Team | 20 | 5 min |

Scenario shape

A browser check stores a JSON scenario document. The default (and recommended) mode is steps: an ordered list of declarative actions the runner translates into Playwright calls. There's also a raw-Playwright escape hatch for cases the step DSL can't express.

Supported step actions:

| Action | Purpose | Required fields |
| --- | --- | --- |
| goto | Navigate to a URL | url, optional waitUntil (load / domcontentloaded) |
| fill | Type a value into an input | selector, value |
| click | Click an element | selector |
| press | Press a keyboard key on a focused element | selector, key |
| waitFor | Wait for a selector to appear | selector, optional timeout |
| waitForUrl | Wait until the URL matches a pattern | pattern, optional timeout |
| expectText | Assert visible text on a selector | selector, text, optional matchMode (contains / exact) |

Example (login + dashboard check):

{
  "mode": "steps",
  "steps": [
    { "action": "goto", "url": "https://your-app.com/login", "waitUntil": "load" },
    { "action": "fill", "selector": "#email", "value": "{{SECRET:test_email}}" },
    { "action": "fill", "selector": "#password", "value": "{{SECRET:test_password}}" },
    { "action": "click", "selector": "button[type=submit]" },
    { "action": "waitForUrl", "pattern": "/dashboard", "timeout": 10000 },
    { "action": "expectText", "selector": "[data-testid=username]", "text": "Welcome" }
  ]
}

The runner walks the steps in order. The check fails (and creates an incident) as soon as any step times out or an assertion fails; the failed step's index and selector show up in the result row.
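Before saving a scenario, it can help to lint it locally against the required fields listed for each step action. The validator below is a hypothetical helper, not part of PulseGuardPlus; it only checks that each step names a known action and carries that action's required fields.

```python
from __future__ import annotations

# Required fields per action, taken from the step table in this guide.
REQUIRED_FIELDS = {
    "goto": ("url",),
    "fill": ("selector", "value"),
    "click": ("selector",),
    "press": ("selector", "key"),
    "waitFor": ("selector",),
    "waitForUrl": ("pattern",),
    "expectText": ("selector", "text"),
}


def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of problems; an empty list means the steps look well-formed."""
    problems = []
    for index, step in enumerate(scenario.get("steps", [])):
        action = step.get("action")
        if action not in REQUIRED_FIELDS:
            problems.append(f"step {index}: unknown action {action!r}")
            continue
        for field in REQUIRED_FIELDS[action]:
            if field not in step:
                problems.append(f"step {index}: {action} requires {field}")
    return problems
```

Reporting the step index mirrors how the result row identifies a failed step, so local lint messages and run-time failures point at the same place.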

Secrets in scenarios

The {{SECRET:name}} placeholder in a step's value field is resolved at run time from your team's Browser scenario secrets vault (see below). Test passwords, API tokens, and OAuth refresh tokens stay encrypted at rest and never appear in scenario JSON or check logs.

Browser Scenario Secrets

Encrypted credential store backing the {{SECRET:name}} placeholders in browser scenarios. Lives at pulse.microgemlabs.ai/pulse/browser-secrets.

  • Encryption: AES-256-GCM envelope encryption with a server-held key. Values are write-only; the UI never echoes a stored secret back to the browser, even to admins.
  • Naming: lowercase letters, digits, and underscores only, max 40 characters. Same name appears in scenario JSON as {{SECRET:my_secret}}.
  • Usage count: each row shows how many scenarios reference the secret, so retiring stale credentials doesn't break a check silently.
  • Rotation: edit the secret in place; the new value takes effect on the next scenario run. No need to re-publish scenarios.

If a scenario references a secret that doesn't exist, the check fails fast with a "missing secret" error rather than silently sending an empty string.
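The placeholder behavior described above (naming rules plus fail-fast on a missing secret) can be sketched as follows. This is an illustration under stated assumptions: the `vault` dictionary stands in for the encrypted store, and `MissingSecretError` and the function name are hypothetical.

```python
import re

SECRET_NAME = re.compile(r"[a-z0-9_]{1,40}")       # documented naming rule
PLACEHOLDER = re.compile(r"\{\{SECRET:([^}]*)\}\}")


class MissingSecretError(KeyError):
    """Raised when a scenario references a secret that does not exist."""


def resolve_placeholders(value: str, vault: dict) -> str:
    """Replace each {{SECRET:name}} with its value, failing fast if absent."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if not SECRET_NAME.fullmatch(name) or name not in vault:
            raise MissingSecretError(name)
        return vault[name]

    return PLACEHOLDER.sub(substitute, value)
```

Raising instead of substituting an empty string is the point of the design: a typo in a secret name surfaces as a clear failure rather than a confusing login error.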

Multi-Region Checking

Every active monitor is checked from three geographic regions simultaneously: US, EU, and Asia. A monitor is only marked as DOWN when it fails from all three regions. A single-region failure is logged but does not trigger an alert, which eliminates false positives from localized network issues.

Response time charts on the dashboard show per-region latency, helping identify geographic performance differences.
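The all-regions rule can be expressed as a small function. This is a sketch with hypothetical names; it assumes one boolean result per region per check cycle.

```python
REGIONS = ("US", "EU", "Asia")


def monitor_status(results: dict) -> tuple:
    """Return (status, failed_regions) from one pass/fail result per region."""
    failed = [region for region in REGIONS if not results[region]]
    # DOWN only when every region fails; partial failures are just logged.
    status = "DOWN" if len(failed) == len(REGIONS) else "UP"
    return status, failed
```

A result such as `{"US": False, "EU": True, "Asia": True}` yields `("UP", ["US"])`: the US failure is available for logging, but no alert is raised.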

Check Intervals

| Plan | Minimum Interval | Notes |
| --- | --- | --- |
| Free | 5 minutes | |
| Pro | 1 minute | |
| Team | 1 minute | + SMS/Slack/Phone alerts |

Incidents

When a monitor transitions from UP to DOWN, PulseGuardPlus automatically:

1. Creates an Incident record with timestamp, region, response time, and error details

2. Sends alerts through all configured channels (email, webhook, Slack, SMS, phone via on-call)

3. Begins tracking recovery: when the monitor comes back UP, the incident is auto-resolved

Incident statuses: Investigating → Identified → Monitoring → Resolved

Team members can add manual notes to open incidents, including status transitions. This creates a complete audit trail for post-incident review.

Services

Organize monitors into a hierarchical service tree (up to 3 levels deep). Services provide:

  • Rollup dashboards: see the health of an entire service at a glance (e.g., "API Services: 4/5 monitors UP")
  • Grouped views: filter the dashboard by service to focus on a specific area
  • Status page grouping: public status pages display monitors grouped by service

Example hierarchy:

API Services
  ├── Auth Service
  ├── Payment Service
  └── Search API
Web Frontend
  ├── Marketing Site
  └── App Dashboard
Infrastructure
  ├── Database
  └── Redis Cache

Public Status Pages

Each team gets a public status page at /status/[team-slug] displaying real-time monitor status, uptime percentages, response time charts, and recent incident history. Status pages require no authentication and are designed to be shared with customers.

Predictive Alerting

When enabled, MicroGemAI extrapolates response time trends using linear regression over 48 hours of data and alerts before thresholds are breached. Example: "Response time for API Gateway increasing at 12ms/hour. Current: 180ms. Predicted to exceed 450ms in 24 hours." Enable in Anomalies → Detection Settings → toggle Prediction on. Configurable horizon from 6 hours to 7 days.
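To make the extrapolation concrete, here is a minimal least-squares sketch. It is not MicroGemAI's implementation; the function names, the use of hourly samples, and the choice to extrapolate from the fitted value at the last sample are all assumptions.

```python
def linear_fit(hours: list, values: list) -> tuple:
    """Ordinary least-squares fit; returns (slope per hour, intercept)."""
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours: list, values: list, threshold: float):
    """Hours from the last sample until the fitted line crosses the threshold."""
    slope, intercept = linear_fit(hours, values)
    current = slope * hours[-1] + intercept
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None  # flat or improving trend never reaches the threshold
    return (threshold - current) / slope
```

With the figures from the example message (12 ms/hour, currently 180 ms, threshold 450 ms), the estimate is (450 - 180) / 12 = 22.5 hours, in line with the roughly 24 hours quoted.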

Runbook Actions

Define automated remediation actions (Skills → Runbook (/agent/skills?type=runbook)) that MicroGemAI can suggest or execute when incidents occur. Example: create a "Restart API Workers" template that POSTs to your deploy webhook. Set a trigger pattern like connection refused|timeout for PulseGuardPlus, and when an incident matches, MicroGemAI either suggests the action (manual trust), requests approval (auto-approval), or executes immediately (full auto). Every execution is audited with full HTTP request/response capture.
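The trigger-and-trust flow above can be pictured as a regular-expression match followed by a lookup. This sketch is illustrative only: the trust-level keys and action names are hypothetical labels, and case-insensitive matching is an assumption the guide does not confirm.

```python
import re

# Hypothetical identifiers for the three trust levels described above.
ACTIONS_BY_TRUST = {
    "manual": "suggest",
    "auto-approval": "request_approval",
    "full-auto": "execute",
}


def runbook_decision(trigger_pattern: str, error_text: str, trust_level: str):
    """Return the action to take for a matching incident, or None if no match."""
    # Assumption: the pattern is matched case-insensitively anywhere in the error.
    if re.search(trigger_pattern, error_text, re.IGNORECASE) is None:
        return None
    return ACTIONS_BY_TRUST[trust_level]
```

Under these assumptions, an incident whose error is "Connection timeout" matches the pattern `connection refused|timeout`, and a manual-trust runbook would be suggested rather than run.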

Postmortems

When an incident is resolved (lasting 5+ minutes), MicroGemAI auto-generates a structured postmortem with executive summary, impact assessment, chronological timeline from all products, root cause analysis, contributing factors, resolution steps, prioritized action items, and lessons learned. Postmortems include which runbook actions were executed during the incident. Review, edit inline, and publish at Ops → Postmortems.

Messaging

Receive PulseGuardPlus alerts and interact with MicroGemAI via Telegram or Slack. Configure at Settings → Integrations. Quick commands: status (infrastructure health), incidents (active list), oncall (who's on call), run <name> (execute a runbook). Or ask any question in natural language. Message the bot from a new chat to auto-discover your Chat ID for setup.

On-Call Integration

When a monitor goes DOWN, PulseGuardPlus routes the alert through your team's escalation policy. Configure on-call schedules and escalation levels at Ops → On-Call (microgemlabs.ai/oncall). See the On-Call section under Ops for schedule and policy management.

Anomaly Detection

Beyond binary up/down checks, MicroGemAI's anomaly detection tracks response time trends for each HTTP/TCP/Ping monitor. If response times drift significantly above the 7-day rolling baseline, an anomaly alert fires before the service actually goes down. This gives you early warning to investigate degradation.

Configure sensitivity per-monitor from the Anomalies page (Ops → Anomalies). Response time baselines are excluded from computation during active maintenance windows.

Maintenance Windows

During planned deployments or infrastructure changes, suppress PulseGuardPlus alerts by creating a maintenance window (Ops → Maintenance). Monitoring continues and check results are recorded, but no incidents are created and on-call is not triggered.

You can scope suppression to all products (global), just PulseGuardPlus (product), or a specific monitor (resource). See the MicroGemAI guide for full maintenance window documentation.

API Integration

Heartbeat endpoint:
GET https://pulse.microgemlabs.ai/api/heartbeat/[monitorId]

Webhook alerts send JSON payloads to configured URLs on status transitions:
{
  "event": "monitor.down",
  "monitor": { "name": "API Gateway", "target": "https://api.example.com" },
  "incident": { "id": "...", "startedAt": "2026-04-20T03:00:00Z" },
  "checkResult": { "status": false, "responseTime": null, "error": "Connection timeout" }
}
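
On the receiving side, a webhook handler typically parses this payload and turns it into a short message. The sketch below assumes only the fields shown in the sample payload; the function name is hypothetical, and every field is read defensively in case other events carry a different shape.

```python
import json


def summarize_alert(raw_body: bytes) -> str:
    """Turn a webhook payload into a one-line summary for logs or chat."""
    payload = json.loads(raw_body)
    monitor = payload.get("monitor", {})
    incident = payload.get("incident", {})
    result = payload.get("checkResult", {})

    parts = [f"[{payload.get('event', 'unknown')}]", monitor.get("name", "?")]
    if monitor.get("target"):
        parts.append(f"({monitor['target']})")
    if result.get("error"):
        parts.append(f"error: {result['error']}")
    if incident.get("startedAt"):
        parts.append(f"since {incident['startedAt']}")
    return " ".join(parts)
```

The function takes the raw request body, so it can be dropped into whichever web framework receives the webhook.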