Overview
MicroGemLabs exposes an MCP server that lets any MCP-compatible AI agent query your monitoring, logging, and alerting data and trigger runbook actions. Agents such as Hermes Agent, Claude, Cursor, or your own custom agent can access your infrastructure data through a standard protocol.
The MCP server provides 15 tools: 11 query tools covering all six MicroGemLabs products plus the MicroGemAI memory system, and 4 action tools for runbooks.
Authentication
All MCP tool calls require a team API key. Generate one in Settings → Account → API Keys.
Pass the API key as a Bearer token in the Authorization header when connecting to the MCP server:
```
Authorization: Bearer mgl_team_xxxxxxxxxxxxx
```
API keys are scoped to a single team. All data returned is limited to that team's resources.
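As a minimal Python sketch of the header described above, the helper below builds it from a key passed in directly or read from an environment variable. The variable name MICROGEMLABS_API_KEY and the auth_headers helper are hypothetical, and the prefix check assumes every team key starts with mgl_team_, as in the examples in this guide.

```python
from __future__ import annotations

import os

# Hypothetical environment variable name used only in this sketch.
ENV_VAR = "MICROGEMLABS_API_KEY"


def auth_headers(api_key: str | None = None) -> dict[str, str]:
    """Return the Authorization header for the MicroGemLabs MCP server."""
    key = api_key or os.environ.get(ENV_VAR, "")
    if not key:
        raise ValueError(f"no team API key given and {ENV_VAR} is unset")
    # Team keys in this guide start with "mgl_team_"; anything else is
    # probably a copy-paste mistake, so fail before connecting.
    if not key.startswith("mgl_team_"):
        raise ValueError("expected a team API key starting with 'mgl_team_'")
    return {"Authorization": f"Bearer {key}"}


if __name__ == "__main__":
    print(auth_headers("mgl_team_xxxxxxxxxxxxx"))
```

Failing early on a malformed key gives a clearer error than an authentication failure reported later by the transport.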
Connecting Your Agent
Hermes Agent
Add to your Hermes Agent MCP configuration:
```yaml
# ~/.hermes/mcp_servers.yaml
servers:
  - name: microgemlabs
    url: https://mcp.microgemlabs.ai
    auth:
      type: bearer
      token: mgl_team_xxxxxxxxxxxxx
```
Claude Desktop / Claude Code
Add to your MCP server configuration:
```json
{
  "mcpServers": {
    "microgemlabs": {
      "type": "sse",
      "url": "https://mcp.microgemlabs.ai/sse",
      "headers": {
        "Authorization": "Bearer mgl_team_xxxxxxxxxxxxx"
      }
    }
  }
}
```
Custom Agents
Any MCP-compatible agent can connect. The server endpoint is:
https://mcp.microgemlabs.ai/sse
Available Tools
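Every tool in this section is invoked the same way: the agent sends an MCP tools/call request, a JSON-RPC 2.0 message naming the tool and its arguments. MCP client libraries normally build these messages and handle the initialize handshake and transport for you, so this sketch only shows the message shape; the tool_call helper is hypothetical.

```python
from __future__ import annotations

import itertools
import json
from typing import Any

# JSON-RPC 2.0 pairs each response with its request through the "id" field.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict[str, Any] | None = None) -> str:
    """Serialize an MCP tools/call request for one MicroGemLabs tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Tools without parameters, such as get_oncall_status, take empty arguments.
    print(tool_call("get_oncall_status"))
    print(tool_call("query_incidents", {"status": "active", "hours": 12}))
```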
query_monitors
Get the current status of all uptime monitors (PulseGuardPlus).
Parameters:

| Param | Type | Description |
|---|---|---|
| status | string | Filter: up, down, or all (default) |
| type | string | Filter: http, tcp, ping, dns, ssl, heartbeat |
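Because an agent can easily produce an invalid filter value, a client can check query_monitors arguments against the documented values before sending them. This hypothetical monitor_args helper does that and omits status when it is the default (all), leaving the server to apply its own default.

```python
from __future__ import annotations

# Allowed values taken from the query_monitors parameter table.
VALID_STATUS = {"up", "down", "all"}
VALID_TYPES = {"http", "tcp", "ping", "dns", "ssl", "heartbeat"}


def monitor_args(status: str = "all", type_: str | None = None) -> dict[str, str]:
    """Validate query_monitors filters locally and return the arguments dict."""
    if status not in VALID_STATUS:
        raise ValueError(f"status must be one of {sorted(VALID_STATUS)}")
    args: dict[str, str] = {}
    if status != "all":  # "all" is the server default, so leave it out
        args["status"] = status
    if type_ is not None:
        if type_ not in VALID_TYPES:
            raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
        args["type"] = type_
    return args
```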
query_incidents
Get recent incidents across ALL products: PulseGuardPlus downtime, CronKeeper missed pings, LogVault alerts, CertGuard expiry warnings, CronRunner failures, and HookRelay forward failures.
Parameters:

| Param | Type | Description |
|---|---|---|
| product | string | Filter by product, or all (default) |
| status | string | active, resolved, or all (default) |
| hours | number | Look-back period (default: 24) |
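Choosing hours for query_incidents often means working back from a time of interest, as in the example session that uses hours=12 to investigate a 3 AM outage. A sketch, assuming the look-back window is measured back from the current time: the hypothetical hours_to_cover helper returns the smallest whole number of hours that reaches a given event, plus a one-hour margin for incidents that began slightly earlier.

```python
from __future__ import annotations

import math
from datetime import datetime, timezone


def hours_to_cover(event_time: datetime, now: datetime | None = None,
                   margin_hours: int = 1) -> int:
    """Smallest whole-hour look-back that includes event_time, plus a margin."""
    now = now or datetime.now(timezone.utc)
    elapsed = (now - event_time).total_seconds() / 3600
    if elapsed < 0:
        raise ValueError("event_time is in the future")
    return math.ceil(elapsed) + margin_hours


if __name__ == "__main__":
    outage = datetime(2026, 4, 20, 3, 0, tzinfo=timezone.utc)
    now = datetime(2026, 4, 20, 11, 30, tzinfo=timezone.utc)
    print(hours_to_cover(outage, now))  # 8.5 h elapsed -> 9, plus 1 margin
```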
search_logs
Search log entries in LogVault with full-text search.
Parameters:

| Param | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Full-text search query |
| level | string | No | debug, info, warn, error, or fatal |
| stream | string | No | Stream name filter |
| hours | number | No | Look-back period (default: 1) |
| limit | number | No | Max entries (default: 20) |
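Since hours defaults to 1, a search for an older problem can come back empty. One approach, sketched here with a hypothetical search callable standing in for your MCP client's search_logs call, is to start narrow and widen the look-back only when nothing matches. The window sizes (1, 6, 24 hours) are arbitrary choices for the sketch.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical callable: sends one search_logs tools/call request and returns
# the matching entries as a list (adapt to however your MCP client exposes results).
SearchFn = Callable[[dict[str, Any]], list[dict[str, Any]]]


def search_widening(
    search: SearchFn,
    query: str,
    level: str | None = None,
    windows: tuple[int, ...] = (1, 6, 24),
    limit: int = 20,
) -> tuple[int, list[dict[str, Any]]]:
    """Run search_logs with progressively longer look-back windows.

    Starts at the tool's 1-hour default and widens only when nothing matches.
    Returns (hours_used, entries); entries is empty if every window came up dry.
    """
    if not query:
        raise ValueError("search_logs requires a non-empty query")
    for hours in windows:
        args: dict[str, Any] = {"query": query, "hours": hours, "limit": limit}
        if level is not None:
            args["level"] = level
        entries = search(args)
        if entries:
            return hours, entries
    return windows[-1], []
```

Each widening step is a separate tool call, so keep the list of windows short given the per-team rate limit.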
query_cron_checks
Get cron job monitor status from CronKeeper.
Parameters:

| Param | Type | Description |
|---|---|---|
| status | string | up, late, down, or all (default) |
query_certs
Get SSL certificate and domain registration status from CertGuard.
Parameters:

| Param | Type | Description |
|---|---|---|
| status | string | healthy, warning, critical, expired, or all |
| expiring_within_days | number | Only return certs expiring within N days |
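When an agent already has a certificate's expiry time (for example from a query_certs result; the response format is not documented here), it can sanity-check the expiring_within_days filter locally. This sketch assumes the bound is inclusive and that already-expired certificates belong to the expired status rather than the expiring window; both are assumptions, not documented behavior.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days left before the expiry time (negative once expired)."""
    # timedelta.days floors, so 9 days 12 hours counts as 9 days.
    return (not_after - now).days


def expiring_within(not_after: datetime, now: datetime, days: int) -> bool:
    """Assumed local mirror of expiring_within_days: inclusive, excludes expired certs."""
    return 0 <= days_until_expiry(not_after, now) <= days


if __name__ == "__main__":
    now = datetime(2026, 4, 20, tzinfo=timezone.utc)
    expiry = datetime(2026, 4, 29, 12, tzinfo=timezone.utc)
    print(days_until_expiry(expiry, now), expiring_within(expiry, now, 14))
```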
query_scheduled_jobs
Get scheduled HTTP job status from CronRunner.
Parameters:

| Param | Type | Description |
|---|---|---|
| status | string | success, failed, or all (default) |
query_webhooks
Get webhook endpoint status from HookRelay.
Parameters:

| Param | Type | Description |
|---|---|---|
| with_failures | boolean | Only return endpoints with failures (default: false) |
get_oncall_status
Get current on-call schedule status.
Parameters: None
Returns: who is currently on call, active alerts, and escalation policy status.
Example prompt: "Who is on call right now and are there any active alerts?"
get_platform_health
Comprehensive health overview of ALL products in one call. This is the most useful tool for general "how are things?" queries.
Parameters:

| Param | Type | Description |
|---|---|---|
| hours | number | Look-back period (default: 24) |
search_memory
Search MicroGemAI's accumulated knowledge base.
Parameters:

| Param | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query |
| category | string | No | Filter: infrastructure, pattern, runbook, preference, incident_learning, team_context, or all |
correlate_events
Find events across ALL products that occurred around a specific time. The most powerful tool for incident investigation.
Parameters:

| Param | Type | Required | Description |
|---|---|---|---|
| timestamp | string (ISO 8601) | Yes | Center of the search window |
| window_minutes | number | No | Minutes before and after the timestamp (default: 30) |
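The search window is simple arithmetic: window_minutes on each side of timestamp, so the default covers one hour in total. A small sketch (correlation_window is a hypothetical helper) makes the bounds explicit for the timestamp used in the Hermes Agent example session:

```python
from __future__ import annotations

from datetime import datetime, timedelta


def correlation_window(timestamp: str, window_minutes: int = 30) -> tuple[datetime, datetime]:
    """Return the (start, end) span covered by a correlate_events call."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    center = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    delta = timedelta(minutes=window_minutes)
    return center - delta, center + delta


if __name__ == "__main__":
    start, end = correlation_window("2026-04-20T03:00:00Z")
    print(start.isoformat(), end.isoformat())
    # 2026-04-20T02:30:00+00:00 2026-04-20T03:30:00+00:00
```

Whether the server treats the window edges as inclusive is not stated, so when a cause may precede its symptoms by half an hour or more, pass a larger window_minutes.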
Example Agent Sessions
Incident Investigation with Hermes Agent
You: What caused the API outage at 3 AM?
Hermes Agent:
→ [calls get_platform_health]
→ [calls query_incidents with hours=12]
→ [calls correlate_events with timestamp=2026-04-20T03:00:00Z]
→ [calls search_logs with query="error" hours=2]
"Based on the data, here's what happened:
The API gateway monitor went down at 2:58 AM. LogVault shows a spike of
'SSL handshake failed' errors starting at 2:45 AM. CertGuard reports that
the api.example.com certificate expired at 2:30 AM. CronKeeper shows the
cert-renewal cron job has been in 'late' status for 3 days.
Root cause: The SSL certificate renewal cron job stopped running, the cert
expired, and the API started rejecting HTTPS connections.
Fix: Renew the cert manually, then investigate why the renewal cron failed."
Proactive Check with Claude
You: @microgemlabs Are there any issues I should know about?
Claude:
→ [calls get_platform_health]
→ [calls query_certs with expiring_within_days=14]
"Your infrastructure looks mostly healthy. Two items to note:
1. The staging.example.com SSL cert expires in 9 days. Consider
renewing early.
2. CronRunner shows the 'analytics-sync' job has 2 consecutive
failures. The last error was 'Connection refused', so your analytics
endpoint may be down."
Action Tools
These tools let agents interact with the runbook system (now part of the unified Skills library; see /agent/skills?type=runbook). Trust levels apply; see Runbook Actions in the MicroGemAI guide.
list_runbooks
List available runbook action templates for this team.
Parameters:
- category: filter by restart, scale, cache, rollback, dns, custom, or all (default)
- active_only: boolean, default true
execute_runbook
Trigger a runbook action. Trust levels apply: manual templates execute immediately when called by an authenticated agent; auto_approval templates return a pending status with an approval link; full_auto templates execute immediately.
Parameters:
- template_id (required): ActionTemplate ID from list_runbooks
- incident_id: links the execution to an incident
- incident_source: product that created the incident
- variables: key-value pairs for {{variable}} substitution
- reasoning: why you're executing this (stored in the audit trail)
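The variables parameter fills {{variable}} placeholders in the template. Assuming substitution is plain text replacement (the guide does not specify whitespace handling or what happens to a missing variable), this sketch shows the idea and rejects missing values up front. The render helper and the template string in the usage note are hypothetical.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Substitute {{variable}} placeholders, failing loudly on missing values."""
    needed = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(variables))
    if missing:
        raise KeyError(f"missing variables: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)
```

For example, render("restart {{service}} in {{ namespace }}", {"service": "api", "namespace": "prod"}) returns "restart api in prod".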
check_execution_status
Poll for the result of a runbook execution.
Parameters:
- execution_id (required): ActionExecution ID from execute_runbook
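Because execute_runbook may return before the action finishes, an agent typically polls check_execution_status. This sketch polls with capped exponential backoff; check is a hypothetical callable wrapping the tool call, and the status strings are placeholders because this guide does not document the response fields.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Hypothetical status values: the response shape of check_execution_status is
# not documented here, so adjust these sets to match real responses.
FINISHED = {"success", "failed"}
AWAITING_HUMAN = {"pending_approval"}


def wait_for_execution(
    check: Callable[[str], dict[str, Any]],
    execution_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    max_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the run finishes, needs approval, or the wait budget runs out.

    The timeout budget counts only time spent sleeping between polls.
    """
    waited = 0.0
    while True:
        result = check(execution_id)
        status = result.get("status")
        # An auto_approval run waits on a human; return it so the agent can
        # surface the approval link instead of polling indefinitely.
        if status in FINISHED or status in AWAITING_HUMAN:
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"execution {execution_id} still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        # Back off exponentially (capped) to stay well under 60 requests/minute.
        interval_s = min(interval_s * 2, max_interval_s)
```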
suggest_runbook
Find matching runbooks for an incident description. Uses trigger pattern matching and past execution history.
Parameters:
- product (required): where the incident occurred
- incident_title (required): description of the incident
- error_message: error message, for better matching
Example: Full Incident Investigation + Auto-Fix
You: My API is down, check MicroGemLabs and fix it.
Agent:
→ [calls get_platform_health]
→ [calls query_incidents with product=pulseguardplus, status=active]
→ [calls search_logs with query="error" hours=1]
→ [calls correlate_events with timestamp of the incident]
→ [calls suggest_runbook with product=pulseguardplus, incident_title="API Gateway DOWN"]
→ [calls execute_runbook with template_id of "Restart API Workers"]
→ [calls check_execution_status to verify success]
→ [calls query_monitors to confirm recovery]
"The API Gateway went down at 3:02 AM due to connection pool exhaustion.
LogVault showed 'max connections reached' errors starting at 2:45 AM. I
executed the 'Restart API Workers' runbook; workers restarted in 3.2s
and the API is now responding normally (avg 142ms)."
Rate Limits
MCP tool calls are rate-limited to 60 requests per minute per team. For bulk queries, use get_platform_health (one call for everything) instead of querying each product individually.
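A client that fans out many tool calls can pace itself before hitting the limit. This sketch is a client-side sliding-window limiter set to 60 calls per 60 seconds; the server's actual windowing (sliding or fixed) is not documented, so treat it as a conservative approximation. Because the limit is per team, agents sharing one key should share one limiter.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side guard for the 60 requests/minute per-team MCP limit."""

    def __init__(self, max_calls: int = 60, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls: deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if one is allowed now)."""
        now = self.clock()
        # Drop calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())
```

Call acquire() immediately before each tool call.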
Data Scope
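All tools return data scoped to the authenticated team. You cannot access other teams' data. The query tools are read-only: they cannot create, modify, or delete resources. The only tools with side effects are the runbook action tools, which trigger runbook executions subject to trust levels. Use the MicroGemLabs dashboard for all other write operations.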