
MicroGemAgent User Guide


Overview

Autonomous AI DevOps engineer. Watches your infrastructure on a loop, investigates issues with an LLM, proposes fixes for your approval, and learns from every incident it resolves.

Location: microgemlabs.ai/agent

MicroGemAI vs MicroGemAgent

Two different things with similar names:

| | MicroGemAI | MicroGemAgent |
|---|---|---|
| What it is | Conversational assistant | Autonomous engineer |
| How you use it | You ask, it answers | Runs on its own, pings you |
| When it runs | When you open a chat | Every 5 min + on incidents |
| Asks permission? | N/A | Yes — proposes fixes for approval |
| Included in platform? | Yes, BYOK LLM key | Premium add-on (three tiers) |

MicroGemAI is included with any MicroGemLabs plan. MicroGemAgent is a paid add-on for teams that want autonomous remediation, not just analysis.

Getting Started

1. Click Agent in the top nav, or go directly to microgemlabs.ai/agent

2. Click Enable MicroGemAgent on the setup screen

3. Follow the 4-step checklist that appears on the dashboard:

- Choose a tier

- Add an LLM API key

- Connect at least one integration

- Activate the agent

The checklist dismisses itself once all four steps are done; you can also hide it manually.

Tiers

| Tier | Price | Integrations | Fix Proposals |
|---|---|---|---|
| Starter | $29/mo | MCP only | Analysis-only — no fixes |
| Pro | $79/mo | MCP, SSH, AWS, GitHub, Docker, PostgreSQL | Yes — with dashboard approval |
| Team | $149/mo | All Pro features + multi-environment, shared skills | Yes — priority scan ordering |

Change tier from Agent Settings → Plan. Only the team owner can change the tier; admins and members see the selector, but their clicks are rejected server-side.

Cancelling

Scroll to the bottom of the Plan section in settings and click the small Cancel subscription link. The agent immediately flips off (tier → None, active → false). When real billing is wired, the click also routes you through Stripe Customer Portal to cancel the paid side.

Cancellation preserves your credentials, skill history, and session logs — you can re-subscribe later without losing them.

Session Types

Every time the agent runs, it creates a session. Three flavors:

| Type | Trigger | Default Interval |
|---|---|---|
| Watchdog | Fast sweep across all integrations | Every 5 min |
| Deep Scan | Detailed inspection — metrics, logs, dependencies | Every 60 min |
| Incident | Cross-product anomaly / alert fires | Event-driven |

Click any session row on the dashboard to drill in. The detail page shows the summary, any proposal that came out of it, the LLM conversation timeline, fixes executed, and collapsible raw health + analysis JSON.

Session Outcomes

  • Healthy — Nothing to act on.
  • Warning — Anomaly detected but below fix threshold.
  • Critical — Fix proposal raised, approval required.
  • Error — The agent itself failed (credential rotated, LLM down, etc.).

Integrations

MCP is always on (it's how the agent reads from MicroGemLabs itself). The rest require credentials; all except SSH also require Pro tier or higher.

| Integration | What it does | Tier gate |
|---|---|---|
| MCP | Reads monitor / log / cert / cron / webhook state | Starter+ |
| SSH | PM2 ops, system stats, Docker, PG via SSH | Starter+ |
| AWS | CloudWatch metrics, EC2 reboot, ECS deploy, Lambda config | Pro+ |
| GitHub | Read deploys, rerun workflows, roll back commits | Pro+ |
| Docker | Container stats, logs, restart, compose | Pro+ |
| PostgreSQL | Connection + lock + query stats, terminate sessions | Pro+ |

Enable / disable at Settings → Integrations. A disabled integration never runs, even if credentials are present. Enabling an integration without credentials is allowed, but the agent will skip it at scan time.

Connections (Credentials)

At Settings → Connections. Each integration is a single collapsible card with all its fields grouped together. Fill in what the agent needs, click Save connection, and everything is encrypted together.

SSH

  • Host(s) — comma-separated host:port pairs
  • Username — same user across all hosts
  • Private key — pasted SSH private key (OpenSSH or RSA format)

AWS

  • Access Key ID
  • Secret Access Key
  • Region (e.g. us-east-1)

GitHub

  • Personal Access Token (scopes: repo, actions:write)
  • Owner, Repo, Deploy Workflow filename

PostgreSQL

  • Connection string — postgresql://user:pass@host:5432/db
  • Database name
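Before saving, it can help to sanity-check that the string matches the expected shape. A minimal sketch — this is illustrative only, not the platform's own validation:

```shell
# Quick shape-check for the connection-string field. Illustrative only;
# the form's own validation is authoritative.
check_pg_url() {
  case "$1" in
    postgresql://*:*@*:*/?*) echo valid ;;
    *)                       echo invalid ;;
  esac
}

check_pg_url "postgresql://user:pass@host:5432/db"   # valid
check_pg_url "host:5432/db"                          # invalid
```

A real connectivity test is `psql "postgresql://user:pass@host:5432/db" -c 'SELECT 1'` from a host that can actually reach the database.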

Cloudflare

  • API token — scoped to the zone you want to manage. Required scope: Zone → Cache Purge → Edit. Create at <https://dash.cloudflare.com/profile/api-tokens>.
  • Zone ID — the zone the agent purges. Found in the Cloudflare dashboard's right sidebar on the zone overview page.

Used by the cloudflare_cache_purge fix action.
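Before pasting the token into the Connections form, you can confirm it is valid with Cloudflare's read-only token-verify endpoint. CF_API_TOKEN below is a placeholder shell variable, not something MicroGemAgent reads:

```shell
# Verify a Cloudflare API token without touching any zone.
# CF_API_TOKEN is a placeholder for the token you just created.
if [ -z "${CF_API_TOKEN:-}" ]; then
  echo "CF_API_TOKEN not set"
else
  curl -s "https://api.cloudflare.com/client/v4/user/tokens/verify" \
    -H "Authorization: Bearer $CF_API_TOKEN"
fi
```

A valid token returns a JSON body whose result has "status": "active".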

Kubernetes (no separate credential)

The agent runs kubectl rollout restart over your existing SSH connection — there's no Kubernetes-specific credential. Setup requirements on the SSH host:

  • kubectl installed and on PATH
  • KUBECONFIG set (or ~/.kube/config present) for the SSH user, pointing at the cluster you want the agent to manage
  • The kubeconfig context's user must have RBAC permission for rollouts/restart on the target namespaces

To test from the SSH host: kubectl rollout restart deployment/<name> -n <namespace> should succeed without prompting.
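The RBAC requirement can also be checked directly with kubectl auth can-i — rollout restart is applied as a patch on the workload object, so patch permission is what matters. A sketch, where the namespace is a placeholder:

```shell
# Run on the SSH host as the agent's SSH user.
NS="${NS:-default}"   # placeholder namespace

if ! command -v kubectl >/dev/null 2>&1; then
  echo "kubectl not found on PATH"
else
  # rollout restart patches the workload, so this is the permission to check
  kubectl auth can-i patch deployments -n "$NS"
fi
```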

Redis (no separate credential)

The agent runs redis-cli over SSH. Setup on the SSH host:

  • redis-cli installed and on PATH
  • The host can reach your Redis instance over the network
  • For password-protected Redis, the agent passes the password via the REDISCLI_AUTH environment variable (so it never appears in ps aux); supply the password in pinnedData when proposing the fix

If your Redis runs in a Docker container on the SSH host, the simplest pattern is docker exec <redis-container> redis-cli ... — wrap that in a custom skill rather than using the built-in redis_del_key action.
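A sketch of that docker-exec pattern — DRY_RUN and REDIS_CONTAINER are hypothetical names for illustration, not agent settings:

```shell
# Wrapper for redis-cli inside a container. DRY_RUN=1 prints the command
# instead of executing it, so you can preview what a skill would run.
REDIS_CONTAINER="${REDIS_CONTAINER:-redis}"
DRY_RUN="${DRY_RUN:-1}"

redis_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker exec $REDIS_CONTAINER redis-cli $*"
  else
    docker exec "$REDIS_CONTAINER" redis-cli "$@"
  fi
}

redis_cmd DEL cache:stale-key
# prints: docker exec redis redis-cli DEL cache:stale-key
```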

systemd (no separate credential)

The systemctl_restart action runs sudo -n systemctl restart <unit> over SSH. The SSH user must have passwordless sudo for systemctl. Add to /etc/sudoers.d/microgemagent:

agent-user ALL=(root) NOPASSWD: /bin/systemctl restart *

Replace agent-user with whatever username matches your SSH credential. The action's error message will surface "sudo password required" or "user not allowed" if NOPASSWD isn't configured correctly.

Security

Credentials are encrypted with AES-256-GCM using a server-held AGENT_ENCRYPTION_KEY. They're only decrypted into memory during the agent's own scan cycles — never rendered to any UI, never logged, never sent to your LLM provider.

The decrypt function is deliberately scoped to background code only. Even a stolen user session cannot dump credentials via the app.

Editing: leave a field blank to keep its current value; only non-empty fields overwrite. Disconnecting: the red Disconnect button wipes every credential for that integration. The associated toggle in Integrations auto-flips off.

Fix Proposals

When the agent detects an issue it can plausibly fix, it raises a proposal: a pending notification describing the action, target, and reasoning. Your team approves or denies from the dashboard.

How Approval Works

1. Agent session ends with a proposal and creates an AgentNotification with status pending.

2. Optional doorbell ping fires to any connected Telegram / Slack channel. The ping contains *only* the title and a dashboard link — no IPs, no PIDs, no query data.

3. The notification appears in:

- The bell icon in the hub header (polled every 30s)

- A full-width amber banner at the top of every hub page, for critical severity only

- Click through to the dashboard to see full details

4. An owner or admin clicks Approve or Deny. Members cannot (server rejects with 403).

5. On approve, the agent executes the fix and records it as an AgentFix linked to the session.

6. Proposals expire after 15 minutes if untouched — the agent moves on, with no partial execution.

Fix mode is Pro-tier only. Starter tier sees proposals in the session detail but no approval UI.

Safety Rails

  • Cooldown — minimum time between consecutive fixes (5–120 min, default 30).
  • Daily cap — maximum fixes per 24 hours (1–50, default 10).
  • Credential scope — fixes only touch resources the agent has credentials for.
  • Audit — every proposal, approval, denial, and execution is logged in AgentFix with timestamp and actor.

Tune at Settings → Fix Mode.

Fix Actions Reference

Every action goes through the same proposal → approve → execute flow. The target shape and any pinnedData fields are listed per action below. All actions in this table are classified destructive — first execution always requires explicit user approval, regardless of skill trustLevel.

| Action | Tier | Target shape | Notes |
|---|---|---|---|
| docker_restart | Pro+ | host:container or bare container | SSH-based |
| docker_pull_restart | Pro+ | host:container or bare container | Pulls image first |
| docker_compose_up | Pro+ | /opt/path/to/project | SSH-based; path must be under /opt/... |
| kubectl_rollout_restart | Pro+ | namespace/name or namespace/<deployment\|daemonset\|statefulset>/<name> | SSH host needs kubectl + KUBECONFIG |
| kubectl_scale_deployment | Pro+ | namespace/deployment-name | pinnedData.replicas (0–1000) required. Result includes before/after replica count |
| pg_cancel_query | Pro+ | PostgreSQL backend PID | Requires PG connection. Pinned at proposal time |
| pg_terminate_query | Pro+ | PostgreSQL backend PID | Requires PG connection. Pinned at proposal time |
| redis_del_key | Pro+ | Exact Redis key (no globs) | SSH host needs redis-cli |
| redis_flushdb | Pro+ | DB number 0–15 | Flushes whole logical DB; FLUSHALL not exposed |
| systemctl_restart | Pro+ | systemd unit name | SSH user needs NOPASSWD for systemctl |
| cloudflare_cache_purge | Pro+ | Comma-separated URL list (1–30) OR everything | Needs Cloudflare credential |
| pm2_restart | Pro+ | host:process | Restart a single PM2 process |
| pm2_restart_all | Pro+ | hostname | Restart every PM2 process on a host |
| github_rerun_workflow | Pro+ | numeric workflow run id | Re-runs an existing failed CI run |
| github_workflow_dispatch | Pro+ | workflow filename (deploy.yml) or numeric workflow id | Triggers a NEW run with caller-supplied inputs. Optional pinnedData.ref (defaults to default branch), pinnedData.inputs (workflow input map) |
| github_rollback | Team | literal previous | SHA snapshotted at proposal time |
| ec2_reboot | Team | EC2 instance id (i-...) | Allowlist enforced via aws_resources.ec2Instances |
| ecs_force_deploy | Team | cluster/service or bare service | Allowlist enforced |
| aws_lambda_update | Team | function-name:alias | pinnedData.targetVersion required. Allowlist via aws_resources.lambdaFunctions |
| aws_rds_reboot | Team | RDS instance identifier | Optional pinnedData.forceFailover triggers Multi-AZ swap. Allowlist via aws_resources.rdsInstances |
| health_check_url | Pro+ | URL | Read-only HTTP probe (non-destructive but listed for completeness) |

Tier note: Starter tier sees proposals in session detail but cannot approve / execute fixes. Team tier unlocks GitHub rollback, EC2 reboot, ECS force-deploy, AWS Lambda alias re-pointing, and AWS RDS reboot — actions whose blast radius warrants a higher tier.

Skills Library

A unified library at microgemlabs.ai/agent/skills that holds three kinds of reusable agent capability:

| Type | What it is | Trust path |
|---|---|---|
| Knowledge | Markdown playbooks the agent loads as context when a trigger keyword matches ("connection pool exhausted → run these queries first") | Validated by default — knowledge can't execute |
| Executable | Sandboxed JavaScript that calls helper functions (helpers.postgres.query, helpers.aws.ec2.reboot, etc.) | Starts in quarantine → auto-promotes to validated after 5 successful runs |
| Runbook | Trigger-pattern + action binding for incident response (e.g., regex-match an alert title → call pm2_restart on a target host) | Same quarantine → validated path; circuit breaker trips after consecutive failures |

Skills accumulate three ways:

  • Generated by the agent after a successful fix
  • Authored manually in the UI (knowledge / runbook forms; executable scripts get a Monaco-editor surface with full JS syntax highlighting)
  • Imported from another team's JSON export

Library UI

  • List view (/agent/skills) — filter by type, category, trust level; full-text search across name + content + script + pattern + actionType + category (Postgres tsvector under the hood, so "queries" matches "query")
  • Detail view — stats (total runs, success rate, avg duration), execution history, Edit / Export buttons, and a Run button that streams live progress (SSE)
  • Run button — for executable + runbook skills. Streams a tile of helper calls and console.log output as the script runs; the final card shows success/failure + duration + helper-call count. Destructive skills require approval mid-run (proposal raised, dashboard click to continue).
  • Import / Export — bulk and per-skill JSON. The export envelope is bundle-tagged + schema-versioned; importing on a different team replays through the validator and skips name/type collisions.

Trust + approval

Three trust levels gate execution:

  • quarantine — every run requires approval
  • validated — auto-runs unless the team's approval mode is strict
  • trusted — auto-runs even in destructive cases (only set for skills you genuinely vetted)

Tune the team-wide policy at Settings → Skill Execution: choose between strict (everything needs approval), standard (validated auto-runs, destructive needs approval), or trusted (only quarantined skills need approval).

Sharing community skills (shared: true) is opt-in per skill and visible to other teams — off by default.

Recursive Self-Improvement

Every incident the agent resolves makes the next one easier. The loop is closed end-to-end — what the agent learns at 3 AM today shows up in the LLM prompt for tomorrow's similar incident.

The closed loop

1. Postmortem-driven skill drafts — when an incident resolves and a postmortem is generated, the agent auto-drafts a knowledge skill at quarantine tier capturing the root-cause + resolution. You review at /agent/skills?trust=quarantine and promote what's worth keeping.

2. Vector embeddings on every memory + skill — text-embedding-3-small (1536-dim) embeddings are computed at write time and stored in pgvector. No separate embedding key needed; the agent uses the same OpenAI-compatible endpoint as your chat key, falling back to a hash-based pseudo-vector if the provider doesn't support embeddings.

3. Semantic retrieval at investigation time — when an incident fires, the agent computes a query embedding from the alert title + symptoms and pulls the top-K most similar memories and skills (ivfflat index, cosine similarity). These are injected into the LLM prompt as RELEVANT_PAST_INCIDENTS and RELEVANT_PLAYBOOKS blocks.

4. Citation tracking — every memory and skill the LLM sees creates a MemoryCitation row stamped with the session ID. When the incident resolves, outcomes are written back: helpful / unhelpful / not_used based on whether the LLM referenced the citation in its analysis and whether the proposal was approved.

5. Confidence adjustment — citation outcomes feed back into the memory's confidence score (0.0–1.0). Frequently-cited memories rise; never-cited memories decay.

6. Weekly consolidation cron — every Monday a job clusters similar incident_learning memories into "recurring pattern" memos (e.g., "Redis OOM has caused 4 outages in the last 60 days — common trigger is unbounded list growth"). Pattern memos become first-class memories the agent can cite.

7. Staleness demotion — skills retrieved by semantic search but never executed (i.e. the LLM read them but the proposal flow didn't fire them) auto-demote out of validated tier after a configurable inactivity window. Stops dead skills from polluting future retrievals.

Observability — /agent/memory

The full loop is visible at microgemlabs.ai/agent/memory (top-level nav under Agent):

  • Pool stats — count of memories + skills by tier, embedding coverage %, average confidence
  • Top citations — which memories and skills got cited most in the last 30 days, with helpful / unhelpful win-rate
  • Recent consolidations — pattern memos generated by the weekly cron, with the source incident-learning memories collapsed into each
  • Stale candidates — skills flagged for demotion (retrieved but not executed in N days). One-click "keep" defers demotion another month
  • Missed recommendations — incidents where the agent's retrieved memories suggested a known fix but the proposal wasn't approved (or was denied). Useful for tuning approval policy or spotting brittle skills

Citations are also visible inline on every session detail page — each retrieved memory is stamped with how it was used.

Internet-Aware Investigation

Beyond your own data, the agent ships three external research tools the LLM can call mid-investigation:

| Tool | What it does | Notes |
|---|---|---|
| web_search | General web search via Tavily | Useful for unknown error messages, recent CVEs, vendor incident threads. Configure TAVILY_API_KEY in env. |
| fetch_url | Read a specific page (docs, changelog, status) | SSRF-guarded — RFC1918 / .internal / .local blocked. Only the first ~100KB is returned to keep the LLM context bounded. |
| check_third_party_status | Fast in/out check against 14 provider status pages | GitHub, AWS, Stripe, OpenAI, Anthropic, Cloudflare, Vercel, Supabase, Datadog, Sentry, Slack, Discord, PagerDuty, Atlassian. Read-only and cached for 60s — calling once across multiple sessions is cheap. |

These tools are also available in MicroGemAI gemai mode (read-only chat) — see the MicroGemAI guide.

Agent Chat

A slide-out chat panel for ad-hoc questions and tool-driven investigation. Click the ✨ icon in the hub header (or ⌘K if it's wired in your shortcuts).

Modes

The same panel powers two flavors, gated by tier:

| Mode | Enabled when | Tools |
|---|---|---|
| Gemai mode | Any team with an LLM API key | Read-only investigation surface — search_docs, search_memory, expand_memory, search_skills, gather_health, query_mcp, recall_recent_alerts, list_postmortems, get_postmortem, find_similar_past_incidents, search_past_sessions, web_search, fetch_url, check_third_party_status |
| Agent mode | Paid agent tier (Starter+) | Everything in gemai mode plus propose_fix, execute_skill, notify_team, prefill_form_link, update_skill, create_knowledge_skill, create_runbook |

A team that upgrades mid-thread keeps its conversation history visible — old messages are stamped with the kind they were sent under.

Tool reference (agent-mode-only)

| Tool | What it does |
|---|---|
| propose_fix | Raise a fix proposal for any of the 21 fix actions. Includes target + optional pinnedData. Hits the same approval flow as agent-driven proposals. |
| execute_skill | Run an executable or runbook skill by name. Streams progress back into the chat; mid-run approval gates pause and wait for a dashboard click for destructive skills. |
| notify_team | Send a one-shot notification through the team's on-call channels (email / SMS / Slack / Telegram / voice) without raising an incident. Good for "FYI, deploy in 5 min" or "I rolled back X — ack if any issues." |
| prefill_form_link | Generate a deep link into the relevant create form (/pulse/new, /cron/checks/new, /hooks/new, etc.) with the user's stated values pre-populated as query params. Always preferred over inline JSON for any create flow — the user reviews + clicks Save. |
| update_skill | Edit an existing skill — content, trust level, trigger pattern. Useful for "promote that runbook out of quarantine" or "tighten the regex on the payment-cron runbook." |
| create_knowledge_skill / create_runbook | Author new skills directly from chat. Defaults to quarantine tier so a human reviews before it auto-fires. |

Panel features

  • Markdown rendering in assistant responses — fenced code blocks render as styled tiles with a language label and a copy button; inline code gets a colored pill; internal links (/skills/...) use the Next router so the chat panel stays open across navigation, external links open in a new tab
  • Voice input — mic button (Pro+ projects with the gpt-4o-mini-transcribe model enabled). Press once to start, again to stop, OR walk away — voice activity detection auto-stops after ~1.5s of silence and submits.
  • ↑ / ↓ history — terminal-style recall of your prior messages. Persisted in localStorage so it spans conversations on this device. Up to 100 entries, consecutive duplicates skipped.
  • Expand toggle — header button to take the panel full-screen; choice persists across reloads.
  • Drag to resize — left edge of the panel (desktop only). Width capped at 90% of viewport, persisted to localStorage.
  • Smart auto-scroll — only follows new tokens when you're already at the bottom; if you scrolled up to read history, a "↓ Latest" floating button appears to snap back.

On mobile

  • The chat panel covers the full viewport — no half-overlay, since touch keyboards already eat 40% of the screen.
  • Clicking any deep link the agent sends (e.g. a prefill_form_link URL into /pulse/new) auto-closes the panel so the destination page is visible underneath.
  • The "thinking" indicator is a pulsating gem rendered with pure CSS — theme-aware, respects prefers-reduced-motion, and ships no GIFs.
  • The chat bell + ✨ trigger live in every product header (PulseGuardPlus, CronKeeper, LogVault, CertGuard, CronRunner, HookRelay) — not just the hub. You can ask the agent a question without bouncing back to /dashboard first.
  • Bottom nav (Home / Agent / Chat / Incidents) on screens narrower than md keeps the agent always one tap away.

Security layer

Every chat turn passes through a defense-in-depth filter chain before reaching the LLM:

1. Layer 1 quick-filter (regex) — blocks credential leaks (AWS keys, GitHub tokens, OpenAI keys, private SSH keys), prompt-injection patterns ("ignore previous instructions", DAN jailbreaks, fake [[system]] markers), and Unicode trickery (zero-width / bidi). Redacts internal IPs and hostnames before they reach OpenAI/Anthropic logs.

2. Structured prompts — team memory, recent sessions, and tool results get wrapped in sentinel-delimited UNTRUSTED_DATA blocks with a per-conversation random ID. The system message tells the LLM to treat those blocks as inert text, defending against indirect prompt injection in customer log lines / DB rows / webhook payloads.

3. Tool-result filter — every tool response runs through the same redaction rules before being fed back into the LLM context. The DB still stores the raw payload for operator debugging.

Platform Memory

Separate from skills — a freeform markdown document the agent maintains about your infrastructure. Think of it as an ever-updating runbook.

View and edit at the top of the dashboard. Max 2500 characters. Rendered as markdown (headings, lists, code blocks). The agent rewrites it automatically after any incident that generates new institutional knowledge.

Scan Frequency

Tune at Settings โ†’ Scan Frequency:

  • Watchdog interval — fast sweep (1–30 min, default 5)
  • Deep scan interval — detailed scan (15–360 min, default 60)

Tighter intervals mean faster reaction but more LLM cost (you're paying per-scan on your own BYOK key). The default 5/60 pattern is a good starting point; lengthen the intervals if your infra is quiet, shorten them if incidents are frequent.
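The cost relationship is easy to bound, because each interval fixes the number of scans per day. A back-of-envelope sketch:

```shell
# Scans per day at a given pair of intervals. Each scan can trigger one or
# more LLM calls on your BYOK key, so this bounds the call volume.
watchdog_min=5   # default watchdog interval
deep_min=60      # default deep-scan interval

echo "watchdog scans/day: $(( 24 * 60 / watchdog_min ))"   # 288
echo "deep scans/day:     $(( 24 * 60 / deep_min ))"       # 24
```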

BYOK (Bring Your Own LLM Key)

The agent uses your OpenAI, Anthropic, or compatible LLM key — separate from the MicroGemAI chat key.

Configure at Settings → LLM. Same providers supported as the chat assistant. Typical monthly cost at default intervals runs ~$0.50–2.00 depending on how many investigations the agent opens.

The key is encrypted the same way as integration credentials. Blank the field in settings to keep the existing one (displayed as ••••••••).

Active Toggle

The most-used control. Lives in two places:

  • Dashboard header — green pulsing pill when active, gray when paused.
  • Settings → Active (top of page).

Click either to start or stop the scanning loop. Pausing preserves all state โ€” credentials, skills, sessions, proposals โ€” and resumes from where it left off when you re-activate.

Cancelling the subscription auto-pauses. Re-subscribing leaves the agent paused so you can review settings first.

REST API

Everything above — tier, LLM key, integrations, credentials — is also exposed over a v1 REST API at microgemlabs.ai/api/v1/agent/*. Use it for:

  • Terraform / IaC — the microgemagent Terraform provider wraps this API. If you want your agent config in version control, that's the path of least resistance.
  • CI pipelines / scripts — rotate a credential from a secret-rotation job, flip tiers in response to a cost check, snapshot config for audit export.
  • Custom dashboards — read-only pulls of tier, active flag, session counters.

Authentication

Every request needs a TeamApiKey Bearer token:

Authorization: Bearer mgl_team_abcdef1234567890

Generate one at Settings → Account → API Keys. The key is team-scoped, so no team_id field is sent in requests. Treat it like a root credential — whoever has it can reconfigure the agent and rotate every stored credential.

Endpoints

| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/agent/config | Read current config (tier, LLM settings, toggles) |
| POST | /api/v1/agent/config | Create-or-upsert (idempotent) |
| PATCH | /api/v1/agent/config | Partial update (supports If-Match) |
| DELETE | /api/v1/agent/config | Wipe config — destructive, cascades to credentials |
| GET | /api/v1/agent/credentials | List all credentials (metadata only) |
| POST | /api/v1/agent/credentials | Create a credential |
| GET | /api/v1/agent/credentials/{id} | Read one credential (metadata only) |
| PATCH | /api/v1/agent/credentials/{id} | Update value and/or label (supports If-Match) |
| DELETE | /api/v1/agent/credentials/{id} | Remove a credential |

Optimistic concurrency

Every successful write returns the full updated object plus an etag field and an ETag response header. On PATCH, send If-Match: "<etag>" — a stale etag returns 412 Precondition Failed, so two concurrent Terraform runs can't silently stomp each other. POST and DELETE don't require If-Match (POST is an idempotent upsert; DELETE already takes a snapshot).

Plaintext never leaves the server

Credential values and the LLM API key are write-only. Reads return the masked string •••••••• for the LLM key and a sha256 value_fingerprint for integration credentials — never the plaintext. Detect out-of-band drift by computing sha256(your_local_value) and comparing it to the returned fingerprint.
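A sketch of that drift check, assuming the fingerprint is the sha256 hex digest of the raw value with no trailing newline ("abc" below is a stand-in for the credential you hold locally):

```shell
# Compute the local fingerprint and compare it to value_fingerprint from
# GET /api/v1/agent/credentials/{id}. "abc" is a placeholder value.
local_value="abc"
local_fp=$(printf '%s' "$local_value" | sha256sum | awk '{print $1}')

echo "$local_fp"
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# With the API's fingerprint in $remote_fp:
#   [ "$local_fp" = "$remote_fp" ] || echo "credential drifted out-of-band"
```

Note the `printf '%s'` — `echo` would append a newline and change the hash.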

Tier × integration rules

Setting tier: "starter" combined with aws_enabled, github_enabled, docker_enabled, postgres_enabled, or fix_enabled: true returns 422 Unprocessable Entity with a message identifying the offending flag. Upgrade to Pro, or set the flag to false.

Rate limits

30 requests per minute per API key across all v1 paths. A 429 response carries Retry-After and the standard X-RateLimit-{Limit,Remaining,Reset} headers. The Terraform provider retries with exponential backoff; custom scripts should do the same.
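A minimal sketch of that retry behavior — backoff_delay is a hypothetical helper, and a real loop should prefer the server's Retry-After header when present:

```shell
# Exponential backoff schedule for 429 responses: 1s, 2s, 4s, 8s ...
backoff_delay() {
  echo $(( 1 << $1 ))
}

for attempt in 0 1 2 3; do
  backoff_delay "$attempt"
done
# prints 1 2 4 8 (one per line)
```

In a retry loop you would `sleep "${retry_after:-$(backoff_delay "$attempt")}"` between attempts, taking retry_after from the 429's Retry-After header when it is set.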

API Examples

Read current config

curl -s https://microgemlabs.ai/api/v1/agent/config \
  -H "Authorization: Bearer $MGL_API_KEY"

Create or update config (idempotent POST)

curl -s https://microgemlabs.ai/api/v1/agent/config \
  -X POST \
  -H "Authorization: Bearer $MGL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tier": "pro",
    "active": true,
    "llm_provider": "anthropic",
    "llm_model": "claude-sonnet-4-6",
    "llm_api_key": "sk-ant-...",
    "ssh_enabled": true,
    "github_enabled": true,
    "fix_enabled": true,
    "fix_cooldown_min": 15,
    "max_fixes_per_day": 10
  }'

Rotate an SSH key

# 1. Find the credential id (metadata only — no plaintext returned)
curl -s https://microgemlabs.ai/api/v1/agent/credentials \
  -H "Authorization: Bearer $MGL_API_KEY" \
  | jq '.credentials[] | select(.cred_type=="ssh_key")'

# 2. PATCH with the new value. If-Match guards against concurrent edits.
ETAG=$(curl -sI https://microgemlabs.ai/api/v1/agent/credentials/$CRED_ID \
  -H "Authorization: Bearer $MGL_API_KEY" | grep -i '^etag:' | awk '{print $2}' | tr -d '\r')

curl -s https://microgemlabs.ai/api/v1/agent/credentials/$CRED_ID \
  -X PATCH \
  -H "Authorization: Bearer $MGL_API_KEY" \
  -H "If-Match: $ETAG" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg v "$(cat new-key.pem)" '{value:$v}')"

Terraform provider

> Status: pre-release. The provider source lives in [terraform/](https://github.com/microgemlabs/mgllabs/tree/main/terraform) inside the platform monorepo. It is not yet published to registry.terraform.io. The HCL below describes the intended shape once it ships; until then, build from source and reference the local binary via [dev_overrides](https://developer.hashicorp.com/terraform/cli/config/config-file#development-overrides-for-provider-developers).

For fleet management, the Terraform provider wraps the v1 REST API with plan-time validation, drift detection, and for_each for credential bundles:

terraform {
  required_providers {
    microgemagent = {
      source  = "microgemlabs/microgemagent"
      version = "~> 0.1"
    }
  }
}

provider "microgemagent" {}  # reads MGL_API_KEY

resource "microgemagent_config" "this" {
  tier         = "pro"
  llm_provider = "anthropic"
  llm_model    = "claude-sonnet-4-6"
  llm_api_key  = var.anthropic_api_key
  ssh_enabled  = true

  lifecycle {
    prevent_destroy = true
  }
}

Once published, the registry URL will be registry.terraform.io/providers/microgemlabs/microgemagent. Watch the [terraform/](https://github.com/microgemlabs/mgllabs/tree/main/terraform) directory's CHANGELOG.md for release status.

Troubleshooting

  • "Agent not scanning" — Check the dashboard header: if it says *Paused*, click the toggle. If it says *Active — waiting for first scan*, give it up to 5 minutes (default watchdog interval).
  • "Last scan overdue" — The cron ran but the scan errored. Open the most recent session from the sessions list — the detail page will show what failed (usually a credential, an integration toggle mismatch, or an LLM-quota issue).
  • "Proposal never arrived" — Check tier: Starter doesn't generate fix proposals. Check the Fix Mode toggle in settings. Check the 15-minute proposal expiry.
  • "Can't approve proposal" — Approvals require the owner or admin role. Members see the proposal but no buttons.
  • "Credentials look fine but agent says unauthorized" — Rotate the credential (e.g. a new GitHub PAT) and save again. Stale credentials fail silently until the next scan surfaces the error.