
MicroGemAgent User Guide


Overview

Autonomous AI DevOps engineer. Watches your infrastructure on a loop, investigates issues with an LLM, proposes fixes for your approval, and learns from every incident it resolves.

Location: microgemlabs.ai/agent

MicroGemAI vs MicroGemAgent

Two different things with similar names:

| | MicroGemAI | MicroGemAgent |
|---|---|---|
| What it is | Conversational assistant | Autonomous engineer |
| How you use it | You ask, it answers | Runs on its own, pings you |
| When it runs | When you open a chat | Every 5 min + on incidents |
| Asks permission? | N/A | Yes — proposes fixes for approval |
| Included in platform? | Yes, BYOK LLM key | Premium add-on (three tiers) |

MicroGemAI is included with any MicroGemLabs plan. MicroGemAgent is a paid add-on for teams that want autonomous remediation, not just analysis.

Getting Started

1. Click Agent in the top nav, or go directly to microgemlabs.ai/agent

2. Click Enable MicroGemAgent on the setup screen

3. Follow the 4-step checklist that appears on the dashboard:

- Choose a tier

- Add an LLM API key

- Connect at least one integration

- Activate the agent

The checklist dismisses itself once all four steps are done; you can also hide it manually.

Tiers

| Tier | Price | Integrations | Fix Proposals |
|---|---|---|---|
| Starter | $29/mo | MCP only | Analysis-only — no fixes |
| Pro | $79/mo | MCP, SSH, AWS, GitHub, Docker, PostgreSQL | Yes — with dashboard approval |
| Team | $149/mo | All Pro features + multi-environment, shared skills | Yes — priority scan ordering |

Change tier from Agent Settings → Plan. Only the team owner can change the tier; admins and members see the selector, but their clicks are rejected server-side.

Cancelling

Scroll to the bottom of the Plan section in settings and click the small Cancel subscription link. The agent immediately flips off (tier → None, active → false). When real billing is wired, the click also routes you through Stripe Customer Portal to cancel the paid side.

Cancellation preserves your credentials, skill history, and session logs — you can re-subscribe later without losing them.

Session Types

Every time the agent runs, it creates a session. Three flavors:

| Type | Trigger | Default Interval |
|---|---|---|
| Watchdog | Fast sweep across all integrations | Every 5 min |
| Deep Scan | Detailed inspection — metrics, logs, dependencies | Every 60 min |
| Incident | Cross-product anomaly / alert fires | Event-driven |

Click any session row on the dashboard to drill in. The detail page shows the summary, any proposal that came out of it, the LLM conversation timeline, fixes executed, and collapsible raw health + analysis JSON.

Session Outcomes

  • Healthy — Nothing to act on.
  • Warning — Anomaly detected but below fix threshold.
  • Critical — Fix proposal raised, approval required.
  • Error — The agent itself failed (credential rotated, LLM down, etc.).

Integrations

MCP is always on (it's how the agent reads from MicroGemLabs itself). The rest require credentials; all except SSH also require Pro tier or higher.

| Integration | What it does | Tier gate |
|---|---|---|
| MCP | Reads monitor / log / cert / cron / webhook state | Starter+ |
| SSH | PM2 ops, system stats, Docker, PG via SSH | Starter+ |
| AWS | CloudWatch metrics, EC2 reboot, ECS deploy, Lambda config | Pro+ |
| GitHub | Read deploys, rerun workflows, roll back commits | Pro+ |
| Docker | Container stats, logs, restart, compose | Pro+ |
| PostgreSQL | Connection + lock + query stats, terminate sessions | Pro+ |

Enable / disable at Settings → Integrations. A disabled integration never runs, even if credentials are present. Enabling an integration without credentials is allowed, but the agent will skip it at scan time.

Connections (Credentials)

At Settings → Connections. Each integration is a single collapsible card with all its fields grouped together. Fill in what the agent needs, click Save connection, and everything is encrypted together.

SSH

  • Host(s) — comma-separated host:port pairs
  • Username — same user across all hosts
  • Private key — pasted SSH private key (OpenSSH or RSA format)

AWS

  • Access Key ID
  • Secret Access Key
  • Region (e.g. us-east-1)

GitHub

  • Personal Access Token (scopes: repo, actions:write)
  • Owner, Repo, Deploy Workflow filename

PostgreSQL

  • Connection string — postgresql://user:pass@host:5432/db
  • Database name
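Before saving, it can help to sanity-check that the string matches the expected shape. A minimal sketch — this is illustrative only, not the platform's own validation:

```shell
# Quick shape-check for the connection-string field. Illustrative only;
# the form's own validation is authoritative.
check_pg_url() {
  case "$1" in
    postgresql://*:*@*:*/?*) echo valid ;;
    *)                       echo invalid ;;
  esac
}

check_pg_url "postgresql://user:pass@host:5432/db"   # valid
check_pg_url "host:5432/db"                          # invalid
```

A real connectivity test is `psql "postgresql://user:pass@host:5432/db" -c 'SELECT 1'` from a host that can actually reach the database.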

Cloudflare

  • API token — scoped to the zone you want to manage. Required scope: Zone → Cache Purge → Edit. Create at <https://dash.cloudflare.com/profile/api-tokens>.
  • Zone ID — the zone the agent purges. Found in the Cloudflare dashboard's right sidebar on the zone overview page.

Used by the cloudflare_cache_purge fix action.
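Before pasting the token into the Connections form, you can confirm it is valid with Cloudflare's read-only token-verify endpoint. CF_API_TOKEN below is a placeholder shell variable, not something MicroGemAgent reads:

```shell
# Verify a Cloudflare API token without touching any zone.
# CF_API_TOKEN is a placeholder for the token you just created.
if [ -z "${CF_API_TOKEN:-}" ]; then
  echo "CF_API_TOKEN not set"
else
  curl -s "https://api.cloudflare.com/client/v4/user/tokens/verify" \
    -H "Authorization: Bearer $CF_API_TOKEN"
fi
```

A valid token returns a JSON body whose result has "status": "active".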

Kubernetes (no separate credential)

The agent runs kubectl rollout restart over your existing SSH connection — there's no Kubernetes-specific credential. Setup requirements on the SSH host:

  • kubectl installed and on PATH
  • KUBECONFIG set (or ~/.kube/config present) for the SSH user, pointing at the cluster you want the agent to manage
  • The kubeconfig context's user must have RBAC permission for rollouts/restart on the target namespaces

To test from the SSH host: kubectl rollout restart deployment/<name> -n <namespace> should succeed without prompting.
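The RBAC requirement can also be checked directly with kubectl auth can-i — rollout restart is applied as a patch on the workload object, so patch permission is what matters. A sketch, where the namespace is a placeholder:

```shell
# Run on the SSH host as the agent's SSH user.
NS="${NS:-default}"   # placeholder namespace

if ! command -v kubectl >/dev/null 2>&1; then
  echo "kubectl not found on PATH"
else
  # rollout restart patches the workload, so this is the permission to check
  kubectl auth can-i patch deployments -n "$NS"
fi
```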

Redis (no separate credential)

The agent runs redis-cli over SSH. Setup on the SSH host:

  • redis-cli installed and on PATH
  • The host can reach your Redis instance over the network
  • For password-protected Redis, the agent passes the password via the REDISCLI_AUTH environment variable (so it never appears in ps aux); supply the password in pinnedData when proposing the fix

If your Redis runs in a Docker container on the SSH host, the simplest pattern is docker exec <redis-container> redis-cli ... — wrap that in a custom skill rather than using the built-in redis_del_key action.
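A sketch of that docker-exec pattern — DRY_RUN and REDIS_CONTAINER are hypothetical names for illustration, not agent settings:

```shell
# Wrapper for redis-cli inside a container. DRY_RUN=1 prints the command
# instead of executing it, so you can preview what a skill would run.
REDIS_CONTAINER="${REDIS_CONTAINER:-redis}"
DRY_RUN="${DRY_RUN:-1}"

redis_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker exec $REDIS_CONTAINER redis-cli $*"
  else
    docker exec "$REDIS_CONTAINER" redis-cli "$@"
  fi
}

redis_cmd DEL cache:stale-key
# prints: docker exec redis redis-cli DEL cache:stale-key
```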

systemd (no separate credential)

The systemctl_restart action runs sudo -n systemctl restart <unit> over SSH. The SSH user must have passwordless sudo for systemctl. Add to /etc/sudoers.d/microgemagent:

agent-user ALL=(root) NOPASSWD: /bin/systemctl restart *

Replace agent-user with whatever username matches your SSH credential. The action's error message will surface "sudo password required" or "user not allowed" if NOPASSWD isn't configured correctly.

Security

Credentials are encrypted with AES-256-GCM using a server-held AGENT_ENCRYPTION_KEY. They're only decrypted into memory during the agent's own scan cycles — never rendered to any UI, never logged, never sent to your LLM provider.

The decrypt function is deliberately scoped to background code only. Even a stolen user session cannot dump credentials via the app.

Editing: leave a field blank to keep its current value; only non-empty fields overwrite. Disconnecting: the red Disconnect button wipes every credential for that integration. The associated toggle in Integrations auto-flips off.

Fix Proposals

When the agent detects an issue it can plausibly fix, it raises a proposal: a pending notification describing the action, target, and reasoning. Your team approves or denies from the dashboard.

How Approval Works

1. Agent session ends with a proposal and creates an AgentNotification with status pending.

2. Optional doorbell ping fires to any connected Telegram / Slack channel. The ping contains *only* the title and a dashboard link — no IPs, no PIDs, no query data.

3. The notification appears in:

- The bell icon in the hub header (polled every 30s)

- A full-width amber banner at the top of every hub page, for critical severity only

- Click through to the dashboard to see full details

4. An owner or admin clicks Approve or Deny. Members cannot (server rejects with 403).

5. On approve, the agent executes the fix and records it as an AgentFix linked to the session.

6. Proposals expire after 15 minutes if untouched — the agent moves on, with no partial execution.

Fix mode is Pro-tier only. Starter tier sees proposals in the session detail but no approval UI.

Safety Rails

  • Cooldown — minimum time between consecutive fixes (5–120 min, default 30).
  • Daily cap — maximum fixes per 24 hours (1–50, default 10).
  • Credential scope — fixes only touch resources the agent has credentials for.
  • Audit — every proposal, approval, denial, and execution is logged in AgentFix with timestamp and actor.

Tune at Settings → Fix Mode.

Fix Actions Reference

Every action goes through the same proposal → approve → execute flow. The target shape and any pinnedData fields are listed per action below. All actions in this table are classified destructive — first execution always requires explicit user approval, regardless of skill trustLevel.

| Action | Tier | Target shape | Notes |
|---|---|---|---|
| docker_restart | Pro+ | host:container or bare container | SSH-based |
| docker_pull_restart | Pro+ | host:container or bare container | Pulls image first |
| docker_compose_up | Pro+ | /opt/path/to/project | SSH-based; path must be under /opt/... |
| kubectl_rollout_restart | Pro+ | namespace/name or namespace/<deployment\|daemonset\|statefulset>/<name> | SSH host needs kubectl + KUBECONFIG |
| kubectl_scale_deployment | Pro+ | namespace/deployment-name | pinnedData.replicas (0–1000) required. Result includes before/after replica count |
| pg_cancel_query | Pro+ | PostgreSQL backend PID | Requires PG connection. Pinned at proposal time |
| pg_terminate_query | Pro+ | PostgreSQL backend PID | Requires PG connection. Pinned at proposal time |
| redis_del_key | Pro+ | Exact Redis key (no globs) | SSH host needs redis-cli |
| redis_flushdb | Pro+ | DB number 0–15 | Flushes whole logical DB; FLUSHALL not exposed |
| systemctl_restart | Pro+ | systemd unit name | SSH user needs NOPASSWD for systemctl |
| cloudflare_cache_purge | Pro+ | Comma-separated URL list (1–30) OR everything | Needs Cloudflare credential |
| pm2_restart | Pro+ | host:process | Restart a single PM2 process |
| pm2_restart_all | Pro+ | hostname | Restart every PM2 process on a host |
| github_rerun_workflow | Pro+ | numeric workflow run id | Re-runs an existing failed CI run |
| github_workflow_dispatch | Pro+ | workflow filename (deploy.yml) or numeric workflow id | Triggers a NEW run with caller-supplied inputs. Optional pinnedData.ref (defaults to default branch), pinnedData.inputs (workflow input map) |
| github_rollback | Team | literal previous | SHA snapshotted at proposal time |
| ec2_reboot | Team | EC2 instance id (i-...) | Allowlist enforced via aws_resources.ec2Instances |
| ecs_force_deploy | Team | cluster/service or bare service | Allowlist enforced |
| aws_lambda_update | Team | function-name:alias | pinnedData.targetVersion required. Allowlist via aws_resources.lambdaFunctions |
| aws_rds_reboot | Team | RDS instance identifier | Optional pinnedData.forceFailover triggers Multi-AZ swap. Allowlist via aws_resources.rdsInstances |
| health_check_url | Pro+ | URL | Read-only HTTP probe (non-destructive but listed for completeness) |

Tier note: Starter tier sees proposals in session detail but cannot approve / execute fixes. Team tier unlocks GitHub rollback, EC2 reboot, ECS force-deploy, AWS Lambda alias re-pointing, and AWS RDS reboot — actions whose blast radius warrants a higher tier.

Skills Library

A unified library at microgemlabs.ai/agent/skills that holds three kinds of reusable agent capability:

| Type | What it is | Trust path |
|---|---|---|
| Knowledge | Markdown playbooks the agent loads as context when a trigger keyword matches ("connection pool exhausted → run these queries first") | Validated by default — knowledge can't execute |
| Executable | Sandboxed JavaScript that calls helper functions (helpers.postgres.query, helpers.aws.ec2.reboot, etc.) | Starts in quarantine → auto-promotes to validated after 5 successful runs |
| Runbook | Trigger-pattern + action binding for incident response (e.g., regex-match an alert title → call pm2_restart on a target host) | Same quarantine → validated path; circuit breaker trips after consecutive failures |

Skills accumulate three ways:

  • Generated by the agent after a successful fix
  • Authored manually in the UI (knowledge / runbook forms; executable scripts get a Monaco-editor surface with full JS syntax highlighting)
  • Imported from another team's JSON export

Library UI

  • List view (/agent/skills) — filter by type, category, trust level; full-text search across name + content + script + pattern + actionType + category (Postgres tsvector under the hood, so "queries" matches "query")
  • Detail view — stats (total runs, success rate, avg duration), execution history, Edit / Export buttons, and a Run button that streams live progress (SSE)
  • Run button — for executable + runbook skills. Streams a tile of helper calls and console.log output as the script runs; the final card shows success/failure + duration + helper-call count. Destructive skills require approval mid-run (proposal raised, dashboard click to continue).
  • Import / Export — bulk and per-skill JSON. The export envelope is bundle-tagged + schema-versioned; importing on a different team replays through the validator and skips name/type collisions.

Trust + approval

Three trust levels gate execution:

  • quarantine — every run requires approval
  • validated — auto-runs unless the team's approval mode is strict
  • trusted — auto-runs even in destructive cases (only set for skills you genuinely vetted)

Tune the team-wide policy at Settings → Skill Execution: choose between strict (everything needs approval), standard (validated auto-runs, destructive needs approval), or trusted (only quarantined skills need approval).

Sharing community skills (shared: true) is opt-in per skill and visible to other teams — off by default.

Recursive Self-Improvement

Every incident the agent resolves makes the next one easier. The loop is closed end-to-end — what the agent learns at 3 AM today shows up in the LLM prompt for tomorrow's similar incident.

The closed loop

1. Postmortem-driven skill drafts — when an incident resolves and a postmortem is generated, the agent auto-drafts a knowledge skill at quarantine tier capturing the root-cause + resolution. You review at /agent/skills?trust=quarantine and promote what's worth keeping.

2. Vector embeddings on every memory + skill — text-embedding-3-small (1536-dim) embeddings are computed at write time and stored in pgvector. No separate embedding key needed; the agent uses the same OpenAI-compatible endpoint as your chat key, falling back to a hash-based pseudo-vector if the provider doesn't support embeddings.

3. Semantic retrieval at investigation time — when an incident fires, the agent computes a query embedding from the alert title + symptoms and pulls the top-K most similar memories and skills (ivfflat index, cosine similarity). These are injected into the LLM prompt as RELEVANT_PAST_INCIDENTS and RELEVANT_PLAYBOOKS blocks.

4. Citation tracking — every memory and skill the LLM sees creates a MemoryCitation row stamped with the session ID. When the incident resolves, outcomes are written back: helpful / unhelpful / not_used based on whether the LLM referenced the citation in its analysis and whether the proposal was approved.

5. Confidence adjustment — citation outcomes feed back into the memory's confidence score (0.0–1.0). Frequently-cited memories rise; never-cited memories decay.

6. Weekly consolidation cron — every Monday a job clusters similar incident_learning memories into "recurring pattern" memos (e.g., "Redis OOM has caused 4 outages in the last 60 days — common trigger is unbounded list growth"). Pattern memos become first-class memories the agent can cite.

7. Staleness demotion — skills retrieved by semantic search but never executed (i.e. the LLM read them but the proposal flow didn't fire them) auto-demote out of validated tier after a configurable inactivity window. Stops dead skills from polluting future retrievals.

Observability — /agent/memory

The full loop is visible at microgemlabs.ai/agent/memory (top-level nav under Agent):

  • Pool stats — count of memories + skills by tier, embedding coverage %, average confidence
  • Top citations — which memories and skills got cited most in the last 30 days, with helpful / unhelpful win-rate
  • Recent consolidations — pattern memos generated by the weekly cron, with the source incident-learning memories collapsed into each
  • Stale candidates — skills flagged for demotion (retrieved but not executed in N days). One-click "keep" defers demotion another month
  • Missed recommendations — incidents where the agent's retrieved memories suggested a known fix but the proposal wasn't approved (or was denied). Useful for tuning approval policy or spotting brittle skills

Citations are also visible inline on every session detail page — each retrieved memory is stamped with how it was used.

Internet-Aware Investigation

Beyond your own data, the agent ships three external research tools the LLM can call mid-investigation:

| Tool | What it does | Notes |
|---|---|---|
| web_search | General web search via Tavily | Useful for unknown error messages, recent CVEs, vendor incident threads. Configure TAVILY_API_KEY in env. |
| fetch_url | Read a specific page (docs, changelog, status) | SSRF-guarded — RFC1918 / .internal / .local blocked. Only the first ~100KB is returned to keep the LLM context bounded. |
| check_third_party_status | Fast in/out check against 14 provider status pages | GitHub, AWS, Stripe, OpenAI, Anthropic, Cloudflare, Vercel, Supabase, Datadog, Sentry, Slack, Discord, PagerDuty, Atlassian. Read-only and cached for 60s — calling once across multiple sessions is cheap. |

These tools are also available in MicroGemAI gemai mode (read-only chat) — see the MicroGemAI guide.

Agent Chat

A slide-out chat panel for ad-hoc questions and tool-driven investigation. Click the ✨ icon in the hub header (or ⌘K if it's wired in your shortcuts).

Modes

The same panel powers two flavors, gated by tier:

| Mode | Enabled when | Tools |
|---|---|---|
| Gemai mode | Any team with an LLM API key | Read-only investigation surface — search_docs, search_memory, expand_memory, search_skills, gather_health, query_mcp, recall_recent_alerts, list_postmortems, get_postmortem, find_similar_past_incidents, search_past_sessions, web_search, fetch_url, check_third_party_status |
| Agent mode | Paid agent tier (Starter+) | Everything in gemai mode plus propose_fix, execute_skill, notify_team, prefill_form_link, update_skill, create_knowledge_skill, create_runbook |

A team that upgrades mid-thread keeps its conversation history visible — old messages are stamped with the kind they were sent under.

Tool reference (agent-mode-only)

| Tool | What it does |
|---|---|
| propose_fix | Raise a fix proposal for any of the 21 fix actions. Includes target + optional pinnedData. Hits the same approval flow as agent-driven proposals. |
| execute_skill | Run an executable or runbook skill by name. Streams progress back into the chat; mid-run approval gates pause and wait for a dashboard click for destructive skills. |
| notify_team | Send a one-shot notification through the team's on-call channels (email / SMS / Slack / Telegram / voice) without raising an incident. Good for "FYI, deploy in 5 min" or "I rolled back X — ack if any issues." |
| prefill_form_link | Generate a deep link into the relevant create form (/pulse/new, /cron/checks/new, /hooks/new, etc.) with the user's stated values pre-populated as query params. Always preferred over inline JSON for any create flow — the user reviews + clicks Save. |
| update_skill | Edit an existing skill — content, trust level, trigger pattern. Useful for "promote that runbook out of quarantine" or "tighten the regex on the payment-cron runbook." |
| create_knowledge_skill / create_runbook | Author new skills directly from chat. Defaults to quarantine tier so a human reviews before it auto-fires. |

Panel features

  • Markdown rendering in assistant responses — fenced code blocks render as styled tiles with a language label and a copy button; inline code gets a colored pill; internal links (/skills/...) use the Next router so the chat panel stays open across navigation, external links open in a new tab
  • Voice input — mic button (Pro+ projects with the gpt-4o-mini-transcribe model enabled). Press once to start, again to stop, OR walk away — voice activity detection auto-stops after ~1.5s of silence and submits.
  • ↑ / ↓ history — terminal-style recall of your prior messages. Persisted in localStorage so it spans conversations on this device. Up to 100 entries, consecutive duplicates skipped.
  • Expand toggle — header button to take the panel full-screen; choice persists across reloads.
  • Drag to resize — left edge of the panel (desktop only). Width capped at 90% of viewport, persisted to localStorage.
  • Smart auto-scroll — only follows new tokens when you're already at the bottom; if you scrolled up to read history, a "↓ Latest" floating button appears to snap back.

On mobile

  • The chat panel covers the full viewport — no half-overlay, since touch keyboards already eat 40% of the screen.
  • Clicking any deep link the agent sends (e.g. a prefill_form_link URL into /pulse/new) auto-closes the panel so the destination page is visible underneath.
  • The "thinking" indicator is a pulsating gem rendered with pure CSS — theme-aware, respects prefers-reduced-motion, and ships no GIFs.
  • The chat bell + ✨ trigger live in every product header (PulseGuardPlus, CronKeeper, LogVault, CertGuard, CronRunner, HookRelay) — not just the hub. You can ask the agent a question without bouncing back to /dashboard first.
  • Bottom nav (Home / Agent / Chat / Incidents) on screens narrower than md keeps the agent always one tap away.

Security layer

Every chat turn passes through a defense-in-depth filter chain before reaching the LLM:

1. Layer 1 quick-filter (regex) — blocks credential leaks (AWS keys, GitHub tokens, OpenAI keys, private SSH keys), prompt-injection patterns ("ignore previous instructions", DAN jailbreaks, fake [[system]] markers), and Unicode trickery (zero-width / bidi). Redacts internal IPs and hostnames before they reach OpenAI/Anthropic logs.

2. Structured prompts — team memory, recent sessions, and tool results get wrapped in sentinel-delimited UNTRUSTED_DATA blocks with a per-conversation random ID. The system message tells the LLM to treat those blocks as inert text, defending against indirect prompt injection in customer log lines / DB rows / webhook payloads.

3. Tool-result filter — every tool response runs through the same redaction rules before being fed back into the LLM context. The DB still stores the raw payload for operator debugging.

Platform Memory

Separate from skills — a freeform markdown document the agent maintains about your infrastructure. Think of it as an ever-updating runbook.

View and edit at the top of the dashboard. Max 2500 characters. Rendered as markdown (headings, lists, code blocks). The agent rewrites it automatically after any incident that generates new institutional knowledge.

Scan Frequency

Tune at Settings โ†’ Scan Frequency:

  • Watchdog interval — fast sweep (1–30 min, default 5)
  • Deep scan interval — detailed scan (15–360 min, default 60)

Tighter intervals mean faster reaction but more LLM cost (you're paying per-scan on your own BYOK key). The default 5/60 pattern is a good starting point; lengthen the intervals if your infra is quiet, shorten them if incidents are frequent.
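The cost relationship is easy to bound, because each interval fixes the number of scans per day. A back-of-envelope sketch:

```shell
# Scans per day at a given pair of intervals. Each scan can trigger one or
# more LLM calls on your BYOK key, so this bounds the call volume.
watchdog_min=5   # default watchdog interval
deep_min=60      # default deep-scan interval

echo "watchdog scans/day: $(( 24 * 60 / watchdog_min ))"   # 288
echo "deep scans/day:     $(( 24 * 60 / deep_min ))"       # 24
```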

BYOK (Bring Your Own LLM Key)

The agent uses your OpenAI, Anthropic, or compatible LLM key — separate from the MicroGemAI chat key.

Configure at Settings → LLM. Same providers supported as the chat assistant. Typical monthly cost at default intervals runs ~$0.50–2.00 depending on how many investigations the agent opens.

The key is encrypted the same way as integration credentials. Blank the field in settings to keep the existing one (displayed as ••••••••).

Active Toggle

The most-used control. Lives in two places:

  • Dashboard header — green pulsing pill when active, gray when paused.
  • Settings → Active (top of page).

Click either to start or stop the scanning loop. Pausing preserves all state โ€” credentials, skills, sessions, proposals โ€” and resumes from where it left off when you re-activate.

Cancelling the subscription auto-pauses. Re-subscribing leaves the agent paused so you can review settings first.

REST API

Everything above — tier, LLM key, integrations, credentials — is also exposed over a v1 REST API at microgemlabs.ai/api/v1/agent/*. Use it for:

  • Terraform / IaC — the microgemagent Terraform provider wraps this API. If you want your agent config in version control, that's the path of least resistance.
  • CI pipelines / scripts — rotate a credential from a secret-rotation job, flip tiers in response to a cost check, snapshot config for audit export.
  • Custom dashboards — read-only pulls of tier, active flag, session counters.

Authentication

Every request needs a TeamApiKey Bearer token:

Authorization: Bearer mgl_team_abcdef1234567890

Generate one at Settings → Account → API Keys. The key is team-scoped, so no team_id field is sent in requests. Treat it like a root credential — whoever has it can reconfigure the agent and rotate every stored credential.

Endpoints

| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/agent/config | Read current config (tier, LLM settings, toggles) |
| POST | /api/v1/agent/config | Create-or-upsert (idempotent) |
| PATCH | /api/v1/agent/config | Partial update (supports If-Match) |
| DELETE | /api/v1/agent/config | Wipe config — destructive, cascades to credentials |
| GET | /api/v1/agent/credentials | List all credentials (metadata only) |
| POST | /api/v1/agent/credentials | Create a credential |
| GET | /api/v1/agent/credentials/{id} | Read one credential (metadata only) |
| PATCH | /api/v1/agent/credentials/{id} | Update value and/or label (supports If-Match) |
| DELETE | /api/v1/agent/credentials/{id} | Remove a credential |

Optimistic concurrency

Every successful write returns the full updated object plus an etag field and an ETag response header. On PATCH, send If-Match: "<etag>" — a stale etag returns 412 Precondition Failed, so two concurrent Terraform runs can't silently stomp each other. POST and DELETE don't require If-Match (POST is an idempotent upsert; DELETE already takes a snapshot).

Plaintext never leaves the server

Credential values and the LLM API key are write-only. Reads return the masked string •••••••• for the LLM key and a sha256 value_fingerprint for integration credentials — never the plaintext. Detect out-of-band drift by computing sha256(your_local_value) and comparing it to the returned fingerprint.
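A sketch of that drift check, assuming the fingerprint is the sha256 hex digest of the raw value with no trailing newline ("abc" below is a stand-in for the credential you hold locally):

```shell
# Compute the local fingerprint and compare it to value_fingerprint from
# GET /api/v1/agent/credentials/{id}. "abc" is a placeholder value.
local_value="abc"
local_fp=$(printf '%s' "$local_value" | sha256sum | awk '{print $1}')

echo "$local_fp"
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# With the API's fingerprint in $remote_fp:
#   [ "$local_fp" = "$remote_fp" ] || echo "credential drifted out-of-band"
```

Note the `printf '%s'` — `echo` would append a newline and change the hash.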

Tier × integration rules

Setting tier: "starter" combined with aws_enabled, github_enabled, docker_enabled, postgres_enabled, or fix_enabled: true returns 422 Unprocessable Entity with a message identifying the offending flag. Upgrade to Pro, or set the flag to false.

Rate limits

30 requests per minute per API key across all v1 paths. A 429 response carries Retry-After and the standard X-RateLimit-{Limit,Remaining,Reset} headers. The Terraform provider retries with exponential backoff; custom scripts should do the same.
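A minimal sketch of that retry behavior — backoff_delay is a hypothetical helper, and a real loop should prefer the server's Retry-After header when present:

```shell
# Exponential backoff schedule for 429 responses: 1s, 2s, 4s, 8s ...
backoff_delay() {
  echo $(( 1 << $1 ))
}

for attempt in 0 1 2 3; do
  backoff_delay "$attempt"
done
# prints 1 2 4 8 (one per line)
```

In a retry loop you would `sleep "${retry_after:-$(backoff_delay "$attempt")}"` between attempts, taking retry_after from the 429's Retry-After header when it is set.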

API Examples

Read current config

curl -s https://microgemlabs.ai/api/v1/agent/config \
  -H "Authorization: Bearer $MGL_API_KEY"

Create or update config (idempotent POST)

curl -s https://microgemlabs.ai/api/v1/agent/config \
  -X POST \
  -H "Authorization: Bearer $MGL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tier": "pro",
    "active": true,
    "llm_provider": "anthropic",
    "llm_model": "claude-sonnet-4-6",
    "llm_api_key": "sk-ant-...",
    "ssh_enabled": true,
    "github_enabled": true,
    "fix_enabled": true,
    "fix_cooldown_min": 15,
    "max_fixes_per_day": 10
  }'

Rotate an SSH key

# 1. Find the credential id (metadata only — no plaintext returned)
curl -s https://microgemlabs.ai/api/v1/agent/credentials \
  -H "Authorization: Bearer $MGL_API_KEY" \
  | jq '.credentials[] | select(.cred_type=="ssh_key")'

# 2. PATCH with the new value. If-Match guards against concurrent edits.
ETAG=$(curl -sI https://microgemlabs.ai/api/v1/agent/credentials/$CRED_ID \
  -H "Authorization: Bearer $MGL_API_KEY" | grep -i '^etag:' | awk '{print $2}' | tr -d '\r')

curl -s https://microgemlabs.ai/api/v1/agent/credentials/$CRED_ID \
  -X PATCH \
  -H "Authorization: Bearer $MGL_API_KEY" \
  -H "If-Match: $ETAG" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg v "$(cat new-key.pem)" '{value:$v}')"

Terraform provider

> Status: pre-release. The provider source lives in [terraform/](https://github.com/microgemlabs/mgllabs/tree/main/terraform) inside the platform monorepo. It is not yet published to registry.terraform.io. The HCL below describes the intended shape once it ships; until then, build from source and reference the local binary via [dev_overrides](https://developer.hashicorp.com/terraform/cli/config/config-file#development-overrides-for-provider-developers).

For fleet management, the Terraform provider wraps the v1 REST API with plan-time validation, drift detection, and for_each for credential bundles:

terraform {
  required_providers {
    microgemagent = {
      source  = "microgemlabs/microgemagent"
      version = "~> 0.1"
    }
  }
}

provider "microgemagent" {}  # reads MGL_API_KEY

resource "microgemagent_config" "this" {
  tier         = "pro"
  llm_provider = "anthropic"
  llm_model    = "claude-sonnet-4-6"
  llm_api_key  = var.anthropic_api_key
  ssh_enabled  = true

  lifecycle {
    prevent_destroy = true
  }
}

Once published, the registry URL will be registry.terraform.io/providers/microgemlabs/microgemagent. Watch the [terraform/](https://github.com/microgemlabs/mgllabs/tree/main/terraform) directory's CHANGELOG.md for release status.

Troubleshooting

  • "Agent not scanning" — Check the dashboard header: if it says *Paused*, click the toggle. If it says *Active — waiting for first scan*, give it up to 5 minutes (default watchdog interval).
  • "Last scan overdue" — The cron ran but the scan errored. Open the most recent session from the sessions list — the detail page will show what failed (usually a credential, an integration toggle mismatch, or an LLM-quota issue).
  • "Proposal never arrived" — Check tier: Starter doesn't generate fix proposals. Check the Fix Mode toggle in settings. Check the 15-minute proposal expiry.
  • "Can't approve proposal" — Approvals require the owner or admin role. Members see the proposal but no buttons.
  • "Credentials look fine but agent says unauthorized" — Rotate the credential (e.g. a new GitHub PAT) and save again. Stale credentials fail silently until the next scan surfaces the error.