Hosted MCP demo¶
Run RCG as a public HTTP service so anyone can paste their own rules and get a coherence check back — no local install, no filesystem access required. This guide deploys the MCP server to Fly.io over the streamable HTTP transport, optionally backed by a free Neo4j AuraDB instance for graph persistence.
Security: the demo endpoint is public and unauthenticated
By default the hosted server accepts requests from anyone on the internet
and runs no per-request auth. Do not paste secrets, credentials, or
confidential rules into a public instance. The check_rules tool writes
pasted text to a temporary file that is deleted immediately after the check,
but the request still traverses the public network.
Hardening options:
- Set the RCG_MCP_TOKEN secret and keep the URL private (the current build documents this risk rather than enforcing a token in code; see Authentication below).
- Restrict access with Fly private networking and only expose the app inside your organisation.
- Run your own instance instead of sharing one.
Environment variables¶
This is the single source of truth for every environment variable the MCP server reads. All have safe defaults: with none set, the server runs locally over stdio with no guardrails and no graph, exactly as it always has.
| Variable | Default | Purpose |
|---|---|---|
| RCG_MCP_TRANSPORT | stdio | Transport: stdio (default, local clients), http/streamable-http (the hosted demo), or sse. |
| PORT | 8080 | TCP port the HTTP/SSE transports listen on. Ignored for stdio. |
| RCG_MCP_ALLOWED_HOSTS | (unset) | Comma-separated hosts (host or host:port) added to the DNS-rebinding allow-list. Required for a public hostname: FastMCP's streamable-HTTP transport allows only localhost Host headers by default, so a public host returns HTTP 421 without this. Example: rcg-mcp-demo.fly.dev. Keeps the protection on and derives matching https:// and http:// origins. |
| RCG_MCP_DISABLE_DNS_REBINDING_PROTECTION | (unset) | Truthy (1/true/yes) disables DNS-rebinding protection entirely, as an escape hatch. Prefer RCG_MCP_ALLOWED_HOSTS: disabling the protection accepts any Host header, which removes a DNS-rebinding defense, so only use it if you front the server with your own auth/proxy. |
| RCG_PUBLIC_DEMO | (unset) | Master switch for the public-demo guardrails. Truthy (1/true/yes) enables every RCG_MAX_*/RCG_GRAPH_MAX_NODES/RCG_RATE_LIMIT_PER_MIN limit below. Unset = fully unrestricted. |
| RCG_MAX_INPUT_BYTES | 50000 | Max UTF-8 byte size of check_rules input text (demo mode only). |
| RCG_MAX_RULES | 200 | Max rules extracted per check_rules call (demo mode only). |
| RCG_GRAPH_MAX_NODES | 5000 | Node count at which the graph auto-clears before an ingest_to_graph write (demo mode only). |
| RCG_RATE_LIMIT_PER_MIN | 30 | Process-wide requests allowed per trailing 60s window (demo mode only). The hosted fly.toml raises this to 120. |
| RCG_METRICS_PORT | 9091 | Port for the Prometheus /metrics endpoint (HTTP transports only; must differ from PORT). Setting it also enables the metrics server. |
| NEO4J_URI | (unset) | Bolt Connection URI (neo4j+s://<id>.databases.neo4j.io), not the Query API URL. Optional; enables ingest_to_graph. |
| NEO4J_USERNAME | neo4j | AuraDB username. On AuraDB this is the instance id (e.g. ac3f157b), not literally neo4j; see the callout below. (NEO4J_USER is also accepted.) |
| NEO4J_PASSWORD | (unset) | AuraDB password (shown only once at instance creation). |
| DEEPSEEK_API_KEY / DASHSCOPE_API_KEY / OPENAI_API_KEY / RCG_LLM_* | (unset) | LLM provider keys/config. The hosted demo defaults to the offline mock extractor; set these only if a tool is called with provider="deepseek"/qwen/openai/anthropic. See LLM providers. |
On the hosted demo, the non-secret variables — RCG_MCP_TRANSPORT, PORT,
RCG_PUBLIC_DEMO, RCG_METRICS_PORT, RCG_RATE_LIMIT_PER_MIN, and
RCG_MCP_ALLOWED_HOSTS — are set as Fly [env] in fly.toml. The three
NEO4J_* values are secrets and are set with fly secrets set (never committed
to fly.toml).
AuraDB username is the instance id, not neo4j
A very common mistake: an AuraDB instance's username is the instance id
shown in console.neo4j.io (e.g. ac3f157b), not
the literal string neo4j. Using neo4j causes a
Neo.ClientError.Security.Unauthorized auth failure. Copy the username from
the instance's credentials exactly.
Cost & guardrails¶
The bundled fly.toml targets a ~$50/month budget while staying cheap in
practice. A shared-cpu-1x VM with 512MB is ~$3.32/mo running 24/7, so keeping
one machine warm (min_machines_running = 1) costs roughly that. The graph runs
on Neo4j AuraDB Free, which is $0 but hard-capped at ~50k nodes.
Two tiers are easy to pick between:
| Tier | Posture | Approx. cost |
|---|---|---|
| Frugal | min_machines_running = 0 (scale-to-zero, accept cold starts), default rate limit. | ~$0–5/mo |
| Demo / $50 (bundled) | Warm min_machines_running = 1, higher concurrency (hard 200 / soft 150), RCG_RATE_LIMIT_PER_MIN = 120. Burst with fly scale count 2 (~+$3.32/mo per VM) or fly scale memory 1024 (1GB, ~$5.92/mo per VM). | ~$3–10/mo steady, up to ~$50 under sustained burst |
The custom metrics below are how you decide when to dial up: watch the
rcg_tool_calls_total rate and Fly's built-in fly_instance_cpu /
fly_instance_exit_oom. Bandwidth is billed at $0.02/GB.
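The prices quoted in this section can be combined into a rough monthly estimate. The sketch below only restates this guide's figures as arithmetic; the function name and the flat per-GB bandwidth model are illustrative assumptions, and real Fly.io billing may differ.

```python
# Prices as quoted in this guide (assumed current; check Fly.io pricing before relying on them).
VM_512MB_MONTHLY = 3.32   # shared-cpu-1x with 512MB, running 24/7
VM_1GB_MONTHLY = 5.92     # same VM after `fly scale memory 1024`
BANDWIDTH_PER_GB = 0.02   # outbound bandwidth


def monthly_cost(vm_count: int, egress_gb: float, per_vm: float = VM_512MB_MONTHLY) -> float:
    """Hypothetical estimator: always-on VMs plus bandwidth, rounded to cents."""
    return round(vm_count * per_vm + egress_gb * BANDWIDTH_PER_GB, 2)


if __name__ == "__main__":
    print(monthly_cost(1, 0))                           # one warm 512MB machine, no traffic
    print(monthly_cost(2, 100))                         # burst to two machines, 100 GB out
    print(monthly_cost(2, 50, per_vm=VM_1GB_MONTHLY))   # two 1GB machines, 50 GB out
```

Even two always-on machines with 100 GB of traffic stay well under the $50 budget by this estimate, which is why the bundled tier is described as cheap in practice.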
These guardrails are opt-in via RCG_PUBLIC_DEMO=1. When that switch is not
set (local, stdio, or self-hosted use), the tools are fully unrestricted and
behave exactly as before. fly.toml sets RCG_PUBLIC_DEMO=1 for the hosted
demo.
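A minimal sketch of how a truthy switch like RCG_PUBLIC_DEMO might be read. The accepted values (1/true/yes) come from this guide; the case-insensitive matching, the whitespace trimming, and the function name are assumptions, not RCG's actual code.

```python
import os

# Values this guide lists as truthy.
TRUTHY = {"1", "true", "yes"}


def demo_mode_enabled(environ=None) -> bool:
    """Hypothetical helper: True only when RCG_PUBLIC_DEMO holds a truthy value."""
    env = os.environ if environ is None else environ
    # Assumption: compare case-insensitively and ignore surrounding whitespace.
    return env.get("RCG_PUBLIC_DEMO", "").strip().lower() in TRUTHY


if __name__ == "__main__":
    print(demo_mode_enabled({}))                          # unset: unrestricted
    print(demo_mode_enabled({"RCG_PUBLIC_DEMO": "1"}))    # guardrails on
```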
Tunable limits¶
All limits have safe defaults; overriding them is optional. The full list of demo-guardrail variables (and every other variable the server reads) lives in the Environment variables reference above. Each limit is parsed defensively: a missing or invalid value falls back to its default.
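Defensive parsing of this kind can be sketched as follows. The helper name env_int is hypothetical, and treating zero or negative numbers as invalid is an assumption this guide does not confirm.

```python
import os


def env_int(name: str, default: int, environ=None) -> int:
    """Hypothetical helper: read an integer limit, falling back to the default on bad input."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default              # missing: use the default
    try:
        value = int(raw.strip())
    except ValueError:
        return default              # not an integer: use the default
    # Assumption: a limit must be positive to be meaningful.
    return value if value > 0 else default


if __name__ == "__main__":
    print(env_int("RCG_MAX_RULES", 200, {"RCG_MAX_RULES": "50"}))
    print(env_int("RCG_MAX_RULES", 200, {"RCG_MAX_RULES": "lots"}))
```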
Graph auto-clear at cap¶
Before each ingest_to_graph write in demo mode, the server counts existing
nodes and estimates the new ones (rules + their distinct parent file nodes). If
existing + estimated_new would exceed RCG_GRAPH_MAX_NODES, the whole demo
graph is wiped (DETACH DELETE) before the new ingest, and the response
includes "graph_cleared": true. This keeps the free AuraDB graph from filling
its ~50k node cap and getting stuck. The return summary also includes
"n_nodes" (the post-write node count) in demo mode.
Rate limit is process-global¶
There is no reliable per-client identity over the MCP transport, so the rate
limiter is process-wide: it caps the demo's total throughput, not
per-user throughput. The concurrency limits in fly.toml
([http_service.concurrency]) provide a complementary cap on simultaneous
requests.
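One common way to build a process-wide limiter over a trailing window is a timestamp queue shared by all callers. The class below is a sketch of that general technique, not RCG's implementation; its name, the injectable clock, and the lock are assumptions.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical process-wide limiter: at most `limit` calls per trailing window."""

    def __init__(self, limit: int = 30, window_s: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self._clock = clock            # injectable so the behaviour can be tested
        self._hits = deque()           # timestamps of accepted calls, oldest first
        self._lock = threading.Lock()  # one shared budget for every caller

    def allow(self) -> bool:
        with self._lock:
            now = self._clock()
            # Drop timestamps that have aged out of the trailing window.
            while self._hits and now - self._hits[0] >= self.window_s:
                self._hits.popleft()
            if len(self._hits) >= self.limit:
                return False
            self._hits.append(now)
            return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2)
    print([limiter.allow() for _ in range(3)])   # the third call in the window is refused
```

Because there is a single queue rather than one per client, one busy caller can use up the budget for everyone, which is the trade-off this section describes.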
Error shapes¶
In demo mode the tools never raise across the MCP boundary; instead they return
a structured error dict of the form
{"error": "<machine_code>", "message": "<human text>", "limit": <int>}:
| error code | When |
|---|---|
| rate_limited | The process-wide per-minute request limit was exceeded. |
| input_too_large | check_rules input exceeded RCG_MAX_INPUT_BYTES. |
| too_many_rules | Extracted rule count exceeded RCG_MAX_RULES. |
The file-reading tools (check_corpus, explain_action, score_corpus,
ingest_to_graph) apply only the rate limit in demo mode, since they read
server-side paths rather than caller-supplied text.
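As an illustration of the error shape, here is a sketch of an input-size guard that returns the structured dict instead of raising. The function names and the message wording are hypothetical; only the three keys and the input_too_large code come from this guide.

```python
from typing import Optional


def guard_error(code: str, message: str, limit: int) -> dict:
    """Build the structured error dict described in this section."""
    return {"error": code, "message": message, "limit": limit}


def check_input_size(rules_text: str, max_bytes: int = 50000) -> Optional[dict]:
    """Hypothetical guard: return an error dict if the text is too large, else None."""
    # The limit is expressed in UTF-8 bytes, not characters.
    size = len(rules_text.encode("utf-8"))
    if size > max_bytes:
        return guard_error(
            "input_too_large",
            f"input is {size} bytes; the limit is {max_bytes}",
            max_bytes,
        )
    return None


if __name__ == "__main__":
    print(check_input_size("- You MUST run tests."))   # within the limit
    print(check_input_size("é" * 3, max_bytes=5))      # 6 bytes in UTF-8, over a 5-byte limit
```

A client can branch on the presence of the "error" key before reading the normal result fields.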
Custom metrics¶
The server exposes RCG-specific counters in the Prometheus text exposition
format on a separate port (default 9091, path /metrics), rendered with the
Python standard library only — no prometheus_client dependency. The metrics
endpoint starts automatically for the HTTP transports (it is never started for
stdio, so local/Claude-Code use is untouched). The [metrics] block in
fly.toml tells Fly to scrape :9091/metrics (~every 15s); the series then
appear in the managed Grafana at fly-metrics.net
alongside Fly's built-ins such as fly_edge_http_responses_count,
fly_edge_data_out, fly_instance_cpu, and fly_instance_exit_oom.
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| rcg_tool_calls_total | counter | tool | One per MCP tool invocation (check_corpus, check_rules, explain_action, score_corpus, ingest_to_graph). |
| rcg_guard_rejections_total | counter | reason | Demo guardrail rejections (rate_limited, input_too_large, too_many_rules). |
| rcg_rules_extracted_total | counter | (none) | Total rules extracted across check/score/ingest calls. |
| rcg_graph_writes_total | counter | (none) | Successful ingest_to_graph writes. |
| rcg_graph_clears_total | counter | (none) | Times the demo graph auto-cleared at the node cap. |
| rcg_graph_write_failures_total | counter | (none) | ingest_to_graph attempts that failed. |
All counters are pre-declared, so they render as zero-valued series before the first event (nicer when building Grafana panels). Example MetricsQL/PromQL queries:
# Guard rejection rate, broken out by reason
sum by (reason) (rate(rcg_guard_rejections_total[5m]))
# Tool mix (which tools are actually being called)
sum by (tool) (rate(rcg_tool_calls_total[5m]))
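Rendering counters in the Prometheus text exposition format needs nothing beyond string formatting. The sketch below shows the general idea with the standard library only; the function names and the HELP text are illustrative assumptions, not the server's actual code.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples) -> str:
    """Render one counter family; `samples` is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            body = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Pre-declared series render as zero before the first event.
    print(render_counter(
        "rcg_tool_calls_total",
        "MCP tool invocations.",   # hypothetical HELP text
        [({"tool": "check_rules"}, 0), ({"tool": "check_corpus"}, 0)],
    ), end="")
```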
1. Create a Neo4j AuraDB instance (optional)¶
Graph persistence is optional — every tool works without it. To enable it:
- Sign in at console.neo4j.io and create a free AuraDB instance.
- When the instance is created, copy the Connection URI. It uses the neo4j+s:// scheme, e.g. neo4j+s://ac3f157b.databases.neo4j.io. Use this Bolt Connection URI, not the Query API URL shown elsewhere in the console.
- Copy the generated username and password. On AuraDB the username is the instance id (e.g. ac3f157b), not the literal string neo4j; the password is shown only once.
RCG passes the URI through unchanged, so the neo4j+s:// (TLS) scheme that
AuraDB requires is honoured.
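The two mistakes called out in this step (wrong URI type, wrong username) can be caught before deploying. The pre-flight check below is a hypothetical sketch, not part of RCG; the function name and warning text are assumptions.

```python
from urllib.parse import urlparse


def neo4j_config_warnings(uri: str, username: str) -> list:
    """Hypothetical pre-flight check for the AuraDB pitfalls described in this guide."""
    warnings = []
    parsed = urlparse(uri)
    # AuraDB needs the Bolt Connection URI with the TLS scheme, not an https:// Query API URL.
    if parsed.scheme != "neo4j+s":
        warnings.append(f"URI scheme is {parsed.scheme!r}; AuraDB expects neo4j+s://")
    # Per this guide, the AuraDB username is the instance id rather than the literal 'neo4j'.
    host = parsed.hostname or ""
    if host.endswith(".databases.neo4j.io") and username == "neo4j":
        warnings.append("username is 'neo4j'; on AuraDB use the instance id instead")
    return warnings


if __name__ == "__main__":
    print(neo4j_config_warnings("neo4j+s://ac3f157b.databases.neo4j.io", "ac3f157b"))
    print(neo4j_config_warnings("https://ac3f157b.databases.neo4j.io", "neo4j"))
```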
2. Deploy to Fly.io¶
From the repository root:
fly launch --no-deploy # generates/uses fly.toml; pick a unique app name
fly deploy
The bundled fly.toml serves over streamable HTTP on port 8080, forces HTTPS,
pins a 512MB shared-cpu-1x VM, keeps one machine warm
(min_machines_running = 1) to avoid cold starts, and exposes the custom
metrics on port 9091 (scraped by Fly via the [metrics] block). It also sets
RCG_MCP_ALLOWED_HOSTS to the app's public hostname — without this the
streamable-HTTP transport rejects the public host with HTTP 421 (see
Environment variables). The hosted
image installs only the [mcp] extra; the embeddings
extra (sentence-transformers/torch) is intentionally omitted to keep the image
lean and fit the 512MB VM — check_rules defaults to the mock provider and a
hashing embedder, so no heavy ML dependencies are needed for the public demo.
3. Set secrets¶
All secrets are optional. Set only the ones you need:
fly secrets set \
ANTHROPIC_API_KEY=sk-ant-... \
NEO4J_URI=neo4j+s://ac3f157b.databases.neo4j.io \
NEO4J_USERNAME=ac3f157b \
NEO4J_PASSWORD=your-aura-password
NEO4J_URI must be the Bolt Connection URI (neo4j+s://…), and
NEO4J_USERNAME is the AuraDB instance id, not the literal neo4j — see
Environment variables.
Never commit secrets — fly secrets set stores them encrypted on Fly.
4. Connect an MCP client¶
The hosted endpoint lives at the /mcp path:
https://rcg-mcp-demo.fly.dev/mcp
(replace rcg-mcp-demo with your app name).
Claude Code (or any client that supports the streamable HTTP transport) can be pointed at it with a config entry like:
{
"mcpServers": {
"rcg-hosted": {
"type": "http",
"url": "https://rcg-mcp-demo.fly.dev/mcp"
}
}
}
A generic streamable-HTTP MCP client just needs the same URL.
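For a client written without an MCP library, the first request is a JSON-RPC initialize call posted to that URL. The sketch below only builds the request with the standard library. The protocolVersion string, the clientInfo values, and the function name are assumptions; the version the pinned mcp package negotiates may differ.

```python
import json
import urllib.request


def build_initialize_request(url: str) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize request for a streamable-HTTP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",   # assumed spec revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},  # hypothetical
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or with an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_initialize_request("https://rcg-mcp-demo.fly.dev/mcp")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```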
5. Try the check_rules tool¶
check_rules lets a remote client check a pasted corpus without sharing any
files. Call it with the rules text and a format:
{
"name": "check_rules",
"arguments": {
"rules_text": "# Deploy policy\n- You MUST deploy to production after tests pass.\n- You MUST NOT deploy to production without human approval.\n",
"format": "markdown"
}
}
It returns the same shape as check_corpus:
{
"n_rules": 2,
"score": 0.0,
"by_type": {"contradiction": 1},
"findings": [
{
"type": "contradiction",
"severity": "...",
"reason": "...",
"rule_a": {"text": "...", "file": "CLAUDE.md", "modality": "MUST", "action_class": "deploy.production"},
"rule_b": {"text": "...", "file": "CLAUDE.md", "modality": "MUST_NOT", "action_class": "deploy.production"}
}
]
}
Supported format values: markdown (default), cursorrules, cedar,
rego, yaml. The check runs offline with the deterministic mock extractor.
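A client will usually want to turn that result into something readable. The helper below is a hypothetical sketch that relies only on the fields shown in the sample response; the function name and the output wording are assumptions.

```python
def summarize(result: dict) -> list:
    """Hypothetical helper: one headline line plus one line per finding."""
    lines = [f"{result['n_rules']} rules, score {result['score']}"]
    for finding in result.get("findings", []):
        a, b = finding["rule_a"], finding["rule_b"]
        lines.append(
            f"{finding['type']}: {a['modality']} vs {b['modality']} on {a['action_class']}"
        )
    return lines


if __name__ == "__main__":
    sample = {
        "n_rules": 2,
        "score": 0.0,
        "by_type": {"contradiction": 1},
        "findings": [{
            "type": "contradiction",
            "rule_a": {"modality": "MUST", "action_class": "deploy.production"},
            "rule_b": {"modality": "MUST_NOT", "action_class": "deploy.production"},
        }],
    }
    for line in summarize(sample):
        print(line)
```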
Smoke-test the deployment¶
scripts/smoke_test.py is a standalone, stdlib-only client (no mcp package,
no requests) that drives a deployed server end to end: it runs the MCP
handshake, lists the tools, and verifies check_rules detects a MUST vs
MUST_NOT conflict. Run it against the live demo (or your own URL):
# Default URL is https://rcg-mcp-demo.fly.dev/mcp
python3 scripts/smoke_test.py
# Your own deployment (positional or --url):
python3 scripts/smoke_test.py https://my-app.fly.dev/mcp
# Also exercise ingest_to_graph (needs NEO4J_* configured server-side):
python3 scripts/smoke_test.py --graph --path examples/gemini_incident
A healthy deployment prints (exit code 0):
RCG MCP smoke test → https://rcg-mcp-demo.fly.dev/mcp
[PASS] initialize — session established, server='rcg'
[PASS] tools/list — tools=check_corpus, explain_action, score_corpus, check_rules, ingest_to_graph
[PASS] check_rules conflict — n_rules=2, findings=1
SUMMARY: ALL CHECKS PASSED
With --graph, a successful write adds
[PASS] ingest_to_graph — written=true, n_rules=4, n_conflicts=2. Any failed
check makes the script exit 1.
Troubleshooting¶
| Symptom | Cause and fix |
|---|---|
| HTTP 421 Misdirected Request on initialize | The streamable-HTTP transport's DNS-rebinding protection rejected the public Host header. Set RCG_MCP_ALLOWED_HOSTS to your hostname (e.g. rcg-mcp-demo.fly.dev) and redeploy. |
| Neo.ClientError.Security.Unauthorized from --graph | Wrong AuraDB credentials. NEO4J_USERNAME must be the instance id (e.g. ac3f157b), not neo4j; recheck the password too. |
| ingest_to_graph returns written=true but n_rules=0 | The examples/ directory is not in the server image, so the path resolved to nothing. The Dockerfile must COPY examples ./examples. |
| --graph write fails after the app has been idle | AuraDB Free auto-pauses after 72h of inactivity. Resume the instance in console.neo4j.io and retry. |
Tools available over HTTP¶
| Tool | Purpose |
|---|---|
| check_rules | Check a pasted corpus (no filesystem access). |
| check_corpus | Check a corpus on the server's filesystem. |
| explain_action | Explain which rules fire for a hypothetical action. |
| score_corpus | Return only the coherence score. |
| ingest_to_graph | Check a corpus and persist it to Neo4j when configured. |
ingest_to_graph returns {"written": false, "reason": "NEO4J_URI not set"}
when no graph is configured, and a written/failed summary otherwise — it never
crashes the server, even if the database is unreachable.
Transport selection¶
The server picks its transport from the environment, defaulting to stdio so local clients keep working unchanged:
| RCG_MCP_TRANSPORT | Transport |
|---|---|
| unset / stdio | stdio (default) |
| http / streamable-http | streamable HTTP on 0.0.0.0:$PORT |
| sse | Server-Sent Events on 0.0.0.0:$PORT |
PORT defaults to 8080. For the HTTP transports behind a public hostname,
also set RCG_MCP_ALLOWED_HOSTS (or, as an escape hatch,
RCG_MCP_DISABLE_DNS_REBINDING_PROTECTION) — see
Environment variables.
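The selection rules and the allow-list derivation can be sketched as two small functions. Both names are hypothetical, and raising ValueError for an unrecognised transport is an assumption; the real server may handle that case differently.

```python
TRANSPORT_ALIASES = {
    "": "stdio",
    "stdio": "stdio",
    "http": "streamable-http",
    "streamable-http": "streamable-http",
    "sse": "sse",
}


def resolve_transport(environ) -> str:
    """Map RCG_MCP_TRANSPORT to a transport name, defaulting to stdio."""
    raw = environ.get("RCG_MCP_TRANSPORT", "stdio").strip().lower()
    if raw not in TRANSPORT_ALIASES:
        raise ValueError(f"unknown transport: {raw!r}")   # assumed behaviour
    return TRANSPORT_ALIASES[raw]


def allowed_hosts_and_origins(environ):
    """Split RCG_MCP_ALLOWED_HOSTS and derive matching https:// and http:// origins."""
    raw = environ.get("RCG_MCP_ALLOWED_HOSTS", "")
    hosts = [host.strip() for host in raw.split(",") if host.strip()]
    origins = [f"{scheme}://{host}" for host in hosts for scheme in ("https", "http")]
    return hosts, origins


if __name__ == "__main__":
    env = {"RCG_MCP_TRANSPORT": "http", "RCG_MCP_ALLOWED_HOSTS": "rcg-mcp-demo.fly.dev"}
    print(resolve_transport(env))
    print(allowed_hosts_and_origins(env))
```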
Authentication¶
The MCP HTTP transport in the pinned mcp version makes robust per-request
token auth awkward to wire up cleanly, so this build documents the public risk
above rather than enforcing a token in application code. Treat any hosted
instance as public unless you put it behind Fly private networking or a
reverse proxy that enforces auth. The RCG_MCP_TOKEN secret is reserved for a
future enforced-auth build and is safe to set now as a placeholder.
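If you do front the server with your own proxy, the core of a bearer-token check is a constant-time comparison. The function below is a hypothetical sketch of that check for such a proxy. It is not part of the current RCG build, which does not enforce RCG_MCP_TOKEN.

```python
import hmac


def bearer_token_ok(authorization_header, expected_token: str) -> bool:
    """Hypothetical proxy-side check of an 'Authorization: Bearer <token>' header."""
    if not expected_token or not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"),
        expected_token.encode("utf-8"),
    )


if __name__ == "__main__":
    print(bearer_token_ok("Bearer s3cret", "s3cret"))
    print(bearer_token_ok("Bearer wrong", "s3cret"))
    print(bearer_token_ok(None, "s3cret"))
```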