LLM providers¶
RCG uses an LLM in two places — the extraction step (raw rule text → canonical
Rule) and the optional semantic judge (does this pair of rules conflict?).
Both sit behind small protocols, so the model backend is pluggable.
RCG's LLM layer covers (a) cloud gateways — Amazon Bedrock, Azure AI Foundry, Google Vertex AI; (b) direct vendors — Anthropic, OpenAI; (c) aggregators — OpenRouter; plus the Gemini API (Google AI Studio), DeepSeek, Qwen, and any OpenAI-compatible endpoint (local vLLM/Ollama). All but Anthropic ride one OpenAI-compatible provider class.
Beyond Anthropic, RCG ships one OpenAI-compatible provider class
(src/rcg/extractors/openai_provider.py) that drives any endpoint speaking the
OpenAI Chat Completions API with function/tool calling: DeepSeek, Qwen (DashScope
or a local server), OpenAI itself, OpenRouter, the Gemini API, Amazon Bedrock,
Azure AI Foundry, Google Vertex AI, and local servers such as vLLM or Ollama. The
endpoint is selected purely by base URL, model, and API key.
Install¶
The OpenAI-compatible providers live in an optional extra:
pip install 'rule-coherence-graph[openai]'
Anthropic and the offline mock provider need no extra.
Provider matrix¶
--provider |
Backend | Default base URL | Default model | Key env (with fallback) |
|---|---|---|---|---|
anthropic |
Anthropic Messages API | (SDK default) | claude-sonnet-4-6 |
ANTHROPIC_API_KEY |
deepseek |
DeepSeek (OpenAI-compatible) | https://api.deepseek.com |
deepseek-chat |
DEEPSEEK_API_KEY → RCG_LLM_API_KEY |
qwen |
Qwen via DashScope | https://dashscope.aliyun.com/compatible-mode/v1 |
qwen-max |
DASHSCOPE_API_KEY → RCG_LLM_API_KEY |
openai |
OpenAI / any compatible endpoint | SDK default or RCG_LLM_BASE_URL |
gpt-4o-mini |
OPENAI_API_KEY → RCG_LLM_API_KEY |
openrouter |
OpenRouter aggregator | https://openrouter.ai/api/v1 |
anthropic/claude-sonnet-4 (overridable) |
OPENROUTER_API_KEY → RCG_LLM_API_KEY |
google |
Google Gemini API (AI Studio) | https://generativelanguage.googleapis.com/v1beta/openai/ |
gemini-2.5-flash |
GEMINI_API_KEY → GOOGLE_API_KEY → RCG_LLM_API_KEY |
bedrock |
Amazon Bedrock (OpenAI-compatible) | https://bedrock-runtime.<region>.amazonaws.com/openai/v1 (region from RCG_LLM_REGION/AWS_REGION/AWS_DEFAULT_REGION, default us-east-1) |
openai.gpt-oss-120b-1:0 |
AWS_BEARER_TOKEN_BEDROCK → RCG_LLM_API_KEY |
azure |
Azure AI Foundry / Azure OpenAI | <AZURE_OPENAI_ENDPOINT>/openai/v1 (v1 GA path; no api-version) |
model = deployment name (required, from RCG_LLM_MODEL; no default) |
AZURE_OPENAI_API_KEY → RCG_LLM_API_KEY |
vertex |
Google Vertex AI (OpenAI-compatible) | https://<region>-aiplatform.googleapis.com/v1/projects/<project>/locations/<region>/endpoints/openapi (region from RCG_LLM_REGION/VERTEX_LOCATION, default us-central1; project from VERTEX_PROJECT/GOOGLE_CLOUD_PROJECT) |
required (from RCG_LLM_MODEL, e.g. google/gemini-2.5-flash; no default) |
GOOGLE_VERTEX_ACCESS_TOKEN → RCG_LLM_API_KEY (short-lived OAuth token) |
mock |
Deterministic heuristics (offline) | — | — | none |
auto |
anthropic if ANTHROPIC_API_KEY is set, else mock |
— | — | — |
The model for any OpenAI-compatible preset can be overridden with RCG_LLM_MODEL.
Environment variables¶
| Variable | Purpose |
|---|---|
RCG_LLM_BASE_URL |
Override the base URL for the generic openai provider (point it at a local vLLM/Ollama server). For bedrock, also overrides the region-derived URL (use it to pick the Mantle endpoint or another region/host). |
RCG_LLM_MODEL |
Override the model id for any OpenAI-compatible provider/preset. |
RCG_LLM_API_KEY |
Generic API-key fallback for every OpenAI-compatible preset (including bedrock). |
RCG_LLM_REGION |
Region for the bedrock provider's base URL. Falls back to AWS_REGION, then AWS_DEFAULT_REGION, then us-east-1. |
OPENAI_API_KEY |
Standard key the OpenAI SDK reads; used by the generic openai provider. |
DEEPSEEK_API_KEY |
Key for the deepseek preset. |
DASHSCOPE_API_KEY |
Key for the qwen preset. |
AWS_BEARER_TOKEN_BEDROCK |
Amazon Bedrock API key used as the bearer token for the bedrock provider (not SigV4). |
OPENROUTER_API_KEY |
Key for the openrouter aggregator preset. |
GEMINI_API_KEY / GOOGLE_API_KEY |
Key for the google (Gemini API) preset; GEMINI_API_KEY is tried first, then GOOGLE_API_KEY. |
AZURE_OPENAI_ENDPOINT |
Per-resource Azure endpoint (e.g. https://myres.openai.azure.com); RCG appends /openai/v1. Required for azure. |
AZURE_OPENAI_API_KEY |
Azure OpenAI API key for the azure provider. |
VERTEX_PROJECT / GOOGLE_CLOUD_PROJECT |
GCP project for the vertex provider's base URL (VERTEX_PROJECT first). Required for vertex. |
VERTEX_LOCATION |
Region fallback for vertex after RCG_LLM_REGION; default us-central1. |
GOOGLE_VERTEX_ACCESS_TOKEN |
Short-lived Google OAuth access token (gcloud auth print-access-token) used as the bearer for the vertex provider. |
Key resolution order for each preset: explicit constructor argument → the
preset-specific env var → generic RCG_LLM_API_KEY → (for the generic openai
provider only) OPENAI_API_KEY. If no key is found, the CLI prints which env var
to set and reminds you that --provider mock works offline.
Examples¶
# DeepSeek
export DEEPSEEK_API_KEY=sk-...
rcg check ./rules --provider deepseek
# Qwen (DashScope)
export DASHSCOPE_API_KEY=sk-...
rcg check ./rules --provider qwen
# OpenAI
export OPENAI_API_KEY=sk-...
rcg check ./rules --provider openai
# A local OpenAI-compatible server (Ollama, vLLM, ...)
export RCG_LLM_BASE_URL=http://localhost:11434/v1
export RCG_LLM_API_KEY=ollama # most local servers accept any token
export RCG_LLM_MODEL=qwen2.5:7b
rcg check ./rules --provider openai
OpenRouter¶
OpenRouter aggregates many vendors behind one
OpenAI-compatible API. Models use a vendor/model id:
export OPENROUTER_API_KEY=sk-or-...
rcg check ./rules --provider openrouter # default model anthropic/claude-sonnet-4
# Pick any OpenRouter model with RCG_LLM_MODEL
export RCG_LLM_MODEL=openai/gpt-4o-mini
rcg check ./rules --provider openrouter
Google Gemini API (AI Studio)¶
The Gemini API exposes an OpenAI-compatible surface. The key is read from
GEMINI_API_KEY, then GOOGLE_API_KEY:
export GEMINI_API_KEY=...
rcg check ./rules --provider google # default model gemini-2.5-flash
Amazon Bedrock¶
Amazon Bedrock exposes an OpenAI-compatible Chat Completions endpoint, so RCG
drives it through the same provider class. The base URL is derived from a region
(RCG_LLM_REGION → AWS_REGION → AWS_DEFAULT_REGION → us-east-1):
export AWS_BEARER_TOKEN_BEDROCK=... # your Bedrock API key
export RCG_LLM_REGION=us-west-2 # region where the model is enabled
export RCG_LLM_MODEL=openai.gpt-oss-120b-1:0
rcg check ./rules --provider bedrock
To use the newer "Mantle" endpoint (or any other host/region), set
RCG_LLM_BASE_URL explicitly — it always wins over the region-derived URL:
export AWS_BEARER_TOKEN_BEDROCK=...
export RCG_LLM_BASE_URL=https://bedrock-mantle.us-west-2.api.aws/v1
rcg check ./rules --provider bedrock
Caveats:
- Model availability is region-gated. The OpenAI-style
gpt-ossmodels (openai.gpt-oss-120b-1:0,openai.gpt-oss-20b-1:0) launched inus-west-2. Pick a region and model id your account has enabled. - Auth via the OpenAI-SDK path uses an Amazon Bedrock API key as the bearer
token, not SigV4. Set it in
AWS_BEARER_TOKEN_BEDROCK(or the genericRCG_LLM_API_KEY).
Azure AI Foundry / Azure OpenAI¶
Azure uses the OpenAI SDK against a per-resource endpoint, and the "model" is the
deployment name you created in your Azure resource. RCG targets the modern v1
GA path (<endpoint>/openai/v1), so no api-version query parameter is needed:
export AZURE_OPENAI_ENDPOINT=https://<res>.openai.azure.com # trailing slash optional
export AZURE_OPENAI_API_KEY=...
export RCG_LLM_MODEL=<deployment-name> # NOT a model id — your deployment name
rcg check ./rules --provider azure
Caveat: RCG_LLM_MODEL is the deployment name, not a model id (e.g. your
deployment of gpt-4o), and it is required — there is no default. Set
RCG_LLM_BASE_URL to override the computed <endpoint>/openai/v1 URL.
Google Vertex AI¶
Vertex AI exposes an OpenAI-compatible endpoint whose URL is built from your region and project. Auth is a short-lived Google OAuth access token, not a static API key:
export VERTEX_PROJECT=... # or GOOGLE_CLOUD_PROJECT
export RCG_LLM_REGION=us-central1 # or VERTEX_LOCATION; default us-central1
export GOOGLE_VERTEX_ACCESS_TOKEN=$(gcloud auth print-access-token)
export RCG_LLM_MODEL=google/gemini-2.5-flash # required; no default
rcg check ./rules --provider vertex
Caveat (important): the access token from
gcloud auth print-access-tokenis short-lived (~1 hour). Regenerate it for long runs, or it will expire mid-run. For production, use a service-account flow (e.g. a workload-identity or service-account credential that mints fresh tokens) and feed the resulting token intoGOOGLE_VERTEX_ACCESS_TOKEN(orRCG_LLM_API_KEY). SetRCG_LLM_BASE_URLto override the computed endpoint URL.
The same provider names work for the semantic judge — run rcg check ./rules
--semantic --provider deepseek and, when the preset's key is set, the judge uses
the OpenAI-compatible endpoint too (otherwise it falls back to the offline mock
judge). The benchmark accepts
--judge deepseek|qwen|openai|openrouter|google|bedrock|azure|vertex as well.
Caveat: structured-output reliability¶
RCG relies on forced function/tool calling for structured extraction. Hosted
endpoints (DeepSeek, Qwen, OpenAI, Amazon Bedrock's gpt-oss models) support
this well. The provider also
validates the returned arguments and retries once with an explicit nudge if
the first response has no tool call or returns incomplete/unparseable JSON.
Reliability still varies by endpoint: very weak local models may fail to emit a clean tool call even after the retry, in which case extraction raises a clear error — prefer a stronger model for the extraction step.