LLM Client
Package matrix/mcl/llm implements the interpreter.LLM interface over OpenAI-compatible chat-completion endpoints. It's stdlib-only — no third-party SDK.
Source files: MCL/llm/llm.go, MCL/llm/model.go, MCL/llm/identity.go, MCL/llm/messages_api.go, MCL/llm/responses_api.go.
Providers
Three providers are supported:
| Provider | Constant | Default endpoint |
|---|---|---|
| Together AI | ProviderTogether | https://api.together.xyz/v1/chat/completions |
| Fireworks AI | ProviderFireworks | https://api.fireworks.ai/inference/v1/chat/completions |
| Opencode | ProviderOpencode | https://opencode.ai/zen/v1/... (route depends on model) |
Provider is auto-detected from the model string via llm.DetectProvider(). You can override it explicitly via Config.Provider.
Environment variables used for API keys:
- Together: TOGETHER_API_KEY
- Fireworks: FIREWORKS_API_KEY
- Opencode: OPENCODE_API_KEY
Config.APIKey overrides the env var lookup.
API shapes
Three wire shapes are supported:
| Shape | Constant | Endpoint suffix |
|---|---|---|
| OpenAI chat completions | ShapeChatCompletions | /v1/chat/completions |
| Anthropic messages | ShapeMessages | /v1/messages |
| OpenAI responses | ShapeResponses | /v1/responses |
The shape is auto-detected from the endpoint URL via llm.DetectAPIShape(endpoint). The messages_api.go and responses_api.go files implement the Anthropic and OpenAI Responses API shapes respectively; llm.go handles the default chat completions path.
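Suffix-based detection can be sketched like this, using only the endpoint suffixes from the table. The function name `detectAPIShape`, the integer-backed `APIShape` type, and the fallback to chat completions for unrecognized URLs are assumptions; the real llm.DetectAPIShape may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// APIShape enumerates the three wire shapes from the table above.
type APIShape int

const (
	ShapeChatCompletions APIShape = iota
	ShapeMessages
	ShapeResponses
)

// detectAPIShape is a hypothetical stand-in for llm.DetectAPIShape.
// It matches on the endpoint suffix and falls back to chat completions,
// which is an assumption of this sketch.
func detectAPIShape(endpoint string) APIShape {
	path := strings.TrimRight(endpoint, "/")
	switch {
	case strings.HasSuffix(path, "/v1/messages"):
		return ShapeMessages
	case strings.HasSuffix(path, "/v1/responses"):
		return ShapeResponses
	default:
		return ShapeChatCompletions
	}
}

func main() {
	// example.invalid hosts are placeholders, not real endpoints.
	fmt.Println(detectAPIShape("https://api.fireworks.ai/inference/v1/chat/completions") == ShapeChatCompletions) // true
	fmt.Println(detectAPIShape("https://example.invalid/v1/messages") == ShapeMessages)                           // true
	fmt.Println(detectAPIShape("https://example.invalid/v1/responses/") == ShapeResponses)                        // true
}
```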
For the compiler slot, you're always on ShapeChatCompletions — Together and Fireworks both speak this. The other shapes are used by Neo's LLM routing for frontier models routed through opencode.
Config
type Config struct {
	Model       string        // provider-specific model identifier
	Provider    Provider      // override auto-detection
	APIKey      string        // override env var
	Endpoint    string        // override default endpoint
	Temperature float64       // 0 = deterministic (compiler default)
	Seed        int64         // D11 seed (0 = no seed param sent)
	MaxTokens   int           // 0 = defaults to 4096
	Timeout     time.Duration // 0 = defaults to 90s
	GrammarMode GrammarMode
	Grammars    map[string]*GrammarDef
}
Grammar modes
const (
	GrammarModeNone           GrammarMode = iota // no constraint
	GrammarModeResponseJSON                      // response_format: {type: "json_object"}
	GrammarModeResponseSchema                    // response_format: {type: "json_schema", json_schema: {...}}
	GrammarModeFireworksGBNF                     // Fireworks grammar= EBNF parameter
)
Grammar constraints are how the compiler forces the LLM to produce structurally valid output. The provider determines which mode is available:
- Together AI supports response_format.json_schema (mode: GrammarModeResponseSchema)
- Fireworks supports both JSON schema and native EBNF (grammar= param, mode: GrammarModeFireworksGBNF)
The grammar ID (e.g. "intent_frame@1") is passed from RunInput.Grammar through to Decode. The Grammars map in Config resolves it to the actual constraint payload.
type GrammarDef struct {
	ID         string // e.g. "intent_frame@1"
	JSONSchema []byte // JSON Schema bytes (for response_format.json_schema)
	GBNF       string // EBNF grammar string (for Fireworks grammar= param)
}
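Resolving a grammar ID to a constraint payload might look like the sketch below. It covers only the two response_format modes; the Fireworks grammar parameter is left out because its exact wire placement is not specified here. `responseFormat` is a hypothetical helper, and nesting the schema bytes directly under `json_schema` is an assumption that may not match what a given provider expects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type GrammarMode int

const (
	GrammarModeNone GrammarMode = iota
	GrammarModeResponseJSON
	GrammarModeResponseSchema
	GrammarModeFireworksGBNF
)

type GrammarDef struct {
	ID         string
	JSONSchema []byte
	GBNF       string
}

// responseFormat is a hypothetical helper that returns the response_format
// object for a grammar ID, or nil when no constraint applies (unknown ID,
// GrammarModeNone, or a mode this sketch does not cover).
func responseFormat(mode GrammarMode, grammars map[string]*GrammarDef, id string) map[string]any {
	def, ok := grammars[id]
	if !ok || def == nil {
		return nil // unknown ID: proceed unconstrained
	}
	switch mode {
	case GrammarModeResponseJSON:
		return map[string]any{"type": "json_object"}
	case GrammarModeResponseSchema:
		return map[string]any{
			"type": "json_schema",
			// Assumed layout: the raw schema bytes are embedded as-is.
			"json_schema": json.RawMessage(def.JSONSchema),
		}
	default:
		return nil
	}
}

func main() {
	grammars := map[string]*GrammarDef{
		"intent_frame@1": {ID: "intent_frame@1", JSONSchema: []byte(`{"type":"object"}`)},
	}

	rf := responseFormat(GrammarModeResponseSchema, grammars, "intent_frame@1")
	out, _ := json.Marshal(rf)
	fmt.Println(string(out))

	// An ID missing from the map yields no constraint.
	fmt.Println(responseFormat(GrammarModeResponseSchema, grammars, "unknown@1") == nil) // true
}
```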
Creating a client
cfg := llm.DefaultCompilerModel()
// cfg.Model is a fast, seedable, grammar-constrained model
// cfg.Temperature is 0 (deterministic)
// cfg.Seed is 42 (default)
client, err := llm.New(&cfg)
if err != nil {
	// API key not found: fall back to dry-run
}
// Implements interpreter.LLM
var _ interpreter.LLM = client
llm.New returns an error if the API key is missing. The caller decides whether to fall back to dry-run or propagate the error. The mclc compile command logs a warning and falls back.
llm.DefaultCompilerModel() returns a config for the project's default fast-seedable compiler model. Check MCL/llm/model.go for the current default. This is intentionally not hardcoded here because it changes as better options appear.
Calling the LLM
messages := []interpreter.Message{
	{Role: "system", Content: "You are a frame extractor..."},
	{Role: "user", Content: "Goal: build a deployment pipeline"},
}
output, err := client.Decode(ctx, messages, "intent_frame@1")
The grammar argument is resolved against cfg.Grammars. If it's not in the map, the call proceeds without a grammar constraint (same as passing "").
For streaming:
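// output and err are declared before the branch (for example by the Decode
// call above); using := inside the if block would shadow them.
if streamer, ok := client.(interpreter.StreamingLLM); ok {
	output, err = streamer.Stream(ctx, messages, "intent_frame@1", func(delta string) {
		// incremental token: send to UI
	})
} else {
	output, err = client.Decode(ctx, messages, "intent_frame@1")
}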
Stream must return the same final text as Decode would have. The canonical output — and therefore the D11 hash — is always the full accumulated text. Streaming is a UX layer, not a semantic one.
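The invariant can be illustrated with a toy accumulator: every delta goes to the callback, but the return value is the concatenation of all deltas, so hashing the streamed result gives the same digest as hashing the non-streamed text. The `stream` and `hashOutput` functions are hypothetical, and SHA-256 is an assumption standing in for whatever the D11 hash actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// stream passes each chunk to onDelta and returns the full accumulated text.
// It stands in for a real Stream implementation reading from the network.
func stream(chunks []string, onDelta func(delta string)) string {
	var sb strings.Builder
	for _, c := range chunks {
		sb.WriteString(c)
		if onDelta != nil {
			onDelta(c)
		}
	}
	return sb.String()
}

// hashOutput hashes the canonical output. SHA-256 is an assumption here.
func hashOutput(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

func main() {
	chunks := []string{`{"goal":`, ` "build a `, `deployment pipeline"}`}
	decoded := strings.Join(chunks, "") // what a non-streaming call would return

	seen := 0
	streamed := stream(chunks, func(string) { seen++ })

	fmt.Println(seen, streamed == decoded, hashOutput(streamed) == hashOutput(decoded)) // 3 true true
}
```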
Model registry
MCL/llm/model.go contains the ModelRegistry — a mapping from model identifiers to routing metadata (which provider, which API shape, whether seedable, whether grammar-constrained). This is what the executor uses to pick the right model for a given StepPayload.Kind.
The registry is the source of truth for which models are available in which roles. Adding a new model means adding an entry here and (for prod metering) a rate-card entry in gateway/internal/rates/rates.go.
The step kind → model routing:
| Step kind | Model tier | Notes |
|---|---|---|
| reason | Default (GLM-5.1 fast) | General agentic reasoning |
| code | Code specialist | Code generation / analysis |
| summarize | Long-context specialist | Summarization over long context windows |
| write | Prose specialist | Free-form writing |
| transform | Deterministic structured I/O | JSON→JSON transformations |
| classify | Fast grammar-constrained | Pick-from-list, classifier steps |
| hard_reason | Frontier reasoning (expensive) | Multi-step reasoning, planning |
The model registry keeps AllStepKindNames in sync with ir.StepKindNames. There's a test in the executor that guards against drift between the two — if you add a kind to one, you must add it to the other.
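A drift guard of this kind can be sketched as a set difference between the two name lists. Everything below is illustrative: `modelTierByKind` restates the routing table as data, `missingFrom` is a hypothetical helper, and the extra `translate` kind is invented purely to show what a drift report looks like.

```go
package main

import (
	"fmt"
	"sort"
)

// modelTierByKind restates the routing table above as data.
var modelTierByKind = map[string]string{
	"reason":      "default",
	"code":        "code specialist",
	"summarize":   "long-context specialist",
	"write":       "prose specialist",
	"transform":   "deterministic structured I/O",
	"classify":    "fast grammar-constrained",
	"hard_reason": "frontier reasoning",
}

// missingFrom returns the names in want that are absent from got, sorted.
func missingFrom(want, got []string) []string {
	seen := make(map[string]bool, len(got))
	for _, name := range got {
		seen[name] = true
	}
	var missing []string
	for _, name := range want {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Stand-ins for ir.StepKindNames and the registry's AllStepKindNames.
	// "translate" is a made-up kind added to one side only.
	irKinds := []string{"reason", "code", "summarize", "write", "transform", "classify", "hard_reason", "translate"}
	var registryKinds []string
	for kind := range modelTierByKind {
		registryKinds = append(registryKinds, kind)
	}

	fmt.Println(missingFrom(irKinds, registryKinds)) // [translate]
	fmt.Println(missingFrom(registryKinds, irKinds)) // []
}
```

A test built on this idea would fail whenever either difference is non-empty, which forces both lists to be updated together.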
Identity
MCL/llm/identity.go provides llm.Identity — a per-invocation identity that flows through LLM calls for attribution and metering. The Matrix gateway routes LLM calls through the X-Matrix-Actor-DID header, and the Identity struct provides the values that populate it.
Dry-run mode
Any code that constructs an interpreter.Interpreter with llm=nil runs in dry-run mode. The interpreter builds and interpolates prompts exactly as it would for a real call, but returns without calling the LLM. RunResult.FrameJSON is empty; RunResult.PromptMessages is fully populated.
This is useful for:
- Testing .mtx files without an API key
- Displaying what the compiler would have sent to the LLM
- CI validation of skill files
mclc compile -dry-run uses this mode explicitly.
