Conversational Recall Lane
Package matrix/neo/internal/recall is Neo's conversational recall lane: the read-side that surfaces the most RELEVANT past turns of a (now unbounded) conversation thread — reaching PAST what the live working transcript and the resume seed already hold.
Source file: neo/internal/recall/recall.go.
Design decisions
Relevance over raw recency. The live transcript (RAM) and the 16-turn resume seed cover RECENT context. Once a turn scrolls out of RAM or is compacted into the lossy summary, the verbatim original is no longer reachable from either tier. This lane pulls a specific, relevant OLD turn back by semantic similarity.
In-memory, lazy, incremental. Turns are embedded and ranked by cosine similarity in memory; the index is rebuilt lazily, on demand, from the durable conversation store. Only turns appended since the last call are embedded, so steady-state cost is one embed per new turn.
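As a rough illustration of the incremental refresh described above, the sketch below keeps a count of turns already embedded and only embeds the tail. The names `embedFunc`, `index`, and `refresh` are hypothetical stand-ins, not the package's real API, and the turns are plain strings rather than records from the conversation store.

```go
package main

import "fmt"

// embedFunc is a hypothetical stand-in for the real embedder.
type embedFunc func(text string) []float32

// index caches one vector per turn, in thread order.
type index struct {
	vecs     [][]float32
	embedded int // number of turns already folded into vecs
}

// refresh embeds only the turns appended since the previous call and
// reports how many embed calls it made.
func (ix *index) refresh(turns []string, embed embedFunc) int {
	calls := 0
	for ix.embedded < len(turns) {
		ix.vecs = append(ix.vecs, embed(turns[ix.embedded]))
		ix.embedded++
		calls++
	}
	return calls
}

func main() {
	// Toy embedder: a one-dimensional vector holding the text length.
	embed := func(text string) []float32 {
		return []float32{float32(len(text))}
	}
	ix := &index{}
	turns := []string{"hello", "how do I deploy an ERC-20"}
	fmt.Println(ix.refresh(turns, embed)) // first call embeds both turns
	turns = append(turns, "call deploy_token")
	fmt.Println(ix.refresh(turns, embed)) // second call embeds only the new turn
}
```

Because the cache is derived entirely from the turn list, dropping `index` and calling `refresh` again would rebuild it from scratch.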
Disposable index. The index never touches cortex or the replay chain. It is a derivable, disposable side-channel — preserving the conversation side-channel invariant.
Best-effort degradation. Any embed failure degrades to fewer/no hits rather than erroring. The RAM tier + recent tail still carry the turn.
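One way to picture this degradation is a loop that skips any turn whose embedding fails instead of returning the error. This is only a sketch: `embedFunc`, `flakyEmbed`, and `embedBestEffort` are hypothetical names, and how the real lane handles a failed turn (skip, retry later, or stop) is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// embedFunc is a hypothetical embedder signature used only in this sketch.
type embedFunc func(text string) ([]float32, error)

var errEmbed = errors.New("embed failed")

// flakyEmbed simulates an embedder that rejects empty input.
func flakyEmbed(text string) ([]float32, error) {
	if text == "" {
		return nil, errEmbed
	}
	return []float32{float32(len(text))}, nil
}

// embedBestEffort returns vectors keyed by turn index. A turn that fails
// to embed is left out, so callers see fewer candidates, never an error.
func embedBestEffort(turns []string, embed embedFunc) map[int][]float32 {
	vecs := make(map[int][]float32, len(turns))
	for i, t := range turns {
		v, err := embed(t)
		if err != nil {
			continue // assumption: degrade by skipping this turn
		}
		vecs[i] = v
	}
	return vecs
}

func main() {
	vecs := embedBestEffort([]string{"hello", "", "world"}, flakyEmbed)
	fmt.Println(len(vecs)) // two of the three turns remain recallable
}
```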
Recaller
type Recaller struct {
	conv     *conversation.Store
	convID   string
	emb      embed.Embedder
	topK     int
	budget   int // token ceiling
	mu       sync.Mutex
	cache    []turnVec // embedded turns, thread order
	embedded int       // count already folded into cache
}
recaller := recall.New(conv, convID, embedder, topK, budgetTokens)
A nil embedder or disabled store yields a safe no-op recaller (Relevant returns nil).
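The no-op behaviour might look roughly like the following. The types here (`embedder`, `fakeEmbedder`, `recaller`, `hit`) are reduced, hypothetical versions for illustration; the real constructor takes a conversation store and more parameters, and the ranking step is omitted.

```go
package main

import "fmt"

// embedder is a hypothetical stand-in for embed.Embedder.
type embedder interface {
	Embed(text string) ([]float32, error)
}

// fakeEmbedder exists only so the sketch has a non-nil embedder to pass.
type fakeEmbedder struct{}

func (fakeEmbedder) Embed(text string) ([]float32, error) {
	return []float32{float32(len(text))}, nil
}

type hit struct{ Role, Text string }

// recaller is a reduced, illustrative version of the real type.
type recaller struct {
	emb     embedder
	turns   []hit
	enabled bool
}

// newRecaller always returns a usable value; it is marked disabled when
// there is no embedder or the store is off.
func newRecaller(emb embedder, storeEnabled bool, turns []hit) *recaller {
	return &recaller{emb: emb, turns: turns, enabled: emb != nil && storeEnabled}
}

// relevant returns nil for a disabled recaller, so callers need no guard.
func (r *recaller) relevant(query string) []hit {
	if r == nil || !r.enabled {
		return nil
	}
	return r.turns // ranking omitted in this sketch
}

func main() {
	turns := []hit{{Role: "user", Text: "hello"}}
	fmt.Println(newRecaller(nil, true, turns).relevant("q") == nil)
	fmt.Println(len(newRecaller(fakeEmbedder{}, true, turns).relevant("q")))
}
```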
Relevant
func (r *Recaller) Relevant(ctx context.Context, queryText string) []Hit
Returns up to topK past turns most similar to queryText, ranked by cosine similarity and bounded by the token budget.
type Hit struct {
	Role string // "user" | "assistant"
	Text string // verbatim turn text
}
Algorithm
- Refresh: embed any turns appended since the last call
- Embed query: emb.Embed(queryText)
- Score all turns: cosine similarity between the query vector and each turn vector
- Sort: descending by score
- Budget: accumulate turns until topK or the token budget is exhausted (always returns ≥1 if any exist)
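The scoring, sorting, and budgeting steps above can be sketched as follows, with turn vectors assumed to be already embedded. The names `turn`, `cosine`, `estimateTokens`, and `rank` are hypothetical. Stopping at the first hit that would exceed the budget (rather than skipping it and trying smaller ones) is an assumption, as is treating the "at least one" guarantee as admitting the top hit regardless of its size.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// turn pairs a past message with its (already computed) embedding.
type turn struct {
	Role, Text string
	Vec        []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the
// lengths differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// estimateTokens applies the (len(text)+3)/4 heuristic.
func estimateTokens(text string) int { return (len(text) + 3) / 4 }

// rank scores every turn against the query, sorts descending, and keeps
// hits until topK or the token budget is reached. The best hit is kept
// even if it alone exceeds the budget.
func rank(query []float64, turns []turn, topK, budget int) []turn {
	type scored struct {
		t turn
		s float64
	}
	all := make([]scored, 0, len(turns))
	for _, t := range turns {
		all = append(all, scored{t, cosine(query, t.Vec)})
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].s > all[j].s })

	var out []turn
	used := 0
	for _, sc := range all {
		if len(out) >= topK {
			break
		}
		cost := estimateTokens(sc.t.Text)
		if len(out) > 0 && used+cost > budget {
			break // assumption: stop at the first hit that does not fit
		}
		out = append(out, sc.t)
		used += cost
	}
	return out
}

func main() {
	turns := []turn{
		{"user", "weather", []float64{0, 1}},
		{"user", "deploy an ERC-20", []float64{1, 0}},
		{"assistant", "deploy a token contract", []float64{0.9, 0.1}},
	}
	for _, h := range rank([]float64{1, 0}, turns, 2, 100) {
		fmt.Println(h.Role, h.Text)
	}
}
```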
Deduplication
The caller (agent) is expected to drop any hit already present in the live transcript. This lives at the agent boundary because the recaller doesn't know what's currently in the working window.
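A minimal sketch of that agent-side filter is shown below. It assumes a hit counts as "already present" when its role and text match a live message exactly; the real comparison may differ, and `hit` and `dropLive` are hypothetical names.

```go
package main

import "fmt"

type hit struct{ Role, Text string }

// dropLive removes any recalled hit that also appears, with the same
// role and text, in the live transcript.
func dropLive(hits, live []hit) []hit {
	seen := make(map[hit]struct{}, len(live))
	for _, m := range live {
		seen[m] = struct{}{}
	}
	var out []hit
	for _, h := range hits {
		if _, ok := seen[h]; ok {
			continue
		}
		out = append(out, h)
	}
	return out
}

func main() {
	live := []hit{{"user", "what about gas fees?"}}
	hits := []hit{
		{"user", "how do I deploy an ERC-20"},
		{"user", "what about gas fees?"},
	}
	fmt.Println(dropLive(hits, live)) // only the ERC-20 question remains
}
```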
Integration
Each session gets its own recaller:
var recaller agent.ConvRecaller
if e.conv.Enabled() && e.pager != nil {
	if emb := e.pager.Embedder(); emb != nil {
		recaller = recall.New(e.conv, convID, emb, cfg.RecallTopK, cfg.RecallBudgetTokens)
	}
}
s.agent = agent.New(agent.Options{
	// ...
	Recaller: recaller,
})
The agent injects recalled turns into the system block via renderRecall(), deduped against the live transcript:
Relevant earlier in this conversation (the live exchange below is more current — it wins on any conflict):
- User: how do I deploy an ERC-20
- Neo: call paxeer-net deploy_token...
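A block of this shape could be produced along the following lines. This is not the agent's actual renderRecall(): the function name `renderRecallSketch` is hypothetical, the header is passed in (and abbreviated in the demo), and the "user" to "User" and "assistant" to "Neo" label mapping is inferred from the sample above.

```go
package main

import (
	"fmt"
	"strings"
)

type hit struct{ Role, Text string }

// renderRecallSketch formats hits as a header line followed by one
// bullet per turn. It returns "" when there is nothing to inject.
func renderRecallSketch(header string, hits []hit) string {
	if len(hits) == 0 {
		return ""
	}
	var b strings.Builder
	b.WriteString(header)
	b.WriteByte('\n')
	for _, h := range hits {
		label := "Neo" // assumption: every non-user role renders as "Neo"
		if h.Role == "user" {
			label = "User"
		}
		fmt.Fprintf(&b, "- %s: %s\n", label, h.Text)
	}
	return b.String()
}

func main() {
	fmt.Print(renderRecallSketch("Relevant earlier in this conversation:", []hit{
		{"user", "how do I deploy an ERC-20"},
		{"assistant", "call paxeer-net deploy_token..."},
	}))
}
```

Returning an empty string for zero hits lets the caller skip the section entirely instead of emitting a header with no bullets.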
Modifying recall
| What to change | Where |
|---|---|
| Default topK | recall/recall.go — constructor default |
| Default budget | recall/recall.go — constructor default |
| Similarity metric | recall/recall.go — embed.Cosine() |
| Budget heuristic | recall/recall.go — (len(text) + 3) / 4 |
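
For reference when tuning the budget heuristic, the expression (len(text) + 3) / 4 is integer ceiling division of the byte length by four. The sketch below shows its behaviour on a few inputs; `estimateTokens` is a hypothetical name for it. Note that Go's `len` on a string counts bytes, so multi-byte characters weigh more than ASCII ones.

```go
package main

import "fmt"

// estimateTokens approximates a token count as ceil(byteLength / 4).
func estimateTokens(text string) int {
	return (len(text) + 3) / 4
}

func main() {
	for _, s := range []string{"", "a", "abcd", "abcde"} {
		fmt.Printf("%q -> %d\n", s, estimateTokens(s))
	}
	// Prints:
	// "" -> 0
	// "a" -> 1
	// "abcd" -> 1
	// "abcde" -> 2
}
```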
