Compress.
Secure.
Remember.
Comply.
The Dual-Path Architecture
Fast sync path returns in <50ms. Async LLM path extracts memories in parallel — non-blocking. Every decision logged in the CompressionReport.
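The split between a blocking sync path and a non-blocking extraction path can be sketched as follows. This is a minimal illustration, not MemoSift's implementation: `fast_sync_path`, `async_llm_path`, and the message shape are hypothetical stand-ins, and the `join()` is only here so the sketch is testable.

```python
import threading

def fast_sync_path(messages):
    # Deterministic compression (dedup, classify, prune): returns in milliseconds.
    return [m for m in messages if m.get("keep", True)]

def async_llm_path(messages, report):
    # Hypothetical stand-in for the parallel LLM memory-extraction call.
    report["memories"] = [m["content"] for m in messages if m.get("important")]

def process_turn(messages):
    report = {}
    # Kick off memory extraction without blocking the response path.
    worker = threading.Thread(target=async_llm_path, args=(messages, report))
    worker.start()
    compressed = fast_sync_path(messages)  # the caller gets this back fast
    worker.join()  # in production the extraction would complete in the background
    return compressed, report
```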
Scans every message for secrets (25+ patterns), PII, and prompt injections. Three modes: flag (metadata only), redact (replace with placeholders), block (reject). Runs before any LLM sees the content.
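The three modes behave roughly like this sketch, which uses a single AWS-access-key regex as a stand-in for the 25+ patterns; the function name and return shape are assumptions for illustration.

```python
import re

SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")  # AWS access key, one of many patterns

def scan(message, mode="redact"):
    """Illustrative scanner: mode is 'flag', 'redact', or 'block'."""
    findings = SECRET_PATTERN.findall(message)
    if not findings:
        return {"text": message, "findings": []}
    if mode == "block":
        raise ValueError("message rejected: secret detected")
    text = message
    if mode == "redact":
        # Replace the secret with a placeholder before any LLM sees it.
        text = SECRET_PATTERN.sub("[AWS_KEY]", message)
    return {"text": text, "findings": findings}  # 'flag' keeps text, records findings
```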
SHA-256 exact dedup. If the same tool result appears in turn 1 and turn 5, the duplicate is removed. Tool call integrity preserved — orphan tool calls/results cleaned up.
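Exact dedup by content hash can be sketched in a few lines; the message shape here is a hypothetical simplification.

```python
import hashlib

def dedup_tool_results(messages):
    """Keep the first occurrence of each exact tool result; drop later duplicates."""
    seen, kept = set(), []
    for msg in messages:
        if msg["role"] == "tool":
            digest = hashlib.sha256(msg["content"].encode()).hexdigest()
            if digest in seen:
                continue  # byte-identical result already appeared in an earlier turn
            seen.add(digest)
        kept.append(msg)
    return kept
```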
Tags each message with content type (SYSTEM, USER, ASSISTANT, TOOL_JSON, TOOL_TEXT, TOOL_ERROR) and recency. Recent turns pass through untouched.
Old tool results replaced with compact pointers or summaries. First turn: pointer stub. Next turn: summary from cache replaces the pointer.
Prepends a ~200 token session context block summarizing intent, progress, key findings, and available memories. Updated by LLM every 3 turns.
Two parallel GPT-5.4-nano calls: one extracts facts (max 6, with tags + importance), one extracts entity relationships. Summary generated deterministically from facts. ~1.5s wall clock.
Each memory checked against existing store. Four outcomes: ABSORB (duplicate, skip), REFINE (more specific, update), CONTRADICT (conflicts, supersede old), NEW (novel, store).
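The four consolidation outcomes suggest a decision rule like the toy version below. The similarity threshold and inputs are assumptions; in practice the duplicate/conflict checks would be embedding- or LLM-based rather than scalar arguments.

```python
from enum import Enum

class Outcome(Enum):
    ABSORB = "absorb"          # duplicate of an existing memory: skip
    REFINE = "refine"          # more specific: update the existing memory
    CONTRADICT = "contradict"  # conflicts: supersede the old memory
    NEW = "new"                # novel: store as-is

def consolidate(candidate, existing, similarity, conflicts):
    """Toy decision rule over precomputed similarity/conflict signals."""
    if existing is None:
        return Outcome.NEW
    if conflicts:
        return Outcome.CONTRADICT
    if similarity > 0.95:
        return Outcome.ABSORB
    return Outcome.REFINE
```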
Memories stored in Neon DB with pgvector embeddings. Relationships stored in Neo4j. Summaries cached in Redis. Session context updated. Higher-tier synthesis every 5 turns.
Three-Tier Memory System
Memories flow upward through three tiers — from individual facts to session insights to persistent project knowledge. Each tier consolidates and distills the one below.
Facts extracted every turn
Tags, importance scores, entity references. Stored immediately. Searchable via text, tag, or semantic query.
app-service uses 3 replicas [deployment]
DATABASE_URL contains hardcoded credential [security_issue]
Synthesized every 5 turns
Higher-level patterns and cross-turn insights. 20+ turn-level memories condensed into 3-5 session insights.
Multiple services have hardcoded credentials [pattern]
Kubernetes deployments use inconsistent replica counts [finding]
Created at session close
Cross-session knowledge that persists. Domain patterns, architectural decisions, team conventions.
Team convention: All secrets must use Vault references [convention]
Standard deployment: 3 replicas with rolling update [standard]
RELATIONSHIP GRAPH
Neon DB (pgvector)
Semantic search
Neo4j
Graph traversal
Redis
Summary cache
Every message scanned. Every decision audited.
Security scans run BEFORE memory extraction. Redacted content never reaches the LLM. Every memory classified: public, internal, or redacted.
Secrets Detection
Pattern-based detection of 25+ credential types — AWS keys, GitHub tokens, JWTs, Stripe keys, database URLs, SSH keys, Bearer tokens. Supplementary entropy analysis catches variants.
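The entropy supplement works on the idea that random credentials look statistically unlike prose. A minimal sketch, with the length and entropy thresholds chosen for illustration only:

```python
import math

def shannon_entropy(s):
    # Bits per character over the token's own character distribution.
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def looks_like_secret(token):
    """Supplementary heuristic: long, high-entropy tokens no regex matched."""
    return len(token) >= 20 and shannon_entropy(token) > 4.0
```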
PII Detection & Redaction
Emails, phone numbers, SSNs, credit cards, IP addresses, medical record IDs. Type-preserving redaction: if john@example.com appears 3 times, all become [EMAIL_1].
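Type-preserving redaction means the same value always maps to the same placeholder, so references stay coherent after redaction. A sketch for the email case only, with a simplified regex:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text):
    """Replace each distinct email with a stable placeholder: same value, same tag."""
    mapping = {}
    def repl(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = f"[EMAIL_{len(mapping) + 1}]"
        return mapping[value]
    return EMAIL.sub(repl, text), mapping
```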
Prompt Injection Detection
Scans for instruction overrides, role changes, hidden Unicode, Base64 payloads, HTML comment injection, and token smuggling in tool results.
Context Integrity
SHA-256 hash verification of system prompts and tool call integrity. Orphan tool calls/results automatically cleaned up. Tamper-evident chain detects unauthorized mutations.
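Orphan cleanup amounts to keeping only tool calls and results whose IDs pair up. A minimal sketch, assuming a simplified message shape with `kind` and `id` fields:

```python
def clean_orphans(messages):
    """Drop tool calls with no matching result, and results with no matching call."""
    call_ids = {m["id"] for m in messages if m["kind"] == "tool_call"}
    result_ids = {m["id"] for m in messages if m["kind"] == "tool_result"}
    paired = call_ids & result_ids
    return [m for m in messages
            if m["kind"] not in ("tool_call", "tool_result") or m["id"] in paired]
```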
Compliance Policies
HIPAA, PCI-DSS, SOX, GDPR templates. Override compression behavior per content type: preserve_verbatim, minimum_retention_90, flag_if_compressed, never_compress.
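A policy template could look something like the fragment below; the content types and the `HIPAA_POLICY` structure are hypothetical, only the four override names come from the feature list above.

```python
# Hypothetical policy template: per content type, override compression behavior.
HIPAA_POLICY = {
    "medical_record": ["preserve_verbatim", "minimum_retention_90"],
    "tool_result":    ["flag_if_compressed"],
    "system_prompt":  ["never_compress"],
}

def actions_for(policy, content_type):
    # Content types without an override fall back to default compression.
    return policy.get(content_type, [])
```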
Immutable Audit Trail
Append-only decision log with chain hashes. Context reconstruction answers: ‘What did the AI see when it made decision X?’ Export as JSON, Markdown, or PDF.
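A chain-hashed log makes tampering evident because each entry's hash covers the previous hash: rewriting any earlier entry invalidates everything after it. A minimal sketch, not the product's actual record format:

```python
import hashlib, json

def append_entry(log, decision):
    """Append a decision whose hash chains back to the previous entry."""
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(decision, sort_keys=True)
    entry = {"decision": decision, "prev": prev,
             "hash": hashlib.sha256((prev + payload).encode()).hexdigest()}
    log.append(entry)
    return entry

def verify(log):
    """Recompute the chain; any mutated entry breaks every hash after it."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["decision"], sort_keys=True)
        if entry["prev"] != prev or \
           entry["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True
```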
Agents that remember. Systems that learn.
MemoSift transforms ephemeral conversation context into structured persistent memory — and gives you full observability into how your agents use that knowledge.
Persistent Memory
Every tool result processed by GPT-5.4-nano. Facts extracted with tags and importance scores. Consolidated against existing knowledge — duplicates absorbed, contradictions resolved.
Relationship Graph
Entity relationships stored in Neo4j. Services, files, secrets, technologies connected via typed edges: depends_on, has_secret, deployed_on, configured_in.
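The typed-edge idea can be shown without a database. This in-memory sketch stands in for the Neo4j store; the function names are illustrative, not an API:

```python
def add_edge(graph, src, rel, dst):
    # Typed edge, e.g. ("app-service", "depends_on", "postgres").
    graph.setdefault(src, []).append((rel, dst))

def neighbors(graph, src, rel=None):
    # Traverse outgoing edges, optionally filtered by relationship type.
    return [dst for r, dst in graph.get(src, []) if rel is None or r == rel]
```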
Semantic Search
Memories searchable via text, tags, or semantic similarity (pgvector embeddings). Recall endpoint returns ranked results with optional relationship graph traversal.
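Semantic ranking boils down to nearest-neighbor search over embeddings, which pgvector performs in SQL; a pure-Python sketch of the same idea, with tiny hand-made vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall(query_vec, memories, k=2):
    """Rank stored memories by embedding similarity to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return ranked[:k]
```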
Session Context
~200 token LLM-generated context block prepended every turn. Summarizes intent, progress, key findings, and memory count. Updated every 3 turns.
Dynamic Memory Tags