Compress.
Secure.
Remember.
Comply.
The Dual-Path Architecture
Fast sync path returns in <50ms. Async LLM path extracts memories in parallel — non-blocking. Every decision logged in the CompressionReport.
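The split between a blocking sync path and a non-blocking extraction path can be sketched as follows. This is a minimal illustration, not MemoSift's implementation: `fast_sync_path`, `async_llm_path`, and the message shape are hypothetical stand-ins, and the `join()` is only here so the sketch is testable.

```python
import threading

def fast_sync_path(messages):
    # Deterministic compression (dedup, classify, prune): returns in milliseconds.
    return [m for m in messages if m.get("keep", True)]

def async_llm_path(messages, report):
    # Hypothetical stand-in for the parallel LLM memory-extraction call.
    report["memories"] = [m["content"] for m in messages if m.get("important")]

def process_turn(messages):
    report = {}
    # Kick off memory extraction without blocking the response path.
    worker = threading.Thread(target=async_llm_path, args=(messages, report))
    worker.start()
    compressed = fast_sync_path(messages)  # the caller gets this back fast
    worker.join()  # in production the extraction would complete in the background
    return compressed, report
```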
Scans every message for secrets (25+ patterns), PII, and prompt injections. Three modes: flag (metadata only), redact (replace with placeholders), block (reject). Runs before any LLM sees the content.
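The three modes behave roughly like this sketch, which uses a single AWS-access-key regex as a stand-in for the 25+ patterns; the function name and return shape are assumptions for illustration.

```python
import re

SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")  # AWS access key, one of many patterns

def scan(message, mode="redact"):
    """Illustrative scanner: mode is 'flag', 'redact', or 'block'."""
    findings = SECRET_PATTERN.findall(message)
    if not findings:
        return {"text": message, "findings": []}
    if mode == "block":
        raise ValueError("message rejected: secret detected")
    text = message
    if mode == "redact":
        # Replace the secret with a placeholder before any LLM sees it.
        text = SECRET_PATTERN.sub("[AWS_KEY]", message)
    return {"text": text, "findings": findings}  # 'flag' keeps text, records findings
```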
SHA-256 exact dedup. If the same tool result appears in turn 1 and turn 5, the duplicate is removed. Tool call integrity preserved — orphan tool calls/results cleaned up.
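Exact dedup by content hash can be sketched in a few lines; the message shape here is a hypothetical simplification.

```python
import hashlib

def dedup_tool_results(messages):
    """Keep the first occurrence of each exact tool result; drop later duplicates."""
    seen, kept = set(), []
    for msg in messages:
        if msg["role"] == "tool":
            digest = hashlib.sha256(msg["content"].encode()).hexdigest()
            if digest in seen:
                continue  # byte-identical result already appeared in an earlier turn
            seen.add(digest)
        kept.append(msg)
    return kept
```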
Tags each message with content type (SYSTEM, USER, ASSISTANT, TOOL_JSON, TOOL_TEXT, TOOL_ERROR) and recency. Recent turns pass through untouched.
Old tool results replaced with compact pointers or summaries. First turn: pointer stub. Next turn: summary from cache replaces the pointer.
Prepends a ~200 token session context block summarizing intent, progress, key findings, and available memories. Updated by LLM every 3 turns.
Two parallel GPT-5.4-nano calls: one extracts facts (max 6, with tags + importance), one extracts entity relationships. Summary generated deterministically from facts. ~1.5s wall clock.
Each memory checked against existing store. Four outcomes: ABSORB (duplicate, skip), REFINE (more specific, update), CONTRADICT (conflicts, supersede old), NEW (novel, store).
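The four consolidation outcomes suggest a decision rule like the toy version below. The similarity threshold and inputs are assumptions; in practice the duplicate/conflict checks would be embedding- or LLM-based rather than scalar arguments.

```python
from enum import Enum

class Outcome(Enum):
    ABSORB = "absorb"          # duplicate of an existing memory: skip
    REFINE = "refine"          # more specific: update the existing memory
    CONTRADICT = "contradict"  # conflicts: supersede the old memory
    NEW = "new"                # novel: store as-is

def consolidate(candidate, existing, similarity, conflicts):
    """Toy decision rule over precomputed similarity/conflict signals."""
    if existing is None:
        return Outcome.NEW
    if conflicts:
        return Outcome.CONTRADICT
    if similarity > 0.95:
        return Outcome.ABSORB
    return Outcome.REFINE
```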
Memories stored in Neon DB with pgvector embeddings. Relationships stored in Neo4j. Summaries cached in Redis. Session context updated. Higher-tier synthesis every 5 turns.
Three-Tier Memory System
Memories flow upward through three tiers — from individual facts to session insights to persistent project knowledge. Each tier consolidates and distills the one below.
Facts extracted every turn
Tags, importance scores, entity references. Stored immediately. Searchable via text, tag, or semantic query.
app-service uses 3 replicas [deployment]
DATABASE_URL contains hardcoded credential [security_issue]
Synthesized every 5 turns
Higher-level patterns and cross-turn insights. 20+ turn-level memories condensed into 3-5 session insights.
Multiple services have hardcoded credentials [pattern]
Kubernetes deployments use inconsistent replica counts [finding]
Created at session close
Cross-session knowledge that persists. Domain patterns, architectural decisions, team conventions.
Team convention: All secrets must use Vault references [convention]
Standard deployment: 3 replicas with rolling update [standard]
RELATIONSHIP GRAPH
Neon DB (pgvector)
Semantic search
Neo4j
Graph traversal
Redis
Summary cache
Every message scanned. Every decision audited.
Security scans run BEFORE memory extraction. Redacted content never reaches the LLM. Every memory classified: public, internal, or redacted.
Secrets Detection
Pattern-based detection of 25+ credential types — AWS keys, GitHub tokens, JWTs, Stripe keys, database URLs, SSH keys, Bearer tokens. Supplementary entropy analysis catches variants.
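The entropy supplement works on the idea that random credentials look statistically unlike prose. A minimal sketch, with the length and entropy thresholds chosen for illustration only:

```python
import math

def shannon_entropy(s):
    # Bits per character over the token's own character distribution.
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def looks_like_secret(token):
    """Supplementary heuristic: long, high-entropy tokens no regex matched."""
    return len(token) >= 20 and shannon_entropy(token) > 4.0
```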
PII Detection & Redaction
Emails, phone numbers, SSNs, credit cards, IP addresses, medical record IDs. Type-preserving redaction: if john@example.com appears 3 times, all become [EMAIL_1].
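Type-preserving redaction means the same value always maps to the same placeholder, so references stay coherent after redaction. A sketch for the email case only, with a simplified regex:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text):
    """Replace each distinct email with a stable placeholder: same value, same tag."""
    mapping = {}
    def repl(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = f"[EMAIL_{len(mapping) + 1}]"
        return mapping[value]
    return EMAIL.sub(repl, text), mapping
```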
Prompt Injection Detection
Scans for instruction overrides, role changes, hidden Unicode, Base64 payloads, HTML comment injection, and token smuggling in tool results.
Context Integrity
SHA-256 hash verification of system prompts and tool call integrity. Orphan tool calls/results automatically cleaned up. Tamper-evident chain detects unauthorized mutations.
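Orphan cleanup amounts to keeping only tool calls and results whose IDs pair up. A minimal sketch, assuming a simplified message shape with `kind` and `id` fields:

```python
def clean_orphans(messages):
    """Drop tool calls with no matching result, and results with no matching call."""
    call_ids = {m["id"] for m in messages if m["kind"] == "tool_call"}
    result_ids = {m["id"] for m in messages if m["kind"] == "tool_result"}
    paired = call_ids & result_ids
    return [m for m in messages
            if m["kind"] not in ("tool_call", "tool_result") or m["id"] in paired]
```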
Compliance Policies
HIPAA, PCI-DSS, SOX, GDPR templates. Override compression behavior per content type: preserve_verbatim, minimum_retention_90, flag_if_compressed, never_compress.
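A policy template could look something like the fragment below; the content types and the `HIPAA_POLICY` structure are hypothetical, only the four override names come from the feature list above.

```python
# Hypothetical policy template: per content type, override compression behavior.
HIPAA_POLICY = {
    "medical_record": ["preserve_verbatim", "minimum_retention_90"],
    "tool_result":    ["flag_if_compressed"],
    "system_prompt":  ["never_compress"],
}

def actions_for(policy, content_type):
    # Content types without an override fall back to default compression.
    return policy.get(content_type, [])
```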
Immutable Audit Trail
Append-only decision log with chain hashes. Context reconstruction answers: ‘What did the AI see when it made decision X?’ Export as JSON, Markdown, or PDF.
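A chain-hashed log makes tampering evident because each entry's hash covers the previous hash: rewriting any earlier entry invalidates everything after it. A minimal sketch, not the product's actual record format:

```python
import hashlib, json

def append_entry(log, decision):
    """Append a decision whose hash chains back to the previous entry."""
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(decision, sort_keys=True)
    entry = {"decision": decision, "prev": prev,
             "hash": hashlib.sha256((prev + payload).encode()).hexdigest()}
    log.append(entry)
    return entry

def verify(log):
    """Recompute the chain; any mutated entry breaks every hash after it."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["decision"], sort_keys=True)
        if entry["prev"] != prev or \
           entry["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True
```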
Agents that remember. Systems that learn.
MemoSift transforms ephemeral conversation context into structured persistent memory — and gives you full observability into how your agents use that knowledge.
Persistent Memory
Every tool result processed by GPT-5.4-nano. Facts extracted with tags and importance scores. Consolidated against existing knowledge — duplicates absorbed, contradictions resolved.
Relationship Graph
Entity relationships stored in Neo4j. Services, files, secrets, technologies connected via typed edges: depends_on, has_secret, deployed_on, configured_in.
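The typed-edge idea can be shown without a database. This in-memory sketch stands in for the Neo4j store; the function names are illustrative, not an API:

```python
def add_edge(graph, src, rel, dst):
    # Typed edge, e.g. ("app-service", "depends_on", "postgres").
    graph.setdefault(src, []).append((rel, dst))

def neighbors(graph, src, rel=None):
    # Traverse outgoing edges, optionally filtered by relationship type.
    return [dst for r, dst in graph.get(src, []) if rel is None or r == rel]
```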
Semantic Search
Memories searchable via text, tags, or semantic similarity (pgvector embeddings). Recall endpoint returns ranked results with optional relationship graph traversal.
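Semantic ranking boils down to nearest-neighbor search over embeddings, which pgvector performs in SQL; a pure-Python sketch of the same idea, with tiny hand-made vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall(query_vec, memories, k=2):
    """Rank stored memories by embedding similarity to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return ranked[:k]
```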
Session Context
~200 token LLM-generated context block prepended every turn. Summarizes intent, progress, key findings, and memory count. Updated every 3 turns.
Dynamic Memory Tags