Your agent forgets past decisions and burns tokens re-reading the same context. Memory Stack runs 5 search engines locally, returns only what matters, and never loses a fact. Bring your own API key to unlock LLM-powered fact extraction for the complete experience.
Every wasted token is money burned. Memory Stack eliminates the waste.
Native memory dumps full text every time. Memory Stack gives you three tiers: L0 auto-recall at ~100 tokens, L1 summaries at ~800, L2 full content on demand. Up to 90% fewer tokens per search — your agent gets exactly what it needs, nothing more.
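If you want a feel for the tiers in code, here's a minimal usage sketch. All names in it (the `memory-stack` package, `search`, `expand`, the `tier` option) are hypothetical, not the plugin's published API:

```ts
// Hypothetical usage sketch -- class, method, and package names are
// illustrative, not the plugin's actual API.
import { MemoryStack } from "memory-stack"; // assumed package name

const memory = new MemoryStack();

// L0: cheap auto-recall, ~100 tokens of headline facts
const hits = await memory.search("payment provider decision", { tier: "L0" });

// L1: ~800-token summary of the most relevant hit
const summary = await memory.expand(hits[0].id, { tier: "L1" });

// L2: full content, loaded only when the agent actually needs it
const full = await memory.expand(hits[0].id, { tier: "L2" });
```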
One search fires 5 engines in parallel — full-text, semantic, markdown, fact store, and compressed history. Results merge with rank fusion and diversity reranking. Right answer on the first try. No wasted tokens chasing wrong context.
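Under the hood, merging ranked lists works along the lines of reciprocal rank fusion. The sketch below shows the general technique; Memory Stack's actual per-engine weights and diversity reranking are internal:

```ts
// Illustrative reciprocal-rank-fusion sketch. Memory Stack's real
// per-engine weights and diversity reranking are not shown here; this
// only demonstrates the core idea of merging ranked lists.
type Hit = { id: string };

function rankFusion(resultLists: Hit[][], k = 60): { id: string; score: number }[] {
  const fused = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((hit, rank) => {
      // Each engine contributes 1 / (k + rank); items ranked highly by
      // several engines accumulate the largest fused scores.
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...fused.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

The effect: a result that two or three engines rank near the top beats a result only one engine loves, which is what puts the right answer first.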
When conversations get long, compression eats old messages. Memory Stack extracts key facts — decisions, deadlines, requirements — into a dedicated store before they disappear. Your agent recalls them instantly instead of you re-explaining. Zero wasted tokens.
Flat text search forces your agent to re-read everything to find connections. Memory Stack automatically tracks entities and their relationships — who changed what, what depends on what, how things evolved. Queryable on demand, not buried in old conversations.
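Conceptually, the tracked data is a small graph you can filter on demand. The field names and relation vocabulary below are illustrative assumptions, not the plugin's actual schema:

```ts
// Illustrative shape of tracked entities and relationships. Field names
// and the relation vocabulary are assumptions, not the actual schema.
interface Entity { id: string; name: string; kind: "person" | "tool" | "decision" }
interface Relation { from: string; to: string; verb: string; at: string }

// "What depends on the billing service?" -- answered from the graph,
// not by re-reading old conversations.
function dependentsOf(relations: Relation[], targetId: string): Relation[] {
  return relations.filter(r => r.verb === "depends-on" && r.to === targetId);
}
```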
Duplicates and junk cost real money every time your agent reads them. 4-level deduplication (exact, normalized, substring, cosine) runs automatically. Health score 0-100 shows exactly what's wasting tokens. Your memory stays lean, your bill stays flat.
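The four levels escalate from cheap to expensive checks, roughly like this sketch. The normalization rules and the 0.95 cosine threshold are assumptions for illustration:

```ts
// Sketch of the four dedup levels (exact, normalized, substring, cosine).
// Normalization rules and the 0.95 threshold are assumptions.
function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function isDuplicate(a: string, b: string, embA: number[], embB: number[]): boolean {
  if (a === b) return true;                        // level 1: exact match
  if (normalize(a) === normalize(b)) return true;  // level 2: normalized match
  const [short, long] = a.length <= b.length ? [a, b] : [b, a];
  if (long.includes(short)) return true;           // level 3: substring containment
  return cosine(embA, embB) > 0.95;                // level 4: near-identical embeddings
}
```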
Core search runs offline out of the box. Add any LLM API key — OpenAI, Anthropic, Ollama, MLX, or any OpenAI-compatible endpoint — and Memory Stack auto-detects it. LLM-powered fact extraction kicks in: every conversation produces structured decisions, deadlines, and entities stored permanently. A few cents per session saves dollars of wasted tokens. Your key, your choice of provider.
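Detection amounts to checking the usual places for credentials. In this sketch, `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are those providers' conventional variable names, while the Ollama probe and the ordering are assumptions, not documented behavior:

```ts
// Sketch of key auto-detection. The env-var names for OpenAI/Anthropic
// follow those providers' conventions; the Ollama check and the
// detection order are assumptions.
type Provider = "openai" | "anthropic" | "ollama" | "none";

function detectProvider(): Provider {
  if (process.env.OPENAI_API_KEY) return "openai";
  if (process.env.ANTHROPIC_API_KEY) return "anthropic";
  if (process.env.OLLAMA_HOST) return "ollama"; // local endpoint, no key needed
  return "none"; // core search still runs fully offline
}
```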
A single curl command installs the plugin, registers it, and restarts OpenClaw. Updates happen automatically in the background.
Same conversations. One remembers everything at 10% of the token cost.
| | Native Memory | Memory Stack |
|---|---|---|
| Remembers longer | | |
| What happens when the conversation gets too long? | Old messages get compressed. Decisions disappear. | Key facts are saved before compression and brought back automatically. |
| Can it remember things from last week? | Only if it's still in the conversation window. | Yes. Recent memories rank higher, but older ones are still searchable. |
| Does it understand how things connect? | No. It searches text, not relationships. | Yes. Entity tracking links people, tools, and decisions — queryable on demand. |
| Can it trace how a decision evolved? | No. | Yes. Evolution chains link past decisions to current ones. |
| Saves you money | ||
| How much does each memory search cost? | Loads full text every time. More tokens, higher API bill. | Loads a summary first. Only fetches full text when needed. Uses up to 90% fewer tokens. |
| Does it waste money on duplicate results? | Can feed the same info to your AI twice. You pay for both. | Removes duplicates before sending anything to the AI. You only pay once. |
| Does the cost grow over time? | Memory piles up. More junk = more tokens = higher cost. | Auto-cleanup merges similar memories. Stays lean, cost stays flat. |
| Finds things faster | ||
| How many search methods run per query? | 2 (keyword + vector) | 5 engines, merged with rank fusion and per-engine weights. |
| Does it understand what you meant, not just what you typed? | Basic keyword matching. | Query expansion rewrites your question locally before searching — no API call needed. |
| Can it search across past conversations? | Limited. | Dedicated fact store and entity tracking — finds facts across all conversations instantly. |
| Can you check if your memory is healthy? | No. | Health score 0-100. Shows duplicates, stale entries, noise. |
| How much context does each recall use? | Full text every time. No control over token usage. | Tiered output: L0 auto-recall uses ~100 tokens. L1 summaries ~800. Full text only on demand. |
| Does it need API keys or cloud services? | Vector search needs an embedding provider. | Core search runs offline. Bring your own LLM key (OpenAI, Anthropic, Ollama, MLX, or any compatible provider) to unlock structured fact extraction from every conversation. |
Most memory skills do one thing. You end up installing 3-4 and hoping they work together.
| What you need | Other skills | Memory Stack |
|---|---|---|
| Find a function name | Vector search misses exact names | Full-text keyword search finds it instantly |
| Find "how does auth work" | Vector search works | Semantic search with query expansion |
| Search across 5 conversations | Limited to current context | Fact store + entity tracking |
| Control token spend | Full text every time | 3 tiers: ~100 / ~800 / full |
| Remove duplicates | Manual cleanup | 4-level auto-dedup |
| Track decision evolution | No history | Evolution tracking across conversations |
| Check memory quality | No tooling | Health score 0-100 |
| Work offline | Needs OpenAI key | Core search runs offline |
A drop-in OpenClaw plugin that replaces built-in memory. 5 search engines with rank fusion, entity tracking, and 3-tier token control. Your agent recalls more while using up to 90% fewer tokens. Core search and memory run locally — add your own LLM key for enhanced fact extraction.
5 engines fire in parallel with automatic fallback. Results merge with rank fusion and diversity reranking. Entities and relationships are tracked automatically and queryable on demand. Tiered output controls exactly how many tokens each recall costs. Add your own LLM key (OpenAI, Anthropic, Ollama, MLX, or any compatible provider) for structured fact extraction — the complete Memory Stack experience.
Yes. Memory Stack plugs into OpenClaw as a native memory provider. It works with Telegram, CLI, and any other OpenClaw channel. No extra configuration needed — one command and it's live.
Core search, rank fusion, deduplication, and entity tracking all run locally. No data leaves your machine. For enhanced fact extraction, add your own LLM key — supports OpenAI, Anthropic, Ollama, MLX, and any compatible endpoint. Auto-detected at startup. Without a key, core search still works fully offline. Update checks run in the background and fail silently.
Every token your AI reads costs money. Memory Stack cuts that three ways: (1) Tiered output — auto-recall uses ~100 tokens, on-demand search uses ~800 tokens, full text only loads when requested. Up to 90% fewer tokens per search. (2) Duplicate removal so you don't pay for the same information twice. (3) Compressed history — your agent drills down only when it needs detail.
When OpenClaw conversations get long, the system compresses old messages to fit the context window. Important decisions can get lost in this process. Memory Stack extracts key facts (decisions, deadlines, architecture choices) into a dedicated store before compression happens, and retrieves them instantly when relevant — zero wasted tokens re-explaining things.
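In pseudocode terms, the flow is extract first, compress second. Every name in this sketch is an illustrative stand-in rather than Memory Stack's real interface:

```ts
// Conceptual extract-before-compress flow. Every name here is an
// illustrative stand-in, not Memory Stack's real interface.
interface Message { id: string; text: string }
interface Fact {
  kind: "decision" | "deadline" | "requirement";
  text: string;
  sourceMessageId: string;
}

// Stand-in for the LLM-powered extraction step (needs your API key).
declare function extractFacts(messages: Message[]): Promise<Fact[]>;
// Stand-in for the dedicated fact store.
declare const factStore: { save(facts: Fact[]): Promise<void> };

async function beforeCompression(oldMessages: Message[]): Promise<void> {
  // Capture durable facts while the raw text still exists, then let
  // compression reclaim the context window. The facts survive and come
  // back via search instead of re-explanation.
  const facts = await extractFacts(oldMessages);
  await factStore.save(facts);
}
```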
No. One-time purchase of $49. You own the code. No recurring fees, no SaaS, no data collection. Just files that live on your machine.
Yes. Memory Stack checks for new versions automatically when it starts up. When an update is available, you'll see a one-line prompt — run the command and you're done. Bug fixes within your version are always free. No manual checking, no update subscriptions.