Context Memory now live!

TL;DR: we've added Context Memory, which gives effectively infinite memory/context size to any model and improves recall, speed, and performance.

This is an amazing feature for programming and agentic purposes, and can be used with any model.

When conversations get long, models become slow, lose track, or error out entirely.

Context Memory keeps conversations and coding sessions snappy and allows them to continue indefinitely while maintaining full awareness of the entire conversation history.

The Problem

Current memory solutions like ChatGPT's memory store general facts but miss something critical: the ability to recall specific events at the right level of detail.

This means:

- Important details forgotten or lost during summarization

- Conversations cut short when context limits are reached

- AI agents that lose track of their previous work

How Context Memory Works

Context Memory creates a hierarchical structure of your conversation:

- High-level summaries for overall context

- Mid-level details for important relationships

- Specific details when relevant to recent messages

Here's an example from a coding session:

Token estimation function refactoring
|-- Initial user request
|-- Refactoring to support integer inputs
|-- Error: "exceeds the character limit"
|   +-- Fixed by changing test params from strings to integers
+-- Variable name refactoring

When you ask "What errors did we encounter?", Context Memory expands the relevant section while keeping other parts collapsed.

The model you're using (like ChatGPT or Claude) gets exactly the detail it needs without information overload.
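As a rough illustration, here's a minimal sketch of how such a summary tree might be represented and selectively expanded. The MemoryNode structure and keyword-based relevance test are simplified stand-ins for illustration, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """One node in a conversation summary tree (illustrative, not the real schema)."""
    summary: str
    children: list["MemoryNode"] = field(default_factory=list)

def render(node: MemoryNode, keywords: set[str], depth: int = 0) -> list[str]:
    """Show every node's one-line summary, but expand a child's subtree
    only when its summary mentions one of the query keywords."""
    lines = ["  " * depth + node.summary]
    for child in node.children:
        if any(k in child.summary.lower() for k in keywords):
            lines += render(child, keywords, depth + 1)       # expanded branch
        else:
            lines.append("  " * (depth + 1) + child.summary)  # collapsed branch
    return lines

tree = MemoryNode("Token estimation function refactoring", [
    MemoryNode("Initial user request"),
    MemoryNode("Refactoring to support integer inputs"),
    MemoryNode('Error: "exceeds the character limit"', [
        MemoryNode("Fixed by changing test params from strings to integers"),
    ]),
    MemoryNode("Variable name refactoring"),
])

# "What errors did we encounter?" -> the keyword "error" expands only that branch
print("\n".join(render(tree, {"error"})))
```

Only the error branch is rendered with its fix underneath; the sibling branches stay collapsed to their one-line summaries.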

Benefits

Developers:

- Long coding sessions without losing context

- Speed! Compresses long histories so your model responds quickly

- AI agents that learn from past mistakes

- Documentation that maintains context across entire codebases

Agents:

- Long-running agents keep everything in memory, from the very first step to the current one

- Use any model for agents with effectively infinite memory — Context Memory stores all history and passes only the relevant bits so the model always stays aware

- Reliable planning and backtracking with preserved goals, constraints, decisions, and outcomes

- Tool use and multi-step workflows stay coherent across hours, days or weeks, including retries and branches

- Resume after interruptions with full state awareness, without hitting context window limits

Roleplay:

- Build far bigger worlds with persistent lore, timelines, and locations that never get forgotten

- Characters remember identities, relationships, and evolving backstories across long arcs

- Branching plots stay coherent—past choices, clues, and foreshadowing remain available

- Resume sessions after days or weeks with full awareness of what happened at the very start

- Epic-length narratives without context limits—only the relevant pieces are passed to the model

Conversations:

- Extended discussions without forgetting details

- Research projects that build knowledge over time

- Complex problem-solving with full history awareness

Turn on Context Memory as early as possible in your conversation!

Context Memory progressively indexes your conversation with every request, building up a comprehensive understanding. This means:

- Starting memory early captures your entire conversation history

- You can go over 1 million tokens without hitting limits

- The system compresses intelligently, enabling it to return the most relevant information

- Later messages benefit from the full context built up over time

The earlier you enable it, the more complete your memory will be.

Using Context Memory

Simple. Add :memory to any model name.

Or pass a header: memory: true.

Or on our frontend, just check "enable context memory".
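With an OpenAI-compatible SDK, both options might look like the sketch below; the base URL, API key, and model name are placeholders, not real values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder: your provider's base URL
    api_key="YOUR_API_KEY",
)

# Option 1: append :memory to the model name
response = client.chat.completions.create(
    model="claude-sonnet-4:memory",  # placeholder model name
    messages=[{"role": "user", "content": "What errors did we encounter?"}],
)

# Option 2: pass the memory header instead
response = client.chat.completions.create(
    model="claude-sonnet-4",  # placeholder model name
    messages=[{"role": "user", "content": "What errors did we encounter?"}],
    extra_headers={"memory": "true"},
)

print(response.choices[0].message.content)
```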

Retention

By default, Context Memory retains your compressed chat state for 30 days.

Retention is rolling and based on the conversation's last update: each new message resets the timer, and the thread expires once the retention period passes without further activity.

You can configure retention from 1 to 365 days.
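Reusing the hypothetical client from the sketch above, retention could be set per request via the memory_expiration_days header (mentioned under Pricing below); the 90-day value is just an example:

```python
# Same placeholder client as before; keep this thread's compressed
# state for 90 days instead of the 30-day default.
response = client.chat.completions.create(
    model="claude-sonnet-4:memory",  # placeholder model name
    messages=[{"role": "user", "content": "Continue where we left off."}],
    extra_headers={"memory_expiration_days": "90"},
)
```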

How It Works

- You send your full conversation history to our API

- Context Memory compresses this into a compact representation with all relevant information

- Only the compressed version is sent to the AI model (OpenAI, Anthropic, etc.)

- The model receives all the context it needs without hitting token limits

This means you can have conversations with millions of tokens of history, but the AI model only sees the intelligently compressed version that fits within its context window.
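Conceptually, the request flow looks something like this sketch; compress_to_memory and call_model are illustrative stubs, not the real service internals:

```python
def compress_to_memory(history: list[dict]) -> list[dict]:
    """Stub: the real service builds the hierarchical summary here."""
    return history[-4:]  # pretend compression keeps only the recent turns

def call_model(model: str, messages: list[dict]) -> str:
    """Stub: the real service forwards compressed messages upstream."""
    return f"[{model}] received {len(messages)} compressed messages"

def handle_request(full_history: list[dict], model: str) -> str:
    # Clients always send the complete history; only the compressed
    # form ever reaches the underlying model's context window.
    compressed = compress_to_memory(full_history)
    return call_model(model, compressed)
```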

Provider: Polychat

When using Context Memory, your conversation data is processed by Polychat's API, which uses Google Gemini in the background with maximum privacy settings.

You can review Polychat's full privacy policy at https://polychat.co/legal/privacy.

Important privacy details:

- Context Memory over the API does not send data to Google Analytics or use cookies

- Only your conversation messages are sent to Polychat for compression

- No email, IP address, or other metadata is shared; only the prompts are sent

Pricing

- Non-cached input: $5.00 per million tokens

- Cached input: $2.50 per million tokens

- Output generation: $10.00 per million tokens

Retention: 30 days by default; configurable 1–365 days via the :memory-<days> model suffix or the memory_expiration_days header

Typical usage: 8k–20k tokens per session
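For a ballpark figure, a session at the top of that range works out to roughly the following; the 20k non-cached input tokens and 2k output tokens are assumptions for illustration:

```python
# Back-of-the-envelope cost for one session at the top of the typical
# range, assuming all 20k input tokens are non-cached and an
# illustrative 2k tokens of output.
input_cost = 20_000 / 1_000_000 * 5.00     # $0.10
output_cost = 2_000 / 1_000_000 * 10.00    # $0.02
print(f"${input_cost + output_cost:.2f}")  # $0.12
```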

That's all! Go try it out, and let us know what you think.
