Developers report agents with persistent memory can repeat outdated or wrong facts

AI-Generated Summary

1 source

1 day ago

Key Points

Persistent memory can cause agents to confidently repeat outdated information that was true when first recorded.
Errors may come from stale or “poisoned” stored facts rather than on-the-spot hallucination.
Keeping all facts indefinitely increases contradictions and reduces signal-to-noise.
One proposed mitigation is separating raw short-lived notes from a smaller curated long-term memory with explicit promotion and reconciliation.
For volatile details, adding timestamps/decay and requiring re-confirmation before taking actions reduces the impact of stale entries.

Two developers describe problems they encountered after enabling “persistent memory” in coding agents. In the first account, an agent recalls user and project details with high confidence, but key facts have gone stale or were never true: it repeatedly cites a dependency-management preference the user does not hold, a staging database region the team had already migrated away from, and a migration approval the user says they never gave. The author notes these errors are not typical hallucinations; the agent is faithfully reusing information it recorded earlier and never corrected.

The second account argues the underlying engineering challenge is less about “remembering” and more about forgetting and curation. Its author says that keeping everything in one long-lived memory store increases contradictions and lowers signal-to-noise, because outdated or transient details persist with equal weight alongside durable preferences and identity facts.

Both accounts converge on the need for memory freshness controls, provenance and timestamps, and mechanisms that separate raw, short-lived notes from a smaller curated memory. They also emphasize reconciliation: when new facts contradict old ones, the system should overwrite or invalidate prior entries rather than accumulate conflicts. In practice, the reported mitigations are timestamps, decay/TTL for volatile facts, promotion of decisions and preferences, pruning of superseded claims, and treating remembered items as claims to verify before acting.

How Outlets Covered This Story

Dev.to

I Gave My Agent Persistent Memory. It Remembered the Wrong 3 Things for a Week.

I gave my coding agent persistent memory in March. By the end of the week it was telling me, with total confidence, that I preferred Poetry for dependency management, that our staging database lived in us-east-1, and that I had already approved a migration plan I had explicitly rejected two days earlier.

All three were wrong. I use uv. Staging moved to ap-northeast-1 back in February. And I never approved that plan.

The annoying part is that none of these were hallucinations in the usual sense. The agent wasn't making things up on the spot. It was faithfully recalling facts it had written down about me at some point and never corrected. The memory worked exactly as designed. That was the problem.

The pitch everyone believes

The standard story about agent memory goes like this. Every session starts with a blank context window. The agent forgets you the moment the conversation ends. So you give it persistent memory: a place to write down your preferences, your project state, the decisions you've made. Next session, it reads those notes first and picks up where it left off.

Anthropic shipped exactly this for all Claude users in March 2026. Claude now scans your history, synthesizes a summary of who you are and what you're working on, and refreshes it roughly every 24 hours. Letta (the framework formerly known as MemGPT) goes further: the agent edits its own memory blocks through tool calls, deciding what's worth keeping.

The selling point is continuity. No more re-explaining your stack every morning. And for the first few days, it genuinely felt like working with someone who remembered me.

Then it started remembering things that were no longer true.

Stale memory is worse than no memory

Here's the uncomfortable asymmetry. An agent with no memory asks you the same question every day. That's annoying, but it's honest. You answer, it acts, nothing rots.

An agent with persistent memory answers the question itself, using a fact it learned three weeks ago. If that fact has changed, you don't get a question. You get a confident wrong action. And because the answer sounds informed, you're less likely to catch it.

My us-east-1 bug is the clearest example. The agent had recorded our staging region back when it was true. We migrated. Nobody told the memory. So for a week the agent kept generating deploy commands pointed at a region with nothing in it, and every command looked perfectly reasonable because the region string was a real region we had genuinely used.

This is the part the memory pitch skips. "Remembering" and "remembering correctly" are different features, and persistence only gives you the first one for free.

The three ways memory goes bad

After staring at my agent's memory file for an embarrassingly long evening, I sorted the failures into three buckets. They're not exotic. They show up the moment a memory system runs longer than a few days.

Stale facts. Something that was true when it was written and isn't anymore. Regions, versions, deadlines, who owns which service. The world moves; the note doesn't. This was most of my pain.

Poisoned facts. A wrong fact gets written once and then quoted forever. The "I approved the migration" entry came from a single ambiguous message where I said "yeah that approach makes sense" about the general shape of a plan. The agent compressed that into approval, wrote it down, and from then on treated it as settled history. No amount of arguing in later sessions dislodged it, because it kept reloading the poisoned note at the start of each one.

Over-confident summaries. The Poetry thing was this. I'd mentioned Poetry once, months ago, in the context of a different repo. The summarization pass that builds the daily profile flattened "used Poetry on one old project" into "prefers Poetry." Summaries are lossy by design, and the loss tends to drift toward overconfident generalizations.

The first one is a freshness problem. The second is an integrity problem. The third is a compression problem. Lumping them together as "the agent got confused" is exactly why they're hard to fix.

Why self-editing memory doesn't save you

The obvious response is: let the agent manage its own memory. It edits, it prunes, it corrects. Letta is built on this idea, and it's a genuinely good idea.

But teams moving Letta from prototype to production keep hitting the same wall, and I hit it too: self-editing memory is unpredictable. The agent decides what to keep based on the same flawed judgment that wrote the bad fact in the first place. When I corrected the region in a session, the agent sometimes updated the memory block, sometimes wrote a second entry that contradicted the first, and once helpfully "consolidated" both into a summary that kept the wrong one.

Letta even shipped a Recovery-Bench benchmark in 2026 specifically to measure how well agents climb out of corrupted states, which tells you the industry knows this is real.

The deeper issue: an agent editing its own memory has no external source of truth to check against. It's grading its own homework. If it believed us-east-1 yesterday, "us-east-1" looks consistent with everything it knows today.

I learned this lesson once before, the hard way, with a junior engineer I onboarded years ago. Brilliant, fast, and absolutely certain about a deployment process he'd learned on his first day. The process had changed in month two. He kept doing it the old way for weeks, confidently, because nobody handed him a reason to doubt his own notes. Persistent memory gave my agent the exact same failure mode, minus the part where a human eventually overhears the mistake at lunch.

What actually started working

I'm not going to pretend I solved this. But three changes cut the wrong-fact rate to something I can live with.

Timestamp everything, and decay it. Every memory entry now carries when it was written. Facts about volatile things (regions, versions, deadlines) get treated as suspect after a set window and re-confirmed rather than trusted. A region string from three weeks ago isn't a fact; it's a hypothesis with an expiry date.

Separate "observed" from "inferred." The Poetry disaster came from the agent storing a generalization as if it were a stated preference. Now there's a hard line: things I literally said go in one bucket, things the agent concluded go in another, and the inferred bucket needs more evidence before it gets to drive an action. Augment's framing stuck with me here: memory should guide decisions, but never be treated as infallible truth.

Make corrections destructive. When I correct a fact, the old entry doesn't get a polite contradicting neighbor. It gets overwritten and logged. The audit log matters more than I expected. The first time the agent confidently cited a fact I didn't recognize, being able to see when and from which message it was written turned a mystery into a one-line fix.

None of this is exotic. It's basically the discipline you'd apply to any cache: TTLs, write provenance, explicit invalidation. We just forgot to apply it to memory because the word "memory" makes it sound like something more trustworthy than a cache. It isn't. It's a cache that talks back.

The uncomfortable takeaway

Persistent memory doesn't make an agent reliable. It makes an agent consistent, which is a different thing, and occasionally the opposite one. A consistent agent repeats yesterday's truth whether or not it's still true today.

The fix isn't more memory. It's treating every remembered fact as a claim with a timestamp and a source, not as ground truth. The agents that stay useful over weeks aren't the ones that remember the most. They're the ones that know which of their memories to distrust.

If you want to go deeper on how context, state, and memory actually interact in production agents, I wrote about the full picture in Context Engineering.

My agent still remembers the wrong region sometimes. But now it asks before it deploys there. That's the whole game.
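
The three changes the post describes (timestamps with decay, an observed/inferred split, and destructive corrections with an audit log) can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the names `MemoryEntry`, `MemoryStore`, `recall`, the status strings, and the message ids are all hypothetical, and the rule that inferred entries always need confirmation is a simplification of "needs more evidence."

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MemoryEntry:
    key: str
    value: str
    written_at: datetime             # when the fact was recorded
    source: str                      # hypothetical id of the originating message
    observed: bool                   # True: user literally said it; False: agent inferred it
    ttl: Optional[timedelta] = None  # None means the fact is not treated as volatile

    def is_stale(self, now: datetime) -> bool:
        return self.ttl is not None and now - self.written_at > self.ttl


@dataclass
class MemoryStore:
    entries: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def write(self, entry: MemoryEntry) -> None:
        # Destructive correction: one entry per key, so a new value
        # overwrites the old one instead of sitting next to it.
        old = self.entries.get(entry.key)
        self.entries[entry.key] = entry
        # The log keeps what changed, when, and from which message.
        self.audit_log.append(
            (entry.written_at, entry.key, old.value if old else None,
             entry.value, entry.source)
        )

    def recall(self, key: str, now: datetime):
        entry = self.entries.get(key)
        if entry is None:
            return None, "unknown"
        # Stale or inferred entries are claims to re-confirm, not facts to act on.
        if entry.is_stale(now) or not entry.observed:
            return entry.value, "confirm_first"
        return entry.value, "usable"


store = MemoryStore()
t0 = datetime(2026, 2, 1)
store.write(MemoryEntry("staging_region", "us-east-1", t0, "msg-1",
                        observed=True, ttl=timedelta(days=14)))
print(store.recall("staging_region", t0 + timedelta(days=21)))
```

With a 14-day window, the region recorded on day 0 comes back flagged for re-confirmation on day 21, which is the "asks before it deploys" behavior the post ends on. The window length is an arbitrary choice for the example.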

1 day ago

Dev.to

The hard part of agent memory isn't remembering — it's forgetting

Everyone building agents obsesses over recall: vector stores, embeddings, RAG pipelines, bigger context windows. But after running a few long-lived agents in production, the failure mode that actually bit me wasn't "it forgot something." It was the opposite: it remembered too much, and the junk drowned the signal.

An agent that never forgets isn't wise. It's a hoarder. Every stale decision, every superseded fact, every one-off detail from three weeks ago sits in its memory with equal weight, and the quality of its answers quietly degrades.

The interesting engineering problem isn't storage. It's forgetting on purpose. Here's the policy I converged on. No framework, just plain Markdown files and a few rules the agent applies to itself.

Why "remember everything" rots

Say you log everything an agent learns into one growing store. Two things happen:

Contradictions accumulate. On Monday the user says "use Postgres." On Friday they switch to SQLite. If both facts live in memory with equal standing, the agent will cheerfully cite the wrong one half the time. More memory made it less reliable.

Signal-to-noise collapses. The genuinely durable facts ("this user is an engineer, ships in TypeScript, hates emoji in commit messages") get buried under transient ones ("asked about a flaky test on the 14th"). Recall returns ten things; nine are noise.

Adding retrieval sophistication on top of a polluted store just lets you find the wrong thing faster. The fix is upstream: control what earns a place in long-term memory at all.

The two-speed model: raw vs. curated

I split memory by lifespan, and the split is what makes forgetting cheap and safe.

Daily notes (memory/2026-06-22.md): append-only, lossy, never edited. This is the firehose: everything that happened today. It's allowed to be noisy because nobody reads it directly for long. Old daily notes age out naturally; they're the compost.

Long-term memory (MEMORY.md): curated, small, deliberately gardened. Nothing lands here automatically. A fact has to be promoted.

The whole trick is the promotion step between them.

Promotion: what earns a permanent slot

On idle cycles, the agent re-reads recent daily notes and asks one question of each candidate fact: will this still matter in a month?

"User prefers tabs over spaces" → yes, promote.
"User is debugging a timeout right now" → no, leave it in the daily note to decay.
"We decided to drop the Redis cache for v2" → yes, but it replaces the older "we use Redis" line, doesn't append next to it.

That last case is the important one. Promotion isn't just copying; it's reconciliation. A new durable fact that contradicts an old one overwrites it. This is how you stop the Monday/Friday Postgres problem at the source.

A practical heuristic I encode in the rules: if a fact is about identity, preference, or a decision, it's a promotion candidate. If it's about current state ("right now", "today", "this ticket"), it stays transient and is allowed to die.

Pruning: decay conservatively

The mirror image of promotion is removal, and here the instinct most people have is wrong. When you do prune long-term memory, decay by relevance, not by volume.

Don't cap MEMORY.md at N lines and evict the oldest; age is a terrible proxy for value. "User's name is Sam," learned on day one, should outlive a hundred newer-but-trivial facts.

Instead, remove a long-term entry only when it's been contradicted or rendered obsolete by a newer promotion. Supersession, not a size limit. The store stays small because promotion is strict on the way in, not because you're knocking things out the back to make room.

(If you ever build a graph layer on top of this, the same principle holds: decay by hops from still-relevant nodes, not by a global age threshold. A fact connected to live context stays warm even if it's old.)

Recall: load the index, fetch on demand

Once long-term memory is small and clean, retrieval gets boring in the best way. You don't dump the whole store into context every turn. You load a one-line index of what's known, and before answering anything about prior work or preferences, you fetch only the relevant entries.

Small curated store + on-demand fetch beats giant store + fuzzy search for a single agent, every time.

Why plain files, again

You can audit all of this in a text editor. Want to know why the agent thinks you use SQLite? grep. Want to watch its understanding evolve? It's a git diff. A vector DB is a black box you query; a Markdown memory is a mind you can read. For a large external corpus, embed away, but the agent's own identity and curated knowledge benefit far more from being legible than from being vectorized.

The uncomfortable takeaway

Most "my agent has no memory" complaints are really "my agent has no curation." Storage is solved. Retrieval is solved. The unsexy, human part (deciding what's worth keeping, reconciling contradictions, letting the rest decay) is where the actual intelligence lives. It's the same discipline a person uses keeping a journal: write everything down raw, then periodically distill the few things that matter and let the rest fade.

I packaged this whole approach (the file layout, the promotion/pruning rules written out as instructions an agent applies to itself, and a fully worked example agent with all the memory layers filled in so you can see a gardened mind rather than empty templates) as a drop-in kit. If you'd rather copy a working setup than derive the policy yourself: AI Soul Kit (Core ¥980 / Plus ¥3,800).

But the policy above is the part that matters. Steal it.
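
The promotion-as-reconciliation rule and the index-then-fetch recall step might look something like the sketch below. The post describes Markdown files and rules the agent applies to itself, so everything here is an assumption made for illustration: the `Fact` type, the `topic` and `kind` tags (which presume someone has already classified each note), and the `promote`, `build_index`, and `fetch` helpers are hypothetical names, and an in-memory dict stands in for MEMORY.md.

```python
from dataclasses import dataclass

# Kinds the post treats as promotion candidates; "state" stays transient.
DURABLE_KINDS = {"identity", "preference", "decision"}


@dataclass(frozen=True)
class Fact:
    topic: str  # what the fact is about, e.g. "database"
    kind: str   # "identity", "preference", "decision", or "state"
    text: str


def promote(long_term: dict, daily_notes: list) -> dict:
    """Return a new long-term memory after reviewing daily notes in order."""
    updated = dict(long_term)  # leave the caller's store untouched
    for fact in daily_notes:
        if fact.kind in DURABLE_KINDS:
            # Keyed by topic, so a newer durable fact supersedes the old
            # one instead of being appended next to it.
            updated[fact.topic] = fact
        # Transient facts are skipped and simply age out with the daily note.
    return updated


def build_index(long_term: dict) -> str:
    """One-line index of known topics, cheap enough to load every turn."""
    return ", ".join(sorted(long_term))


def fetch(long_term: dict, topics: list) -> list:
    """Fetch only the entries relevant to the current question."""
    return [long_term[t].text for t in topics if t in long_term]


memory = {"name": Fact("name", "identity", "User's name is Sam")}
week = [
    Fact("database", "decision", "use Postgres"),              # Monday
    Fact("flaky-test", "state", "asked about a flaky test"),   # transient
    Fact("database", "decision", "use SQLite"),                # Friday
]
memory = promote(memory, week)
print(build_index(memory))
print(fetch(memory, ["database"]))
```

Nothing in the sketch evicts by age or size: the day-one name entry survives, and the only removal is the Postgres line being replaced by the SQLite decision, which is the supersession rule from the pruning section.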

1 day ago
