Lesson 14

The Memory System

How AI persists knowledge across conversations and sprints

Topics: Semantic Memory · Episodic Memory · MEMORY_INDEX.md · Progressive Loading · Staleness Detection · Lazy Loading · Peek-then-Load

Why AI Needs Memory

LLMs have a fundamental limitation: context windows are finite. Even the largest models can only see a limited amount of text at once. But real software projects span months or years, accumulating decisions, conventions, architecture changes, and lessons learned.

The Problem

Without memory, every conversation starts from zero. The AI forgets your tech stack, your architecture, your conventions, your past decisions. You waste time re-explaining context that should already be known.

The memory system solves this by maintaining structured, persistent files that the AI loads selectively at the start of each conversation.

Two Types of Memory

The memory system distinguishes between two fundamentally different kinds of knowledge:

Semantic Memory: What the Project IS

Stable, rarely changing facts about the project. This is the project's identity.

| File | Contains | Changes when |
| --- | --- | --- |
| .memory/semantic/project.md | Project name, description, goals, stakeholders | Project scope changes |
| .memory/semantic/architecture.md | System architecture, layers, patterns, diagrams | Major architectural decisions |
| .memory/semantic/conventions.md | Naming rules, code style, commit format, PR conventions | Team agrees on a new convention |
| .memory/semantic/codebase.md | Directory structure, key files, module descriptions | Major refactoring or new modules |
| .memory/semantic/testing.md | Test strategy, framework, coverage targets | Testing approach changes |
| .memory/semantic/deployment.md | Deployment targets, environments, CI/CD config | Infrastructure or deploy process changes |

Episodic Memory: What HAPPENED

Sprint-scoped records of events, decisions, and learnings, archived at sprint boundaries.

| File | Contains | Updated when |
| --- | --- | --- |
| .memory/episodic/decisions.md | Architecture Decision Records (ADRs) with context and reasoning | An important technical decision is made |
| .memory/episodic/learnings.md | Things discovered during development — gotchas, surprises, patterns | End of every workflow (/agile-memory-learn) |
| .memory/episodic/incidents.md | Production issues, root causes, fixes applied | After incident resolution |
| .memory/episodic/context.md | Current sprint context — active work, blockers, session continuity | During sprint work and at sprint boundaries |

Key Distinction

Semantic memory is like a reference manual — you update it when facts change. Episodic memory is like a sprint journal — entries are appended within the current sprint, then archived to .memory/episodic/sprints/sprint_NNN.md at sprint end and the active files reset.

Memory Architecture

.memory/
├── MEMORY_INDEX.md    Always loaded (~30 lines, project summary)
├── semantic/          What the project IS (stable)
│   ├── project.md, architecture.md, conventions.md, codebase.md, testing.md, deployment.md
│   └── domain/        api.md, database.md, design.md, integrations.md
├── episodic/          What HAPPENED (sprint-scoped, archived at sprint boundaries)
│   └── decisions.md, learnings.md, incidents.md, context.md
└── backlog/           Current work items
    └── product.md, sprint.md

.memory/MEMORY_INDEX.md: The Always-Loaded File

This is the only memory file loaded at the start of every conversation. It is a compact summary (~30 lines) that tells the AI what the project is about and where to find more detail.

# MEMORY_INDEX
last_verified: 2026-03-28

## Project
- Name: MyApp
- Stack: Node.js 20, TypeScript, NestJS, PostgreSQL, React
- Architecture: Clean Architecture (domain → application → infrastructure)

## Key Files
- Architecture: .memory/semantic/architecture.md
- Conventions: .memory/semantic/conventions.md
- Codebase: .memory/semantic/codebase.md

## Current Sprint
- Sprint 4: "User Management"
- Goal: Complete user CRUD, roles, and permissions
- Stories: US-041 through US-048

## Recent Decisions
- ADR-012: Chose JWT over sessions for auth (2026-03-15)
- ADR-013: PostgreSQL over MongoDB for relational data (2026-03-20)

Keep It Small

.memory/MEMORY_INDEX.md should never exceed ~30 lines. It is loaded every time. If it grows, move details into .memory/semantic/ or .memory/episodic/ files and just reference them.

Progressive Loading Protocol

Memory is loaded in levels, not all at once. This respects context window limits and keeps the AI focused on what matters for the current task.

| Level | What loads | When | Context cost |
| --- | --- | --- | --- |
| Level 0: Index | .memory/MEMORY_INDEX.md only | Always, every conversation | ~30 lines |
| Level 1: Route | Files relevant to the task type (see routing table) | After the task is identified | ~100-300 lines |
| Level 2: Deep | Specific sections within a file | When more detail is needed | ~50-150 lines |
| Level 3: Historical | Episodic memory (decisions, learnings, incidents) | Only when explicitly relevant or asked | Varies |
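The cumulative nature of the levels can be sketched as data: each level adds files on top of the previous ones, so context cost grows only as far as the task demands. This is a simplified sketch; Level 2 loads sections within files rather than whole files, so it is omitted here, and the Level 1 entries show only the always-loaded baseline:

```python
# Files per loading level (Level 2 is section-granular and handled separately).
LEVELS = {
    0: [".memory/MEMORY_INDEX.md"],                     # always loaded
    1: [".memory/semantic/project.md",                  # after task is identified
        ".memory/semantic/conventions.md"],
    3: [".memory/episodic/decisions.md",                # only when explicitly relevant
        ".memory/episodic/learnings.md",
        ".memory/episodic/incidents.md"],
}

def files_to_load(max_level: int) -> list[str]:
    """Everything at or below max_level, in loading order."""
    return [f for lvl in sorted(LEVELS) if lvl <= max_level for f in LEVELS[lvl]]
```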

Level 1 Routing Table

| Task type | Load these files |
| --- | --- |
| Any task | .memory/semantic/project.md, .memory/semantic/conventions.md (always) |
| Backend work | .memory/semantic/architecture.md, .memory/semantic/codebase.md, .memory/semantic/testing.md |
| Frontend work | .memory/semantic/architecture.md, .memory/semantic/codebase.md, .memory/semantic/domain/design.md |
| Bug fix | .memory/semantic/codebase.md, .memory/semantic/testing.md |
| DevOps | .memory/semantic/deployment.md, .memory/semantic/architecture.md |
| Planning | .memory/semantic/architecture.md, .memory/semantic/codebase.md |
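The routing table translates directly into a lookup. A minimal sketch, assuming the task-type keys shown here (the names are illustrative; the file lists come from the table above):

```python
# Baseline files loaded for any task, per the routing table.
BASELINE = [".memory/semantic/project.md", ".memory/semantic/conventions.md"]

ROUTES = {
    "backend":  [".memory/semantic/architecture.md", ".memory/semantic/codebase.md",
                 ".memory/semantic/testing.md"],
    "frontend": [".memory/semantic/architecture.md", ".memory/semantic/codebase.md",
                 ".memory/semantic/domain/design.md"],
    "bugfix":   [".memory/semantic/codebase.md", ".memory/semantic/testing.md"],
    "devops":   [".memory/semantic/deployment.md", ".memory/semantic/architecture.md"],
    "planning": [".memory/semantic/architecture.md", ".memory/semantic/codebase.md"],
}

def route(task_type: str) -> list[str]:
    """Baseline plus task-specific files; unknown types get just the baseline."""
    return BASELINE + ROUTES.get(task_type, [])
```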

The Save Protocol

Not every conversation should update memory. The AI follows a strict protocol for when to save and when not to save.

When to Save

- An important technical decision was made (append an ADR to .memory/episodic/decisions.md)
- Something reusable was discovered: a gotcha, surprise, or pattern (learnings.md)
- A production incident was resolved (incidents.md)
- A stable fact about the project changed (update the relevant .memory/semantic/ file)

When NOT to Save

- Routine work that produced no new decisions or learnings
- Temporary debugging sessions and one-off questions
- Draft code that is still being iterated on
- Anything already captured in an existing memory file

Lazy Loading & Peek-then-Load

Lazy Loading

Don't load memory files until you actually need them. If a task only involves writing a unit test, you don't need the deployment docs or the design system specs.

Peek-then-Load

Before loading a full memory file, read just its frontmatter summary to decide if it's relevant:

# Step 1: Peek at frontmatter
Read first 10 lines of .memory/semantic/architecture.md

→ ---
  last_verified: 2026-03-20
  summary: "Clean Architecture. 3 layers. NestJS. PostgreSQL."
  ---

# Step 2: Decide
Is .memory/semantic/architecture.md relevant to this task?
  Yes → load full file
  No  → skip, save context window for something else

Staleness Detection

Memory files can become stale — the code changes, but the memory doesn't. Every memory file has a last_verified date in its frontmatter.

Staleness Scoring

| Condition | Score | Action |
| --- | --- | --- |
| Verified within 7 days | Fresh | Trust as-is |
| Verified 7-30 days ago | Aging | Use, but verify claims against code |
| Verified 30+ days ago | Stale | Flag to user: "This file may be outdated. Verify before relying on it." |
| After a major refactor | Suspect | Re-verify the entire file before using |
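The scoring rule is simple date arithmetic on the frontmatter's last_verified field. A minimal sketch; the exact boundary handling (whether day 7 counts as fresh or aging) is an assumption, since the table leaves it ambiguous:

```python
from datetime import date

def staleness(last_verified: date, today: date, post_refactor: bool = False) -> str:
    """Map a memory file's last_verified date to a staleness score."""
    if post_refactor:
        return "suspect"   # re-verify the entire file before using
    age = (today - last_verified).days
    if age < 7:
        return "fresh"     # trust as-is
    if age <= 30:
        return "aging"     # use, but verify claims against code
    return "stale"         # flag to user before relying on it
```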

Memory vs Conversation Context

Not everything belongs in memory. Here's how to decide:

| Belongs in memory | Belongs in conversation |
| --- | --- |
| Architecture decisions (permanent) | Current debugging session (temporary) |
| Project conventions (shared) | Specific implementation details (in progress) |
| Sprint outcomes (historical record) | Draft code being iterated on |
| Lessons learned (reusable) | One-off questions and answers |

Example: How Memory Flows Through a Sprint

🏁 Sprint Start: Load the INDEX, backlog, and architecture. @sm plans the sprint.
💻 During Sprint: For each task, load Level 0 + Level 1, implement, capture learnings, and update the codebase map if needed. Memory loading stays task-specific.
📝 Sprint End: Save learnings, update the INDEX, append the sprint log, update the backlog, and record retro notes.

Key Takeaway

The memory system turns an AI with amnesia into an AI with institutional knowledge. By structuring memory into semantic (what IS) and episodic (what HAPPENED), and loading it progressively, you get the benefits of persistent context without blowing up the context window.

Knowledge Check

What is the only memory file loaded at the start of every conversation?