Lesson 14

The Memory System

How AI persists knowledge across conversations and sprints

Topics: Semantic Memory · Episodic Memory · MEMORY_INDEX.md · Progressive Loading · Staleness Detection · Lazy Loading · Peek-then-Load

Why AI Needs Memory

LLMs have a fundamental limitation: context windows are finite. Even the largest models can only see a limited amount of text at once. But real software projects span months or years, accumulating decisions, conventions, architecture changes, and lessons learned.

The Problem

Without memory, every conversation starts from zero. The AI forgets your tech stack, your architecture, your conventions, your past decisions. You waste time re-explaining context that should already be known.

The memory system solves this by maintaining structured, persistent files that the AI loads selectively at the start of each conversation.

Two Types of Memory

The memory system distinguishes between two fundamentally different kinds of knowledge:

Semantic Memory: What the Project IS

Stable, rarely changing facts about the project. This is the project's identity.

| File | Contains | Changes when |
| --- | --- | --- |
| .memory/semantic/project.md | Project name, description, goals, stakeholders | Project scope changes |
| .memory/semantic/architecture.md | System architecture, layers, patterns, diagrams | Major architectural decisions |
| .memory/semantic/conventions.md | Naming rules, code style, commit format, PR conventions | Team agrees on a new convention |
| .memory/semantic/codebase.md | Directory structure, key files, module descriptions | Major refactoring or new modules |
| .memory/semantic/testing.md | Test strategy, framework, coverage targets | Testing approach changes |
| .memory/semantic/deployment.md | Deployment targets, environments, CI/CD config | Infrastructure or deploy process changes |

Episodic Memory: What HAPPENED

Sprint-scoped records of events, decisions, and learnings, archived at sprint boundaries.

| File | Contains | Updated when |
| --- | --- | --- |
| .memory/episodic/decisions.md | Architecture Decision Records (ADRs) with context and reasoning | An important technical decision is made |
| .memory/episodic/learnings.md | Things discovered during development — gotchas, surprises, patterns | End of every workflow (/agile-memory-learn) |
| .memory/episodic/incidents.md | Production issues, root causes, fixes applied | After incident resolution |
| .memory/episodic/context.md | Current sprint context — active work, blockers, session continuity | During sprint work and at sprint boundaries |

Key Distinction

Semantic memory is like a reference manual — you update it when facts change. Episodic memory is like a sprint journal — entries are appended within the current sprint, then archived to .memory/episodic/sprints/sprint_NNN.md at sprint end and the active files reset.

Memory Architecture

.memory/
├── MEMORY_INDEX.md    Always loaded (~30 lines, project summary)
├── semantic/          What the project IS (stable)
│   ├── project.md, architecture.md, conventions.md, codebase.md, testing.md, deployment.md
│   └── domain/        api.md, database.md, design.md, integrations.md
├── episodic/          What HAPPENED (sprint-scoped, archived at sprint boundaries)
│   └── decisions.md, learnings.md, incidents.md, context.md
└── backlog/           Current work items
    └── product.md, sprint.md

.memory/MEMORY_INDEX.md: The Always-Loaded File

This is the only memory file loaded at the start of every conversation. It is a compact summary (~30 lines) that tells the AI what the project is about and where to find more detail.

# MEMORY_INDEX
last_verified: 2026-03-28

## Project
- Name: MyApp
- Stack: Node.js 20, TypeScript, NestJS, PostgreSQL, React
- Architecture: Clean Architecture (domain → application → infrastructure)

## Key Files
- Architecture: .memory/semantic/architecture.md
- Conventions: .memory/semantic/conventions.md
- Codebase: .memory/semantic/codebase.md

## Current Sprint
- Sprint 4: "User Management"
- Goal: Complete user CRUD, roles, and permissions
- Stories: US-041 through US-048

## Recent Decisions
- ADR-012: Chose JWT over sessions for auth (2026-03-15)
- ADR-013: PostgreSQL over MongoDB for relational data (2026-03-20)

Keep It Small

.memory/MEMORY_INDEX.md should never exceed ~30 lines. It is loaded every time. If it grows, move details into .memory/semantic/ or .memory/episodic/ files and just reference them.

Progressive Loading Protocol

Memory is loaded in levels, not all at once. This respects context window limits and keeps the AI focused on what matters for the current task.

| Level | What loads | When | Context cost |
| --- | --- | --- | --- |
| Level 0: Index | .memory/MEMORY_INDEX.md only | Always, every conversation | ~30 lines |
| Level 1: Route | Files relevant to the task type (see routing table) | After the task is identified | ~100-300 lines |
| Level 2: Deep | Specific sections within a file | When more detail is needed | ~50-150 lines |
| Level 3: Historical | Episodic memory (decisions, learnings, incidents) | Only when explicitly relevant or asked | Varies |
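The cumulative nature of the levels can be sketched as data: each level adds files on top of the previous ones, so context cost grows only as far as the task demands. This is a simplified sketch; Level 2 loads sections within files rather than whole files, so it is omitted here, and the Level 1 entries show only the always-loaded baseline:

```python
# Files per loading level (Level 2 is section-granular and handled separately).
LEVELS = {
    0: [".memory/MEMORY_INDEX.md"],                     # always loaded
    1: [".memory/semantic/project.md",                  # after task is identified
        ".memory/semantic/conventions.md"],
    3: [".memory/episodic/decisions.md",                # only when explicitly relevant
        ".memory/episodic/learnings.md",
        ".memory/episodic/incidents.md"],
}

def files_to_load(max_level: int) -> list[str]:
    """Everything at or below max_level, in loading order."""
    return [f for lvl in sorted(LEVELS) if lvl <= max_level for f in LEVELS[lvl]]
```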

Level 1 Routing Table

| Task type | Load these files |
| --- | --- |
| Any task | .memory/semantic/project.md, .memory/semantic/conventions.md (always) |
| Backend work | .memory/semantic/architecture.md, .memory/semantic/codebase.md, .memory/semantic/testing.md |
| Frontend work | .memory/semantic/architecture.md, .memory/semantic/codebase.md, .memory/semantic/domain/design.md |
| Bug fix | .memory/semantic/codebase.md, .memory/semantic/testing.md |
| DevOps | .memory/semantic/deployment.md, .memory/semantic/architecture.md |
| Planning | .memory/semantic/architecture.md, .memory/semantic/codebase.md |
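The routing table translates directly into a lookup. A minimal sketch, assuming the task-type keys shown here (the names are illustrative; the file lists come from the table above):

```python
# Baseline files loaded for any task, per the routing table.
BASELINE = [".memory/semantic/project.md", ".memory/semantic/conventions.md"]

ROUTES = {
    "backend":  [".memory/semantic/architecture.md", ".memory/semantic/codebase.md",
                 ".memory/semantic/testing.md"],
    "frontend": [".memory/semantic/architecture.md", ".memory/semantic/codebase.md",
                 ".memory/semantic/domain/design.md"],
    "bugfix":   [".memory/semantic/codebase.md", ".memory/semantic/testing.md"],
    "devops":   [".memory/semantic/deployment.md", ".memory/semantic/architecture.md"],
    "planning": [".memory/semantic/architecture.md", ".memory/semantic/codebase.md"],
}

def route(task_type: str) -> list[str]:
    """Baseline plus task-specific files; unknown types get just the baseline."""
    return BASELINE + ROUTES.get(task_type, [])
```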

The Save Protocol

Not every conversation should update memory. The AI follows a strict protocol for when to save and when not to save.

When to Save

- An important technical decision was made (append an ADR to .memory/episodic/decisions.md)
- Something reusable was discovered: a gotcha, surprise, or pattern (learnings.md)
- A production incident was resolved (incidents.md)
- A stable fact about the project changed (update the relevant .memory/semantic/ file)

When NOT to Save

- Routine work that produced no new decisions or learnings
- Temporary debugging sessions and one-off questions
- Draft code that is still being iterated on
- Anything already captured in an existing memory file

Lazy Loading & Peek-then-Load

Lazy Loading

Don't load memory files until you actually need them. If a task only involves writing a unit test, you don't need the deployment docs or the design system specs.

Peek-then-Load

Before loading a full memory file, read just its frontmatter summary to decide if it's relevant:

# Step 1: Peek at frontmatter
Read first 10 lines of .memory/semantic/architecture.md

→ ---
  last_verified: 2026-03-20
  summary: "Clean Architecture. 3 layers. NestJS. PostgreSQL."
  ---

# Step 2: Decide
Is .memory/semantic/architecture.md relevant to this task?
  Yes → load full file
  No  → skip, save context window for something else

Staleness Detection

Memory files can become stale — the code changes, but the memory doesn't. Every memory file has a last_verified date in its frontmatter.

Staleness Scoring

| Condition | Score | Action |
| --- | --- | --- |
| Verified within 7 days | Fresh | Trust as-is |
| Verified 7-30 days ago | Aging | Use, but verify claims against code |
| Verified 30+ days ago | Stale | Flag to user: "This file may be outdated. Verify before relying on it." |
| After a major refactor | Suspect | Re-verify the entire file before using |
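The scoring rule is simple date arithmetic on the frontmatter's last_verified field. A minimal sketch; the exact boundary handling (whether day 7 counts as fresh or aging) is an assumption, since the table leaves it ambiguous:

```python
from datetime import date

def staleness(last_verified: date, today: date, post_refactor: bool = False) -> str:
    """Map a memory file's last_verified date to a staleness score."""
    if post_refactor:
        return "suspect"   # re-verify the entire file before using
    age = (today - last_verified).days
    if age < 7:
        return "fresh"     # trust as-is
    if age <= 30:
        return "aging"     # use, but verify claims against code
    return "stale"         # flag to user before relying on it
```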

Memory vs Conversation Context

Not everything belongs in memory. Here's how to decide:

| Belongs in memory | Belongs in conversation |
| --- | --- |
| Architecture decisions (permanent) | Current debugging session (temporary) |
| Project conventions (shared) | Specific implementation details (in progress) |
| Sprint outcomes (historical record) | Draft code being iterated on |
| Lessons learned (reusable) | One-off questions and answers |

Example: How Memory Flows Through a Sprint

🏁 Sprint Start: Load the INDEX, backlog, and architecture. @sm plans the sprint.
💻 During Sprint: For each task, load Level 0 + Level 1, implement, capture learnings, and update the codebase map if needed. Memory loading stays task-specific.
📝 Sprint End: Save learnings, update the INDEX, append the sprint log, update the backlog, and record retro notes.

Key Takeaway

The memory system turns an AI with amnesia into an AI with institutional knowledge. By structuring memory into semantic (what IS) and episodic (what HAPPENED), and loading it progressively, you get the benefits of persistent context without blowing up the context window.

Knowledge Check

What is the only memory file loaded at the start of every conversation?