Lesson 14
How AI persists knowledge across conversations and sprints
LLMs have a fundamental limitation: context windows are finite. Even the largest models can only see a limited amount of text at once. But real software projects span months or years, accumulating decisions, conventions, architecture changes, and lessons learned.
Without memory, every conversation starts from zero. The AI forgets your tech stack, your architecture, your conventions, your past decisions. You waste time re-explaining context that should already be known.
The memory system solves this by maintaining structured, persistent files that the AI loads selectively at the start of each conversation.
The memory system distinguishes between two fundamentally different kinds of knowledge:
Semantic memory holds stable, rarely changing facts about the project. This is the project's identity.
| File | Contains | Changes When |
|---|---|---|
| .memory/semantic/project.md | Project name, description, goals, stakeholders | Project scope changes |
| .memory/semantic/architecture.md | System architecture, layers, patterns, diagrams | Major architectural decisions |
| .memory/semantic/conventions.md | Naming rules, code style, commit format, PR conventions | Team agrees on a new convention |
| .memory/semantic/codebase.md | Directory structure, key files, module descriptions | Major refactoring or new modules |
| .memory/semantic/testing.md | Test strategy, framework, coverage targets | Testing approach changes |
| .memory/semantic/deployment.md | Deployment targets, environments, CI/CD config | Infrastructure or deploy process changes |
Episodic memory holds sprint-scoped records of events, decisions, and learnings, archived at sprint boundaries.
| File | Contains | Updated When |
|---|---|---|
| .memory/episodic/decisions.md | Architecture Decision Records (ADRs) with context and reasoning | Important technical decision made |
| .memory/episodic/learnings.md | Things discovered during development: gotchas, surprises, patterns | End of every workflow (/agile-memory-learn) |
| .memory/episodic/incidents.md | Production issues, root causes, fixes applied | After incident resolution |
| .memory/episodic/context.md | Current sprint context: active work, blockers, session continuity | During sprint work and at sprint boundaries |
Semantic memory is like a reference manual — you update it when facts change. Episodic memory is like a sprint journal — entries are appended within the current sprint, then archived to .memory/episodic/sprints/sprint_NNN.md at sprint end and the active files reset.
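The sprint-boundary archival step can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation: the archive_sprint helper and the reset-stub text are assumptions; only the episodic file names and the .memory/episodic/sprints/sprint_NNN.md path come from the text above.

```python
from pathlib import Path

# Active episodic files, per the table above.
EPISODIC = ["decisions.md", "learnings.md", "incidents.md", "context.md"]

def archive_sprint(memory_root: str, sprint_number: int) -> Path:
    """Concatenate the active episodic files into sprints/sprint_NNN.md, then reset them."""
    episodic = Path(memory_root) / "episodic"
    archive = episodic / "sprints" / f"sprint_{sprint_number:03d}.md"
    archive.parent.mkdir(parents=True, exist_ok=True)

    sections = [f"# Sprint {sprint_number:03d} archive\n"]
    for name in EPISODIC:
        active = episodic / name
        body = active.read_text() if active.exists() else ""
        sections.append(f"## {name}\n\n{body.strip()}\n")
        # Reset the active file so the next sprint starts clean.
        active.write_text(f"# {name}\n\n(reset at sprint {sprint_number} boundary)\n")

    archive.write_text("\n".join(sections))
    return archive
```

The append-then-reset shape is the point: history is never lost, but the active files stay short enough to load cheaply.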
The full memory tree, at a glance:

- .memory/MEMORY_INDEX.md: always loaded (~30 lines, project summary)
- .memory/semantic/ (what the project IS, stable): project.md, architecture.md, conventions.md, codebase.md, testing.md, deployment.md
- .memory/episodic/ (what HAPPENED, sprint-scoped, archived at sprint boundaries): decisions.md, learnings.md, incidents.md, context.md
- Current work items: product.md, sprint.md

.memory/MEMORY_INDEX.md is the only memory file loaded at the start of every conversation. It is a compact summary (~30 lines) that tells the AI what the project is about and where to find more detail.
```markdown
# MEMORY_INDEX
last_verified: 2026-03-28

## Project
- Name: MyApp
- Stack: Node.js 20, TypeScript, NestJS, PostgreSQL, React
- Architecture: Clean Architecture (domain → application → infrastructure)

## Key Files
- Architecture: .memory/semantic/architecture.md
- Conventions: .memory/semantic/conventions.md
- Codebase: .memory/semantic/codebase.md

## Current Sprint
- Sprint 4: "User Management"
- Goal: Complete user CRUD, roles, and permissions
- Stories: US-041 through US-048

## Recent Decisions
- ADR-012: Chose JWT over sessions for auth (2026-03-15)
- ADR-013: PostgreSQL over MongoDB for relational data (2026-03-20)
```
.memory/MEMORY_INDEX.md should never exceed ~30 lines. It is loaded every time. If it grows, move details into .memory/semantic/ or .memory/episodic/ files and just reference them.
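The line budget can be enforced mechanically. Here is a sketch; check_index_size is a hypothetical helper, and the 30-line budget comes from the guideline above.

```python
from pathlib import Path

INDEX_LINE_BUDGET = 30  # the ~30-line guideline from the text above

def check_index_size(index_path: str, budget: int = INDEX_LINE_BUDGET) -> bool:
    """Return True if the index fits the budget; print a pointer to the fix otherwise."""
    line_count = len(Path(index_path).read_text().splitlines())
    if line_count > budget:
        print(f"{index_path} is {line_count} lines (budget {budget}): "
              "move detail into .memory/semantic/ or .memory/episodic/ and reference it.")
        return False
    return True
```

A check like this could run in CI or as a pre-commit hook, so the index never drifts past its budget unnoticed.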
Memory is loaded in levels, not all at once. This respects context window limits and keeps the AI focused on what matters for the current task.
| Level | What Loads | When | Context Cost |
|---|---|---|---|
| Level 0: Index | .memory/MEMORY_INDEX.md only | Always, every conversation | ~30 lines |
| Level 1: Route | Files relevant to task type (see routing table) | After task is identified | ~100-300 lines |
| Level 2: Deep | Specific sections within a file | When more detail is needed | ~50-150 lines |
| Level 3: Historical | Episodic memory (decisions, learnings, incidents) | Only when explicitly relevant or asked | Varies |
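The level structure can be modeled as data to reason about worst-case context spend. This is a hedged sketch: the numeric bounds copy the table above, and worst_case_cost is a hypothetical helper.

```python
# The four loading levels from the table above, with (min, max) line costs.
# Level 3's cost varies, so its upper bound is left as None.
LEVELS = [
    {"level": 0, "name": "index",      "trigger": "always",              "cost": (30, 30)},
    {"level": 1, "name": "route",      "trigger": "task identified",     "cost": (100, 300)},
    {"level": 2, "name": "deep",       "trigger": "more detail needed",  "cost": (50, 150)},
    {"level": 3, "name": "historical", "trigger": "explicitly relevant", "cost": (0, None)},
]

def worst_case_cost(max_level: int) -> int:
    """Upper bound on lines loaded if every level up to max_level fires."""
    total = 0
    for lvl in LEVELS:
        if lvl["level"] > max_level:
            break
        upper = lvl["cost"][1]
        total += upper if upper is not None else 0
    return total
```

Even if Levels 0 through 2 all fire, the budget tops out around 480 lines, which is why progressive loading stays cheap relative to dumping every memory file into context.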
Once the task type is identified, the Level 1 routing table determines which files to load:

| Task Type | Load These Files |
|---|---|
| Any task | .memory/semantic/project.md, .memory/semantic/conventions.md (always) |
| Backend work | .memory/semantic/architecture.md, .memory/semantic/codebase.md, .memory/semantic/testing.md |
| Frontend work | .memory/semantic/architecture.md, .memory/semantic/codebase.md, .memory/semantic/domain/design.md |
| Bug fix | .memory/semantic/codebase.md, .memory/semantic/testing.md |
| DevOps | .memory/semantic/deployment.md, .memory/semantic/architecture.md |
| Planning | .memory/semantic/architecture.md, .memory/semantic/codebase.md |
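The routing table above can be encoded directly as data plus a lookup. This is a sketch under stated assumptions: the task-type keys and the route function name are invented for illustration; the file lists come from the table.

```python
# Files loaded for every task, plus per-task-type routes, from the table above.
ALWAYS = ["semantic/project.md", "semantic/conventions.md"]

ROUTES = {
    "backend":  ["semantic/architecture.md", "semantic/codebase.md", "semantic/testing.md"],
    "frontend": ["semantic/architecture.md", "semantic/codebase.md", "semantic/domain/design.md"],
    "bugfix":   ["semantic/codebase.md", "semantic/testing.md"],
    "devops":   ["semantic/deployment.md", "semantic/architecture.md"],
    "planning": ["semantic/architecture.md", "semantic/codebase.md"],
}

def route(task_type: str) -> list[str]:
    """Level 1 load list: the always-on files plus the task-specific route."""
    files = ALWAYS + ROUTES.get(task_type, [])
    return [f".memory/{name}" for name in files]
```

An unrecognized task type degrades gracefully to just the always-on pair, which mirrors how the protocol behaves before the task is identified.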
Not every conversation should update memory. The AI follows a strict protocol for when to save and when not to save.
Don't load memory files until you actually need them. If a task only involves writing a unit test, you don't need the deployment docs or the design system specs.
Before loading a full memory file, read just its frontmatter summary to decide if it's relevant:
```
# Step 1: Peek at frontmatter
Read first 10 lines of .memory/semantic/architecture.md
→ ---
  last_verified: 2026-03-20
  summary: "Clean Architecture. 3 layers. NestJS. PostgreSQL."
  ---

# Step 2: Decide
Is .memory/semantic/architecture.md relevant to this task?
  Yes → load full file
  No  → skip, save context window for something else
```
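The peek step can be sketched as a small parser that reads only the head of the file. peek_summary is a hypothetical helper; the frontmatter layout matches the example above.

```python
from pathlib import Path

def peek_summary(path: str, max_lines: int = 10) -> dict[str, str]:
    """Parse key: value pairs from the frontmatter in the first few lines only."""
    meta: dict[str, str] = {}
    inside = False
    for line in Path(path).read_text().splitlines()[:max_lines]:
        if line.strip() == "---":
            if inside:
                break          # closing fence reached: never read the body
            inside = True
            continue
        if inside and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta
```

The summary and last_verified fields are usually enough to decide relevance, so the full file is loaded only when the peek says it matters.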
Memory files can become stale — the code changes, but the memory doesn't. Every memory file has a last_verified date in its frontmatter.
| Condition | Status | Action |
|---|---|---|
| Verified within 7 days | Fresh | Trust as-is |
| Verified 7-30 days ago | Aging | Use but verify claims against code |
| Verified 30+ days ago | Stale | Flag to user: "This file may be outdated. Verify before relying on it." |
| Post-major-refactor | Suspect | Re-verify entire file before using |
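The scoring rules translate directly into a small function. staleness is a hypothetical name; the day thresholds come from the table above.

```python
from datetime import date

def staleness(last_verified: date, today: date, post_major_refactor: bool = False) -> str:
    """Map a file's last_verified date to the status used in the table above."""
    if post_major_refactor:
        return "suspect"       # re-verify the entire file before using
    age_days = (today - last_verified).days
    if age_days < 7:
        return "fresh"         # trust as-is
    if age_days <= 30:
        return "aging"         # use, but verify claims against the code
    return "stale"             # flag to the user as possibly outdated
```

A check like this can run whenever a memory file is about to be loaded, so stale claims get flagged before the AI relies on them.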
Not everything belongs in memory. Here's how to decide:
| Belongs in Memory | Belongs in Conversation |
|---|---|
| Architecture decisions (permanent) | Current debugging session (temporary) |
| Project conventions (shared) | Specific implementation details (in-progress) |
| Sprint outcomes (historical record) | Draft code being iterated on |
| Lessons learned (reusable) | One-off questions and answers |
The memory system turns an AI with amnesia into an AI with institutional knowledge. By structuring memory into semantic (what IS) and episodic (what HAPPENED), and loading it progressively, you get the benefits of persistent context without blowing up the context window.
Review question: What is the only memory file loaded at the start of every conversation?