Lesson 13

AI & LLM Engineering

Prompt engineering, AI agents, and AI-augmented Scrum for modern development

Large Language Models Prompt Engineering AI Agents Tool Use AI-Augmented Scrum Context Windows Role-Based Agents

What Are Large Language Models (LLMs)?

Large Language Models are neural networks trained on vast amounts of text data. They predict the next token (word fragment) in a sequence, but this simple objective gives rise to remarkable capabilities: reasoning, code generation, analysis, and creative problem solving.

Core Architecture: Transformers

Modern LLMs are built on the Transformer architecture (introduced in the 2017 paper "Attention Is All You Need"). The key innovation is the attention mechanism: the model can look at all parts of the input simultaneously, understanding relationships between distant words.

Key Concepts

Concept | What It Means | Why It Matters
Token | A word fragment (roughly 3/4 of a word) | Models process and generate tokens, not words. "unbreakable" = ["un", "break", "able"]
Context Window | Maximum number of tokens the model can see at once | Determines how much code, documentation, or conversation the model can work with
Parameters | The learned weights of the neural network | More parameters generally means more capability (GPT-4 class: hundreds of billions)
Temperature | Controls randomness of output (0 = deterministic, 1 = creative) | Low for code generation, higher for brainstorming
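A common rule of thumb (a heuristic only, not a real tokenizer) is that one token is about four characters of English text. A minimal sketch for estimating whether a prompt fits a context window:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic; real tokenizers (BPE) vary by model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check whether a prompt plus room for the expected output
// fits within a model's context window.
function fitsContextWindow(
  prompt: string,
  maxOutputTokens: number,
  contextWindow: number
): boolean {
  return estimateTokens(prompt) + maxOutputTokens <= contextWindow;
}
```

For precise counts, use the model's own tokenizer; the heuristic is only good enough for a quick budget check.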

How LLMs Are Built

📚 Pre-training: train on a massive text corpus. Learn language, facts, patterns. Billions of tokens.
🎯 Fine-tuning: train on curated instruction data. Learn to follow instructions. Thousands of examples.
👥 RLHF: human feedback ranks outputs. Learn to be helpful & safe. Thousands of comparisons.


Prompt Engineering Fundamentals

Prompt engineering is the art and science of communicating effectively with LLMs. The quality of your output is directly proportional to the quality of your prompt.

The Golden Rule of Prompting

Be as specific and explicit as you would be when writing a detailed specification for a junior developer who has never seen your codebase.

Be Specific and Explicit

Vague prompts get vague answers. Specific prompts get precise, actionable output.

// BAD: Vague
"Write a function to process data"

// GOOD: Specific
"Write a TypeScript function called `parseUserCSV` that:
- Takes a CSV string as input
- Returns an array of User objects { name: string, email: string, role: 'admin' | 'user' }
- Skips the header row
- Throws a ValidationError if any email is invalid
- Uses no external libraries"
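Given the specific prompt above, a model might produce something like the following sketch. This is one possible implementation, not canonical output; the `User` shape and `ValidationError` come from the prompt itself:

```typescript
type User = { name: string; email: string; role: 'admin' | 'user' };

class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ValidationError';
  }
}

// Parses "name,email,role" CSV lines, skipping the header row.
// Throws ValidationError on invalid emails, as the prompt requires.
function parseUserCSV(csv: string): User[] {
  const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return csv
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => {
      const [name, email, role] = line.split(',').map((s) => s.trim());
      if (!emailPattern.test(email)) {
        throw new ValidationError(`Invalid email: ${email}`);
      }
      return { name, email, role: role as 'admin' | 'user' };
    });
}
```

Notice how every bullet in the prompt maps to a visible decision in the code; that is what a specification-grade prompt buys you.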

Provide Context and Constraints

Tell the model what it is working with: the tech stack, existing patterns, and boundaries.

// Context block example
"You are working on a Node.js 20 / TypeScript project using:
- NestJS for the API layer
- TypeORM for database access
- PostgreSQL as the database
- Jest for testing

The project follows Clean Architecture with these layers:
  domain/ → entities, value objects (no dependencies)
  application/ → use cases (depends on domain)
  infrastructure/ → DB, HTTP (depends on application)

Follow the existing patterns in the codebase."

Use Structured Output Formats

Tell the model exactly what format you want the response in.

// Request structured output
"Analyze this function for issues. Return your findings as:

## Findings
For each issue:
- **Severity**: S0/S1/S2/S3
- **Line**: line number
- **Issue**: description
- **Fix**: suggested code change

## Summary
- Total findings: N
- Blocks merge: yes/no"

Chain of Thought Prompting

Ask the model to think step by step. This dramatically improves reasoning accuracy.

// Chain of thought
"Before implementing, think through:
1. What are the edge cases?
2. What could go wrong?
3. What is the simplest correct solution?
4. How would you test this?

Then implement the solution."

// Even simpler:
"Think step by step before answering."

Why It Works

Chain of thought forces the model to show its reasoning, which activates more careful processing. Errors in intermediate steps become visible and correctable.

Few-Shot Examples

Show the model what good output looks like by providing 1-3 examples.

// Few-shot prompting
"Convert user stories to test cases.

Example input:
  'As a user, I can reset my password via email'
Example output:
  - test: should send reset email when valid email provided
  - test: should return 404 when email not found
  - test: should rate-limit to 3 reset requests per hour
  - test: should expire reset token after 24 hours

Now convert this story:
  'As a user, I can upload a profile avatar'"
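Few-shot prompts like the one above can also be assembled programmatically, which keeps examples versioned and reusable. A minimal sketch (the helper names are illustrative):

```typescript
interface FewShotExample {
  input: string;
  output: string;
}

// Builds a few-shot prompt: task description, worked examples,
// then the new input the model should handle.
function buildFewShotPrompt(
  task: string,
  examples: FewShotExample[],
  newInput: string
): string {
  const shots = examples
    .map((ex) => `Example input:\n  ${ex.input}\nExample output:\n${ex.output}`)
    .join('\n\n');
  return `${task}\n\n${shots}\n\nNow convert this input:\n  ${newInput}`;
}
```

Keeping examples as data makes it easy to swap or A/B test them without rewriting the prompt text.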

System Prompts vs User Prompts

LLMs distinguish between two types of input:

Type | Purpose | Persistence | Example
System Prompt | Sets the model's identity, role, and constraints | Persists across the entire conversation | "You are a senior TypeScript developer. Follow SOLID principles. Never use any."
User Prompt | The specific task or question for this turn | This message only | "Implement the UserService class with CRUD methods."
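In chat-style APIs this distinction is expressed as a list of role-tagged messages. The shape below follows the widely used OpenAI-style convention; exact field names vary by provider:

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// The system prompt is sent once, first; user prompts follow per turn.
const messages: ChatMessage[] = [
  {
    role: 'system',
    content:
      'You are a senior TypeScript developer. Follow SOLID principles. Never use any.',
  },
  {
    role: 'user',
    content: 'Implement the UserService class with CRUD methods.',
  },
];
```

Each model reply comes back with `role: 'assistant'` and is appended to the same list, which is how the conversation persists.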

AI Agents

An AI agent goes beyond simple prompt-response. It is an autonomous system that uses tools, makes decisions, and loops until a task is complete.

Key Definition

Agent = LLM + Tools + Memory + Loop

Without tools, an LLM can only generate text. With tools, it can read files, write code, run commands, search the web, and interact with any system, then decide what to do next based on results.

The Agent Loop

🧠 Think: the LLM reasons about the task → ⚡ Act: use a tool (read, write, run, search) → 👁️ Observe: see the result of the tool → ✅ Done? Yes: return the result. No: loop back to Think.
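The Think → Act → Observe loop can be sketched in a few lines. Here the `Tool` interface and the `think` callback are illustrative stand-ins for real tool bindings and a real LLM call:

```typescript
interface Tool {
  name: string;
  run(input: string): string; // e.g. read a file, run a command
}

interface Decision {
  done: boolean;
  tool?: string;   // which tool to use next (when not done)
  input?: string;  // input for that tool
  result?: string; // final answer (when done)
}

// think() stands in for an LLM call: it sees past observations
// and decides the next action, or declares the task finished.
function runAgent(
  think: (observations: string[]) => Decision,
  tools: Record<string, Tool>,
  maxSteps = 10
): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = think(observations);            // Think
    if (decision.done) return decision.result ?? ''; // Done?
    const tool = tools[decision.tool ?? ''];
    if (!tool) throw new Error(`Unknown tool: ${decision.tool}`);
    const output = tool.run(decision.input ?? '');   // Act
    observations.push(output);                       // Observe
  }
  throw new Error('Agent exceeded max steps');
}
```

The `maxSteps` guard matters in practice: without it, a confused agent can loop indefinitely, burning tokens.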

Tool Use

The power of agents comes from their tools. Common tool categories include reading and writing files, running commands, searching the web and the codebase, and interacting with external systems.

Agent Patterns

Pattern | How It Works | Best For
ReAct | Reason → Act → Observe → Repeat | General-purpose tasks, debugging, exploration
Plan-and-Execute | Create full plan first → Execute steps → Verify | Complex multi-step tasks, feature implementation
Reflection | Generate output → Critique own output → Improve → Repeat | Code review, writing, quality improvement

AI-Augmented Scrum

The most powerful application of AI agents in software development is augmenting the Scrum process itself. Every Scrum role can be assisted, or fully simulated, by a specialized AI agent.

AI as Product Owner (@po)

The AI Product Owner helps structure requirements and manage the backlog.

  • 📋 Story Writing: transforms vague ideas into well-structured user stories with INVEST criteria
  • ✅ Acceptance Criteria: generates Given/When/Then scenarios for every story
  • 📊 Backlog Prioritization: ranks stories by business value, dependencies, and risk
  • 🎯 Sprint Goal: synthesizes a clear sprint goal from selected stories

// Example: AI writes a user story
@po write story: "Users need to reset their password"

→ **US-042: Password Reset via Email**
  As a registered user
  I want to reset my password via email
  So that I can regain access when I forget my credentials

  **Acceptance Criteria:**
  - Given a valid email, system sends reset link within 30s
  - Given an invalid email, system shows generic message (no info leak)
  - Reset token expires after 24 hours
  - User cannot reuse last 5 passwords

  **Story Points:** 5 | **Priority:** High

AI as Scrum Master (@sm)

The AI Scrum Master facilitates ceremonies and tracks team health metrics.

  • 📅 Sprint Planning: guides story selection based on velocity and capacity
  • 📈 Velocity Tracking: monitors sprint progress and flags risks early
  • 🔄 Retrospective Facilitation: collects learnings, identifies patterns across sprints
  • 🚧 Blocker Detection: identifies impediments from daily standup reports

AI as Developer (@dev)

The AI Developer implements features using disciplined engineering practices.

  • ๐Ÿ”ด TDD Cycle โ€” Writes failing tests first, then implements the minimal code to pass
  • ๐Ÿ—๏ธ Implementation โ€” Follows SOLID principles and Clean Code practices
  • ๐Ÿ”„ Refactoring โ€” Improves code structure after tests are green
  • ๐Ÿ”€ Git Operations โ€” Creates branches, commits with meaningful messages, opens PRs

AI as QA (@qa)

The AI QA engineer designs test strategies and validates coverage.

  • 🧪 Test Strategy: designs the test pyramid (unit/integration/E2E) for each feature
  • 📊 Coverage Analysis: identifies untested paths, edge cases, and boundary conditions
  • 🐛 Bug Detection: reviews code for potential runtime errors, race conditions, edge cases
  • ✅ Acceptance Testing: validates stories against acceptance criteria

AI as Architect (@arch) + Tech Lead (@lead)

@arch designs systems. @lead enforces code quality. They work at different levels.

  • ๐Ÿ›๏ธ @arch โ€” System Design โ€” Defines components, boundaries, data flows, and communication patterns. Chooses architectural patterns (monolith, microservices, event-driven, clean architecture, etc.) based on project needs โ€” never forced.
  • ๐Ÿ”Œ @arch โ€” Infrastructure Decisions โ€” Compute, storage, messaging, state management, resilience โ€” technology-agnostic, adapts to web, desktop, mobile, embedded.
  • ๐Ÿ“‹ @arch โ€” ADRs โ€” Records every significant structural decision with context, alternatives, and trade-offs.
  • ๐Ÿ“ @lead โ€” Code Quality โ€” Reviews PRs against SOLID and Clean Code standards. Enforces at code level, not system level.
  • ๐Ÿค @lead โ€” Unblocking โ€” Makes quick tactical decisions, pairs with stuck devs, translates @arch's designs into implementation guidance.

Solo Mode vs Team Mode

The AI-augmented workflow supports two operating modes:

Solo Mode: AI Simulates the Entire Scrum Team

A single developer works with AI agents that play every role. The developer provides intent and validation; the AI handles planning, implementation, testing, and review.

👤 Human: provides intent + validates → 📋 @po: plans stories → 📅 @sm: sprint plan → 💻 @dev: code (TDD) → 🧪 @qa: test & verify → 🔍 @lead: review (SOLID)

Team Mode: AI Augments Human Developers

Human team members use AI agents to accelerate their work. The AI handles repetitive tasks; humans focus on creativity, judgment, and collaboration.

Prompt Templates & Progressive Discovery

The workflow uses structured prompt templates to give each AI agent its role, context, and constraints. These prompts are not written ad hoc; they are versioned, tested, and optimized.

Progressive Prompt Discovery

Prompts are loaded in layers, not all at once. This respects context window limits and keeps the AI focused.

🔌 Level 0, Boot Loader: CLAUDE.md + .memory/MEMORY_INDEX.md (~50 lines). Always loaded.
🗂️ Level 1, Workflow Selection: feature.md / bugfix.md / etc. Defines phases, gates, agents. Loaded based on task type.
📋 Level 2, Practice Protocols: tdd.md / solid_review.md / etc. Specific techniques & checklists. Loaded when a phase needs them.
🤖 Level 3, Role Prompts: @po / @dev / @qa / @lead / etc. Agent identity & instructions. Loaded when an agent is invoked.
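Progressive loading amounts to selecting prompt files per layer for the task at hand. A minimal sketch; the file names for Levels 0-2 come from the levels above, while the `roles/` path for Level 3 is a hypothetical convention:

```typescript
interface PromptLayer {
  level: number;
  files: string[];
}

// Returns the prompt files to load for a given task type,
// active practice protocols, and invoked agents.
// Level 0 is always loaded; deeper levels load only when needed.
function selectPromptLayers(
  taskType: 'feature' | 'bugfix',
  protocols: string[],
  agents: string[]
): PromptLayer[] {
  return [
    { level: 0, files: ['CLAUDE.md', '.memory/MEMORY_INDEX.md'] },
    { level: 1, files: [`${taskType}.md`] },
    { level: 2, files: protocols.map((p) => `${p}.md`) },
    { level: 3, files: agents.map((a) => `roles/${a}.md`) }, // hypothetical path
  ];
}
```

Loading only the layers a task needs keeps the context window budget for the actual code, not for unused instructions.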

The Role-Based Agent System

Each agent in the system has a specific identity, set of tools, and protocol it follows:

Agent | Role | Primary Tasks | Key Tools
@po | Product Owner | Stories, backlog, priorities, acceptance criteria | Memory read/write, backlog management
@sm | Scrum Master | Sprint planning, velocity, retrospectives | Sprint tracking, metrics
@arch | Architect | System structure, components, boundaries, patterns, infrastructure decisions, ADRs | Codebase search, diagram generation
@lead | Tech Lead | Code quality (SOLID/Clean Code), PR reviews, tactical decisions, unblocking devs | Code analysis, file reading
@dev | Developer | TDD implementation, refactoring, git operations | File write, command execution, git
@qa | QA Engineer | Test strategy, coverage analysis, acceptance testing | Test runner, code analysis
@sec | Security | Security audit, vulnerability scanning, hardening | SAST tools, dependency audit

Best Practices for AI-Driven Development

Do: verify, validate, and review every AI-generated output before it ships.
Don't: trust AI output blindly.

Knowledge Check 1

What is an AI Agent?

Knowledge Check 2

In AI-augmented Scrum, who makes priority decisions?