Lesson 13

AI & LLM Engineering

Prompt engineering, AI agents, and AI-augmented Scrum for modern development

Large Language Models Prompt Engineering AI Agents Tool Use AI-Augmented Scrum Context Windows Role-Based Agents

What Are Large Language Models (LLMs)?

Large Language Models are neural networks trained on vast amounts of text data. They predict the next token (word fragment) in a sequence, but this simple objective gives rise to remarkable capabilities: reasoning, code generation, analysis, and creative problem solving.

Core Architecture: Transformers

Modern LLMs are built on the Transformer architecture (introduced in the 2017 paper "Attention Is All You Need"). The key innovation is the attention mechanism: the model can look at all parts of the input simultaneously, understanding relationships between distant words.

Key Concepts

Concept | What It Means | Why It Matters
Token | A word fragment (roughly 3/4 of a word) | Models process and generate tokens, not words. "unbreakable" = ["un", "break", "able"]
Context Window | Maximum number of tokens the model can see at once | Determines how much code, documentation, or conversation the model can work with
Parameters | The learned weights of the neural network | More parameters generally means more capability (GPT-4 class: hundreds of billions)
Temperature | Controls randomness of output (0 = deterministic, 1 = creative) | Low for code generation, higher for brainstorming
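A common rule of thumb (a heuristic only, not a real tokenizer) is that one token is about four characters of English text. A minimal sketch for estimating whether a prompt fits a context window:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic; real tokenizers (BPE) vary by model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check whether a prompt plus room for the expected output
// fits within a model's context window.
function fitsContextWindow(
  prompt: string,
  maxOutputTokens: number,
  contextWindow: number
): boolean {
  return estimateTokens(prompt) + maxOutputTokens <= contextWindow;
}
```

For precise counts, use the model's own tokenizer; the heuristic is only good enough for a quick budget check.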

How LLMs Are Built

📚 Pre-training: train on a massive text corpus. Learn language, facts, patterns. Billions of tokens.
🎯 Fine-tuning: train on curated instruction data. Learn to follow instructions. Thousands of examples.
👥 RLHF: human feedback ranks outputs. Learn to be helpful & safe. Thousands of comparisons.


Prompt Engineering Fundamentals

Prompt engineering is the art and science of communicating effectively with LLMs. The quality of your output is directly proportional to the quality of your prompt.

The Golden Rule of Prompting

Be as specific and explicit as you would be when writing a detailed specification for a junior developer who has never seen your codebase.

Be Specific and Explicit

Vague prompts get vague answers. Specific prompts get precise, actionable output.

// BAD: Vague
"Write a function to process data"

// GOOD: Specific
"Write a TypeScript function called `parseUserCSV` that:
- Takes a CSV string as input
- Returns an array of User objects { name: string, email: string, role: 'admin' | 'user' }
- Skips the header row
- Throws a ValidationError if any email is invalid
- Uses no external libraries"
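Given the specific prompt above, a model might produce something like the following sketch. This is one possible implementation, not canonical output; the `User` shape and `ValidationError` come from the prompt itself:

```typescript
type User = { name: string; email: string; role: 'admin' | 'user' };

class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ValidationError';
  }
}

// Parses "name,email,role" CSV lines, skipping the header row.
// Throws ValidationError on invalid emails, as the prompt requires.
function parseUserCSV(csv: string): User[] {
  const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return csv
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => {
      const [name, email, role] = line.split(',').map((s) => s.trim());
      if (!emailPattern.test(email)) {
        throw new ValidationError(`Invalid email: ${email}`);
      }
      return { name, email, role: role as 'admin' | 'user' };
    });
}
```

Notice how every bullet in the prompt maps to a visible decision in the code; that is what a specification-grade prompt buys you.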

Provide Context and Constraints

Tell the model what it is working with: the tech stack, existing patterns, and boundaries.

// Context block example
"You are working on a Node.js 20 / TypeScript project using:
- NestJS for the API layer
- TypeORM for database access
- PostgreSQL as the database
- Jest for testing

The project follows Clean Architecture with these layers:
  domain/ → entities, value objects (no dependencies)
  application/ → use cases (depends on domain)
  infrastructure/ → DB, HTTP (depends on application)

Follow the existing patterns in the codebase."

Use Structured Output Formats

Tell the model exactly what format you want the response in.

// Request structured output
"Analyze this function for issues. Return your findings as:

## Findings
For each issue:
- **Severity**: S0/S1/S2/S3
- **Line**: line number
- **Issue**: description
- **Fix**: suggested code change

## Summary
- Total findings: N
- Blocks merge: yes/no"

Chain of Thought Prompting

Ask the model to think step by step. This dramatically improves reasoning accuracy.

// Chain of thought
"Before implementing, think through:
1. What are the edge cases?
2. What could go wrong?
3. What is the simplest correct solution?
4. How would you test this?

Then implement the solution."

// Even simpler:
"Think step by step before answering."

Why It Works

Chain of thought forces the model to show its reasoning, which activates more careful processing. Errors in intermediate steps become visible and correctable.

Few-Shot Examples

Show the model what good output looks like by providing 1-3 examples.

// Few-shot prompting
"Convert user stories to test cases.

Example input:
  'As a user, I can reset my password via email'
Example output:
  - test: should send reset email when valid email provided
  - test: should return 404 when email not found
  - test: should rate-limit to 3 reset requests per hour
  - test: should expire reset token after 24 hours

Now convert this story:
  'As a user, I can upload a profile avatar'"
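Few-shot prompts like the one above can also be assembled programmatically, which keeps examples versioned and reusable. A minimal sketch (the helper names are illustrative):

```typescript
interface FewShotExample {
  input: string;
  output: string;
}

// Builds a few-shot prompt: task description, worked examples,
// then the new input the model should handle.
function buildFewShotPrompt(
  task: string,
  examples: FewShotExample[],
  newInput: string
): string {
  const shots = examples
    .map((ex) => `Example input:\n  ${ex.input}\nExample output:\n${ex.output}`)
    .join('\n\n');
  return `${task}\n\n${shots}\n\nNow convert this input:\n  ${newInput}`;
}
```

Keeping examples as data makes it easy to swap or A/B test them without rewriting the prompt text.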

System Prompts vs User Prompts

LLMs distinguish between two types of input:

Type | Purpose | Persistence | Example
System Prompt | Sets the model's identity, role, and constraints | Persists across the entire conversation | "You are a senior TypeScript developer. Follow SOLID principles. Never use any."
User Prompt | The specific task or question for this turn | This message only | "Implement the UserService class with CRUD methods."
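In chat-style APIs this distinction is expressed as a list of role-tagged messages. The shape below follows the widely used OpenAI-style convention; exact field names vary by provider:

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// The system prompt is sent once, first; user prompts follow per turn.
const messages: ChatMessage[] = [
  {
    role: 'system',
    content:
      'You are a senior TypeScript developer. Follow SOLID principles. Never use any.',
  },
  {
    role: 'user',
    content: 'Implement the UserService class with CRUD methods.',
  },
];
```

Each model reply comes back with `role: 'assistant'` and is appended to the same list, which is how the conversation persists.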

AI Agents

An AI agent goes beyond simple prompt-response. It is an autonomous system that uses tools, makes decisions, and loops until a task is complete.

Key Definition

Agent = LLM + Tools + Memory + Loop

Without tools, an LLM can only generate text. With tools, it can read files, write code, run commands, search the web, and interact with any system, then decide what to do next based on results.

The Agent Loop

🧠 Think: the LLM reasons about the task → ⚡ Act: use a tool (read, write, run, search) → 👁️ Observe: see the result of the tool → ✅ Done? Yes: return the result. No: loop back to Think.
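The Think → Act → Observe loop can be sketched in a few lines. Here the `Tool` interface and the `think` callback are illustrative stand-ins for real tool bindings and a real LLM call:

```typescript
interface Tool {
  name: string;
  run(input: string): string; // e.g. read a file, run a command
}

interface Decision {
  done: boolean;
  tool?: string;   // which tool to use next (when not done)
  input?: string;  // input for that tool
  result?: string; // final answer (when done)
}

// think() stands in for an LLM call: it sees past observations
// and decides the next action, or declares the task finished.
function runAgent(
  think: (observations: string[]) => Decision,
  tools: Record<string, Tool>,
  maxSteps = 10
): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = think(observations);            // Think
    if (decision.done) return decision.result ?? ''; // Done?
    const tool = tools[decision.tool ?? ''];
    if (!tool) throw new Error(`Unknown tool: ${decision.tool}`);
    const output = tool.run(decision.input ?? '');   // Act
    observations.push(output);                       // Observe
  }
  throw new Error('Agent exceeded max steps');
}
```

The `maxSteps` guard matters in practice: without it, a confused agent can loop indefinitely, burning tokens.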

Tool Use

The power of agents comes from their tools. Common tool categories include reading and writing files, running commands, searching the web and the codebase, and interacting with external systems.

Agent Patterns

Pattern | How It Works | Best For
ReAct | Reason → Act → Observe → Repeat | General-purpose tasks, debugging, exploration
Plan-and-Execute | Create full plan first → Execute steps → Verify | Complex multi-step tasks, feature implementation
Reflection | Generate output → Critique own output → Improve → Repeat | Code review, writing, quality improvement

AI-Augmented Scrum

The most powerful application of AI agents in software development is augmenting the Scrum process itself. Every Scrum role can be assisted, or fully simulated, by a specialized AI agent.

AI as Product Owner (@po)

The AI Product Owner helps structure requirements and manage the backlog.

  • 📋 Story Writing: transforms vague ideas into well-structured user stories with INVEST criteria
  • ✅ Acceptance Criteria: generates Given/When/Then scenarios for every story
  • 📊 Backlog Prioritization: ranks stories by business value, dependencies, and risk
  • 🎯 Sprint Goal: synthesizes a clear sprint goal from selected stories

// Example: AI writes a user story
@po write story: "Users need to reset their password"

→ **US-042: Password Reset via Email**
  As a registered user
  I want to reset my password via email
  So that I can regain access when I forget my credentials

  **Acceptance Criteria:**
  - Given a valid email, system sends reset link within 30s
  - Given an invalid email, system shows generic message (no info leak)
  - Reset token expires after 24 hours
  - User cannot reuse last 5 passwords

  **Story Points:** 5 | **Priority:** High

AI as Scrum Master (@sm)

The AI Scrum Master facilitates ceremonies and tracks team health metrics.

  • 📅 Sprint Planning: guides story selection based on velocity and capacity
  • 📈 Velocity Tracking: monitors sprint progress and flags risks early
  • 🔄 Retrospective Facilitation: collects learnings, identifies patterns across sprints
  • 🚧 Blocker Detection: identifies impediments from daily standup reports

AI as Developer (@dev)

The AI Developer implements features using disciplined engineering practices.

  • ๐Ÿ”ด TDD Cycle โ€” Writes failing tests first, then implements the minimal code to pass
  • ๐Ÿ—๏ธ Implementation โ€” Follows SOLID principles and Clean Code practices
  • ๐Ÿ”„ Refactoring โ€” Improves code structure after tests are green
  • ๐Ÿ”€ Git Operations โ€” Creates branches, commits with meaningful messages, opens PRs

AI as QA (@qa)

The AI QA engineer designs test strategies and validates coverage.

  • 🧪 Test Strategy: designs the test pyramid (unit/integration/E2E) for each feature
  • 📊 Coverage Analysis: identifies untested paths, edge cases, and boundary conditions
  • 🐛 Bug Detection: reviews code for potential runtime errors, race conditions, edge cases
  • ✅ Acceptance Testing: validates stories against acceptance criteria

AI as Architect (@arch) + Tech Lead (@lead)

@arch designs systems. @lead enforces code quality. They work at different levels.

  • ๐Ÿ›๏ธ @arch โ€” System Design โ€” Defines components, boundaries, data flows, and communication patterns. Chooses architectural patterns (monolith, microservices, event-driven, clean architecture, etc.) based on project needs โ€” never forced.
  • ๐Ÿ”Œ @arch โ€” Infrastructure Decisions โ€” Compute, storage, messaging, state management, resilience โ€” technology-agnostic, adapts to web, desktop, mobile, embedded.
  • ๐Ÿ“‹ @arch โ€” ADRs โ€” Records every significant structural decision with context, alternatives, and trade-offs.
  • ๐Ÿ“ @lead โ€” Code Quality โ€” Reviews PRs against SOLID and Clean Code standards. Enforces at code level, not system level.
  • ๐Ÿค @lead โ€” Unblocking โ€” Makes quick tactical decisions, pairs with stuck devs, translates @arch's designs into implementation guidance.

Solo Mode vs Team Mode

The AI-augmented workflow supports two operating modes:

Solo Mode: AI Simulates the Entire Scrum Team

A single developer works with AI agents that play every role. The developer provides intent and validation; the AI handles planning, implementation, testing, and review.

👤 Human: provides intent + validates → 📋 @po: plans stories → 📅 @sm: sprint plan → 💻 @dev: code (TDD) → 🧪 @qa: test & verify → 🔍 @lead: review (SOLID)

Team Mode: AI Augments Human Developers

Human team members use AI agents to accelerate their work. The AI handles repetitive tasks; humans focus on creativity, judgment, and collaboration.

Prompt Templates & Progressive Discovery

The workflow uses structured prompt templates to give each AI agent its role, context, and constraints. These prompts are not written ad hoc; they are versioned, tested, and optimized.

Progressive Prompt Discovery

Prompts are loaded in layers, not all at once. This respects context window limits and keeps the AI focused.

🔌 Level 0, Boot Loader: CLAUDE.md + .memory/MEMORY_INDEX.md (~50 lines). Always loaded.
🗂️ Level 1, Workflow Selection: feature.md / bugfix.md / etc. Defines phases, gates, agents. Loaded based on task type.
📋 Level 2, Practice Protocols: tdd.md / solid_review.md / etc. Specific techniques & checklists. Loaded when a phase needs them.
🤖 Level 3, Role Prompts: @po / @dev / @qa / @lead / etc. Agent identity & instructions. Loaded when an agent is invoked.
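Progressive loading amounts to selecting prompt files per layer for the task at hand. A minimal sketch; the file names for Levels 0-2 come from the levels above, while the `roles/` path for Level 3 is a hypothetical convention:

```typescript
interface PromptLayer {
  level: number;
  files: string[];
}

// Returns the prompt files to load for a given task type,
// active practice protocols, and invoked agents.
// Level 0 is always loaded; deeper levels load only when needed.
function selectPromptLayers(
  taskType: 'feature' | 'bugfix',
  protocols: string[],
  agents: string[]
): PromptLayer[] {
  return [
    { level: 0, files: ['CLAUDE.md', '.memory/MEMORY_INDEX.md'] },
    { level: 1, files: [`${taskType}.md`] },
    { level: 2, files: protocols.map((p) => `${p}.md`) },
    { level: 3, files: agents.map((a) => `roles/${a}.md`) }, // hypothetical path
  ];
}
```

Loading only the layers a task needs keeps the context window budget for the actual code, not for unused instructions.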

The Role-Based Agent System

Each agent in the system has a specific identity, set of tools, and protocol it follows:

Agent | Role | Primary Tasks | Key Tools
@po | Product Owner | Stories, backlog, priorities, acceptance criteria | Memory read/write, backlog management
@sm | Scrum Master | Sprint planning, velocity, retrospectives | Sprint tracking, metrics
@arch | Architect | System structure, components, boundaries, patterns, infrastructure decisions, ADRs | Codebase search, diagram generation
@lead | Tech Lead | Code quality (SOLID/Clean Code), PR reviews, tactical decisions, unblocking devs | Code analysis, file reading
@dev | Developer | TDD implementation, refactoring, git operations | File write, command execution, git
@qa | QA Engineer | Test strategy, coverage analysis, acceptance testing | Test runner, code analysis
@sec | Security | Security audit, vulnerability scanning, hardening | SAST tools, dependency audit

Best Practices for AI-Driven Development

Do: verify, validate, and review every AI-generated output before it ships.
Don't: trust AI output blindly.

Knowledge Check 1

What is an AI Agent?

Knowledge Check 2

In AI-augmented Scrum, who makes priority decisions?