PROJECT 03

Real-Time Dashboard

Investigate first, then build — the spike-to-sprint pattern

⏱ ~60 min · 1 Spike + 1 Sprint · Intermediate

What This Project Teaches

Projects 01 and 02 started building immediately — the requirements were clear. This project starts with uncertainty: "How should we push real-time data to the browser?" You'll learn when to investigate first using /agile-explore-brainstorm and /agile-explore-spike before writing any production code.

⚠️ What's New vs Projects 01-02

You'll investigate with /agile-explore-brainstorm and /agile-explore-spike BEFORE writing production code. You'll also learn to save and recall decisions with /agile-memory-remember and /agile-memory-recall, and to test performance with /agile-code-perf.

Roles You'll See in Action

| Role | What they do in this project | When |
|------|------------------------------|------|
| 🏗️ @arch | Evaluates SSE vs WebSockets vs Polling, scores approaches, recommends architecture | Brainstorm, spike design |
| ⌨️ @dev | Builds spike prototype, implements SSE endpoint and dashboard UI | Spike, implementation |
| 👨‍💻 @lead | Reviews code quality, approves PRs, plans implementation | Code review, story planning |
| 📋 @po | Structures requirements into stories, accepts completed work | Story creation, acceptance |
| 🔄 @sm | Facilitates spike process, sprint ceremonies, tracks progress | Spike framing, sprint ceremonies |
| 🚀 @devops | Runs CI pipeline, handles deployment, performance infrastructure | CI/CD, deploy |
| 🧪 @qa | Verifies Definition of Done, checks test coverage and performance | DoD verification |

Phase 1: Project Setup

Same pattern as Projects 01-02 — describe what you want, brainstorm the stack, initialize the framework.

Step 1: Tell Claude what you want to build

💬 You say

"I want to build a live dashboard that shows server metrics — CPU usage, memory usage, uptime, and request count — updating in real-time. I'm working solo."

Step 2: Brainstorm the stack — /agile-explore-brainstorm

💬 You type

/agile-explore-brainstorm I need a server that collects system metrics and a web dashboard that displays them live. Lightweight, easy to set up, good for real-time streaming.

🏗️ @arch evaluates approaches for the base stack (not the real-time transport — that comes in Phase 2):

🏗️ @arch responds:

## /agile-explore-brainstorm — Results

### Recommendation
Node.js + Express for the server, vanilla HTML/JS for the dashboard.
Lightweight, familiar, excellent streaming support.

### Detected Stack
- Language: JavaScript (Node.js 20)
- Framework: Express 4.x
- Frontend: Vanilla HTML + CSS + JS (no build step)
- Test Framework: Jest + Supertest

✅ You decide

"Node.js + Express with a vanilla frontend. Keep it simple."

Step 3: Initialize — /agile-setup-init + /agile-setup-onboard + /agile-setup-health

/agile-setup-init       → Memory populated (project, architecture, conventions)
/agile-setup-onboard    → Coding rules generated from detected conventions
/agile-setup-health     → ✅ All checks pass

Phase 1 complete. But unlike Projects 01-02, we're NOT ready to create stories yet. We have a fundamental question: how do we push data from the server to the browser in real-time?

Phase 2: Technical Spike — How to Push Real-Time Data? (~15 min)

🚫 This is the KEY teaching phase

In Projects 01-02, you knew how to build things — the only question was what. Here, you don't even know the right approach. Should you use WebSockets? SSE? Polling? When you're uncertain, investigate first. That's what spikes are for.

Step 4: Define the question

💬 You say

"I need to push data from the server to the browser in real-time. I'm not sure whether to use WebSockets, SSE, or polling. Can we investigate?"

🔄 @sm responds:

This is a spike — a time-boxed investigation that produces
learnings, not production code. Let's structure the investigation:

1. Brainstorm approaches → /agile-explore-brainstorm
2. Prototype the winner → /agile-explore-spike
3. Record the decision → /agile-memory-remember

Time-box: ~15 minutes for the full spike.

Step 5: Deep architecture comparison — /agile-explore-brainstorm

💬 You type

/agile-explore-brainstorm I need to push real-time metrics from server to browser. Options: WebSockets, Server-Sent Events (SSE), long polling. Dashboard is read-only — server pushes, browser displays. Need auto-reconnect on network drops.

🏗️ @arch performs a deep comparison of all three approaches:

🏗️ @arch responds with a scoring matrix:

## /agile-explore-brainstorm — Results

### Problem Statement
Push real-time server metrics to a read-only browser dashboard.
Constraints: unidirectional (server→client), auto-reconnect,
simple to implement.

### Approach 1: Server-Sent Events (SSE)
- Mechanism: HTTP connection held open, server sends text/event-stream
- Pros: Native browser API (EventSource), auto-reconnect built-in,
  works through proxies, simple server implementation
- Cons: Unidirectional only (server→client), text-based (no binary),
  limited to ~6 connections per domain in HTTP/1.1
- Risks: Low — mature, well-supported standard
- Effort: Small — ~50 lines of server code

### Approach 2: WebSockets
- Mechanism: Full-duplex TCP connection upgraded from HTTP
- Pros: Bidirectional, binary support, low latency
- Cons: No auto-reconnect (must implement manually), more complex
  server setup, doesn't work through some proxies, requires ws library
- Risks: Medium — reconnection logic adds complexity
- Effort: Medium — ~120 lines + reconnection handling

### Approach 3: Long Polling
- Mechanism: Client repeatedly requests, server holds response
  until data available
- Pros: Works everywhere, simple to understand
- Cons: Higher latency, more HTTP overhead, no true streaming,
  complex timeout management
- Risks: Medium — race conditions with rapid polls
- Effort: Medium — ~100 lines + timeout management

### Scoring Matrix

| Approach     | Feasibility | Maintainability | Performance | Risk | Total |
|--------------|-------------|-----------------|-------------|------|-------|
| SSE          | 9/10        | 9/10            | 8/10        | 9/10 | 35    |
| WebSockets   | 8/10        | 6/10            | 9/10        | 7/10 | 30    |
| Long Polling | 7/10        | 5/10            | 5/10        | 7/10 | 24    |

### Recommendation
SSE wins because:
- Unidirectional fits perfectly (server→client only)
- Native browser support via EventSource API
- Auto-reconnect is built-in — zero extra code
- Simpler than WebSockets for read-only use cases

### What Would Change This
- If you need bidirectional communication → WebSockets
- If you need binary data streaming → WebSockets
- If browser support for very old browsers matters → Long Polling

✅ You decide

"SSE makes sense — the dashboard only receives data, never sends. Auto-reconnect for free is a big win."
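Before prototyping, it helps to see what SSE actually puts on the wire: each event is plain text, a `data:` line terminated by a blank line. A tiny illustrative helper (ours, not part of the spike) shows the framing:

```javascript
// Format a metrics object as one SSE event frame. An SSE frame is
// plain text: a "data:" line followed by an empty line ("\n\n").
function toSseFrame(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

console.log(toSseFrame({ cpu: 0.42, uptime: 99 }));
// prints: data: {"cpu":0.42,"uptime":99}  (followed by a blank line)
```

The browser's EventSource API parses these frames for you and surfaces each `data:` payload as a message event.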

Step 6: Prototype to verify — /agile-explore-spike

The brainstorm gave us a recommendation. But recommendations aren't proof. Let's build a throwaway prototype to verify SSE works for our specific use case.

💬 You type

/agile-explore-spike Build a throwaway prototype to verify SSE works for our use case. Time-box: 2 hours. Test: can we push 10 metrics/second to the browser with auto-reconnect?

⌨️ @dev builds a minimal throwaway prototype — just enough to answer the question:

⌨️ @dev builds spike prototype:

## /agile-explore-spike — Prototype Results

### What Was Built (throwaway)
- Minimal Express server with SSE endpoint
- Sends random metrics every 100ms (10/second)
- Simple HTML page with EventSource connection

### Test Results

| Test                    | Result  | Notes                        |
|-------------------------|---------|------------------------------|
| 10 metrics/second       | ✅ Pass | Stable at 10/s for 5 minutes |
| Auto-reconnect          | ✅ Pass | Reconnects in ~3 seconds     |
| Browser compatibility   | ✅ Pass | Chrome, Firefox, Safari, Edge|
| Memory usage (server)   | ✅ Pass | Stable at ~25MB RSS          |
| Connection drop recovery| ✅ Pass | EventSource handles natively |

### Latency
- Average: 2ms from server emit to browser render
- P99: 8ms
- Acceptable for a metrics dashboard

### Verdict
SSE is confirmed suitable. No blockers found.

⚠️ Spike Code Is THROWAWAY

Spike prototypes are throwaway. Never merge spike code to develop. The spike answers a question — it doesn't produce production code. You'll rebuild properly with TDD in the sprint.

Step 7: Record the decision — /agile-memory-remember

Now that you have evidence, record the decision so you (and the framework) can recall why you chose SSE months from now.

💬 You type

/agile-memory-remember We chose SSE over WebSockets for the dashboard because the data flow is unidirectional (server→browser). SSE provides native auto-reconnect and is simpler. Revisit if we need bidirectional communication.

The decision is saved as an Architecture Decision Record (ADR):

🔄 @sm: Decision recorded as ADR

## ADR-001: Use SSE for Real-Time Data Transport

### Status
Accepted

### Context
Dashboard needs to push server metrics to the browser in real-time.
Data flow is unidirectional (server→client only). Need auto-reconnect.

### Decision
Use Server-Sent Events (SSE) over WebSockets or Long Polling.

### Rationale
- SSE scored 35/40 vs WebSockets 30/40 vs Polling 24/40
- Unidirectional fits our use case perfectly
- Native auto-reconnect via EventSource API
- Spike prototype confirmed: 10 metrics/sec, 2ms latency, stable

### Consequences
- Cannot send data from browser to server over this channel
- Limited to text-based data (JSON is fine for metrics)

### Revisit If
- Need bidirectional communication
- Need binary data streaming

→ Saved to: .memory/episodic/decisions.md

✅ Spike complete!

You now have: a clear recommendation (brainstorm), proof it works (spike prototype), and a recorded decision (ADR). You can build with confidence. This is the spike-to-sprint pattern.

Phase 3: Sprint 1 — Build the Dashboard (~11 pts)

Step 8: Create user stories

💬 You type

/agile-story-create I need an SSE endpoint that streams server metrics (CPU, memory, uptime, request count) every 2 seconds

📋 @po responds:

## /agile-story-create — Story Created

### Story US-001
**As a** dashboard consumer,
**I want** an SSE endpoint that streams server metrics every 2 seconds,
**so that** I can receive live CPU, memory, uptime, and request count data.

### Acceptance Criteria

**AC-1: Happy path — SSE stream delivers metrics**
- **Given** the server is running
- **When** I connect to GET /metrics/stream with Accept: text/event-stream
- **Then** I receive a new event every 2 seconds containing
  {cpu, memory, uptime, requestCount} as JSON

**AC-2: Correct SSE format**
- **Given** I am connected to the stream
- **When** an event arrives
- **Then** it uses proper SSE format (data: prefix, double newline delimiter)

**AC-3: Connection cleanup**
- **Given** a client is connected to the stream
- **When** the client disconnects
- **Then** the server cleans up the connection (no memory leak)

### Metadata
- **Points**: 3
- **Status**: Unrefined

> Added to: .memory/backlog/product.md

💬 You type

/agile-story-create I need a web page that displays the metrics as live-updating cards — one card per metric, auto-reconnects if connection drops

📋 @po responds:

## /agile-story-create — Story Created

### Story US-002
**As a** dashboard user,
**I want** a web page with live-updating metric cards,
**so that** I can monitor server health at a glance.

### Acceptance Criteria

**AC-1: Dashboard displays four metric cards**
- **Given** I open the dashboard page
- **When** the SSE connection is established
- **Then** I see four cards: CPU, Memory, Uptime, Request Count,
  each updating every 2 seconds

**AC-2: Auto-reconnect on connection drop**
- **Given** the SSE connection drops
- **When** the browser detects the disconnect
- **Then** it automatically reconnects (via EventSource native behavior)
  and resumes displaying live data

**AC-3: Visual feedback during connection**
- **Given** I open the dashboard
- **When** connection is active → show green "Connected" indicator
- **When** connection is lost → show yellow "Reconnecting..." indicator

### Metadata
- **Points**: 5
- **Status**: Unrefined

> Added to: .memory/backlog/product.md

💬 You type

/agile-story-create The dashboard should handle connection errors gracefully — show a reconnecting indicator and recover automatically

📋 @po responds:

## /agile-story-create — Story Created

### Story US-003
**As a** dashboard user,
**I want** graceful error handling and recovery,
**so that** temporary network issues don't break my monitoring experience.

### Acceptance Criteria

**AC-1: Reconnection indicator**
- **Given** the SSE connection is lost
- **When** the browser is attempting to reconnect
- **Then** a "Reconnecting..." banner is displayed

**AC-2: Successful recovery**
- **Given** the connection was lost and is now restored
- **When** the SSE stream resumes
- **Then** the banner disappears and metrics resume updating

**AC-3: Maximum retry with fallback**
- **Given** the connection has failed repeatedly
- **When** reconnection attempts exceed a threshold
- **Then** show an error message with a manual "Retry" button

### Metadata
- **Points**: 3
- **Status**: Unrefined

> Added to: .memory/backlog/product.md

Step 9: Sprint planning — /agile-sprint-planning

💬 You type

/agile-sprint-planning

🔄 @sm facilitates:

## Sprint Planning Output

### Sprint Goal
"Live dashboard with auto-reconnecting SSE stream"

### Sprint Backlog
| Story  | Title                       | Points | Priority |
|--------|-----------------------------|--------|----------|
| US-001 | SSE metrics endpoint        | 3      | P1       |
| US-002 | Live dashboard UI           | 5      | P1       |
| US-003 | Error handling and recovery | 3      | P1       |

### Capacity Check
- Committed this sprint: 11 points
- Stories depend on each other: US-001 → US-002 → US-003

### Risks
- SSE approach already validated by spike — low technical risk

Step 10: Build US-001 — SSE Metrics Endpoint (3 pts)

/agile-code-branch feature US-001 sse-metrics-endpoint
/agile-story-plan US-001

👨‍💻 @lead: Implementation Plan for US-001

Phase 1: Metrics collector
- Module to read CPU, memory, uptime, request count
- Returns a plain object with current values

Phase 2: SSE endpoint
- GET /metrics/stream
- Set headers: Content-Type text/event-stream, Cache-Control no-cache
- Send metrics every 2 seconds using setInterval
- Clean up interval on client disconnect

Phase 3: Connection management
- Track active connections
- Remove on close event

Confidence: 9/10 — spike already proved this works

💬 You type

/agile-code-tdd US-001

🔴 RED — Write failing test for AC-1

// test/metrics-stream.test.js
const request = require('supertest');
const app = require('../src/app');

test('GET /metrics/stream returns SSE with metrics', (done) => {
  request(app)
    .get('/metrics/stream')
    .set('Accept', 'text/event-stream')
    .buffer(false) // stream the response instead of buffering it
    .parse((res) => {
      const chunks = [];
      res.on('data', (chunk) => {
        chunks.push(chunk.toString());
        if (chunks.length >= 2) {
          const data = JSON.parse(chunks[1].replace('data: ', ''));
          expect(data).toHaveProperty('cpu');
          expect(data).toHaveProperty('memory');
          expect(data).toHaveProperty('uptime');
          expect(data).toHaveProperty('requestCount');
          res.destroy(); // close the open SSE connection
          done();
        }
      });
    })
    .end(() => {});
});

// Result: ❌ FAIL — endpoint doesn't exist

🟢 GREEN — Implement SSE endpoint

// src/routes/metrics.js
const os = require('os');
const express = require('express');

const router = express.Router();
// getRequestCount() comes from the request-counting middleware (not shown)

router.get('/stream', (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });

  const interval = setInterval(() => {
    const metrics = {
      cpu: os.loadavg()[0],
      memory: process.memoryUsage().heapUsed,
      uptime: process.uptime(),
      requestCount: getRequestCount()
    };
    res.write(`data: ${JSON.stringify(metrics)}\n\n`);
  }, 2000);

  req.on('close', () => clearInterval(interval));
});

module.exports = router;

// Result: ✅ PASS

🔵 REFACTOR — Extract metrics collection

// Extract to src/services/metricsCollector.js
// Tests still pass ✅

/agile-code-ci           → ✅ All green (4/4 tests passing)
/agile-code-commit       → feat(metrics): add SSE metrics streaming endpoint
/agile-code-pr           → PR created
/agile-code-pr-review    → ✅ Approved (🟡 S2: extract interval timing to config)
/agile-code-merge        → Squash merged to develop
/agile-story-dod         → ✅ DONE
/agile-story-accept      → ✅ ACCEPTED (3 points)

Step 11: Build US-002 — Live Dashboard UI (5 pts)

/agile-code-branch feature US-002 dashboard-ui
/agile-story-plan US-002
/agile-code-tdd US-002
  🔴 Test: Dashboard page loads with 4 metric cards → ❌
  🟢 Create HTML with card layout + EventSource connection → ✅
  🔴 Test: Cards update when SSE event arrives → ❌
  🟢 Parse event data, update DOM elements → ✅
  🔴 Test: Connection status indicator shows state → ❌
  🟢 Add connected/reconnecting indicators → ✅
/agile-code-ci           → ✅ All green
/agile-code-commit       → feat(dashboard): add live-updating metrics UI
/agile-code-pr           → PR created
/agile-code-pr-review    → ✅ Approved
/agile-code-merge        → Squash merged to develop
/agile-story-dod         → ✅ DONE
/agile-story-accept      → ✅ ACCEPTED (5 points)
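The US-002 transcript summarizes the client code rather than showing it. One way to structure it — a sketch; the `view` interface and function names are ours, not the framework's output — is to keep the update logic DOM-free so it stays unit-testable:

```javascript
// Dashboard update logic, kept free of DOM calls so it can be tested
// in Node. "view" is any object with setStatus(text) and
// setMetric(name, value).
function createDashboard(view) {
  return {
    onOpen: () => view.setStatus('Connected'),
    onError: () => view.setStatus('Reconnecting...'),
    onMessage: (event) => {
      const metrics = JSON.parse(event.data);
      for (const [name, value] of Object.entries(metrics)) {
        view.setMetric(name, value);
      }
    },
  };
}

// Browser wiring (EventSource retries automatically on error):
//   const source = new EventSource('/metrics/stream');
//   const dash = createDashboard(domView);
//   source.onopen = dash.onOpen;
//   source.onerror = dash.onError;
//   source.onmessage = dash.onMessage;
```

Separating the pure logic from the EventSource wiring is what lets the three TDD tests above run without a real browser connection.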

Step 12: Build US-003 — Error Handling (3 pts) + Performance Testing

/agile-code-branch feature US-003 error-handling
/agile-story-plan US-003
/agile-code-tdd US-003
  🔴 Test: Reconnecting banner shows on disconnect → ❌
  🟢 Add EventSource onerror handler with banner toggle → ✅
  🔴 Test: Banner disappears on successful reconnect → ❌
  🟢 Add EventSource onopen handler to hide banner → ✅
  🔴 Test: Manual retry button after max retries → ❌
  🟢 Add retry counter with fallback UI → ✅
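Since EventSource retries on its own, the "max retries" behavior in AC-3 has to be layered on top by counting error events. A minimal sketch (function and state names are illustrative, not from the transcript):

```javascript
// Track consecutive connection failures. After maxRetries failures,
// signal the UI to stop relying on auto-reconnect and show a manual
// "Retry" button instead.
function createRetryTracker(maxRetries) {
  let failures = 0;
  return {
    onError() {
      failures += 1;
      return failures >= maxRetries ? 'give-up' : 'reconnecting';
    },
    onOpen() {
      failures = 0; // a successful connection resets the counter
      return 'connected';
    },
  };
}
```

The returned state ('reconnecting', 'give-up', 'connected') maps directly onto the banner, fallback UI, and indicator states from the acceptance criteria.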

Performance testing — /agile-code-perf (this is new!)

Before merging, let's test how the SSE endpoint handles load. This command is new — it profiles your code under realistic conditions.

💬 You type

/agile-code-perf Test the SSE endpoint under load: how many concurrent connections can it handle? What's the latency at 100 connections?

⌨️ @dev profiles the SSE endpoint:

## /agile-code-perf — Results

### Test Configuration
- Endpoint: GET /metrics/stream
- Concurrent connections: 10, 50, 100, 200
- Duration: 30 seconds per level
- Metric interval: 2 seconds

### Results

| Connections | Avg Latency | P99 Latency | Memory (RSS) | Status |
|-------------|-------------|-------------|--------------|--------|
| 10          | 1ms         | 3ms         | 28MB         | ✅     |
| 50          | 2ms         | 6ms         | 35MB         | ✅     |
| 100         | 4ms         | 12ms        | 48MB         | ✅     |
| 200         | 8ms         | 25ms        | 72MB         | ⚠️     |

### Findings
- ✅ Stable up to 100 concurrent connections
- ⚠️ At 200 connections, memory grows linearly — each connection
  holds an open response + interval timer
- Recommendation: for production, add connection pooling or
  switch to a broadcast pattern (single interval, fan out to all)

### Optimization Applied
Refactored to single metrics interval with broadcast to all
connected clients. Re-test:

| Connections | Avg Latency | Memory (RSS) | Status |
|-------------|-------------|--------------|--------|
| 200         | 3ms         | 38MB         | βœ…     |

✅ Optimization resolved the scaling concern.
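The broadcast refactor described in the results can be sketched roughly as follows (the `collectMetrics()` helper name is an assumption): one shared timer serializes the frame once and writes it to every registered client, instead of running one timer per connection:

```javascript
// Broadcast pattern: a single shared frame is written to every
// registered client. "clients" holds the open response streams.
const clients = new Set();

function broadcast(metrics) {
  const frame = `data: ${JSON.stringify(metrics)}\n\n`;
  for (const client of clients) client.write(frame);
}

// One shared interval replaces the per-connection setInterval:
//   setInterval(() => broadcast(collectMetrics()), 2000);
// Each request handler then only does:
//   clients.add(res);
//   req.on('close', () => clients.delete(res));
```

With N connections this runs one timer and one JSON.stringify per tick instead of N, which is why memory and latency flattened out in the re-test.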
/agile-code-ci           → ✅ All green
/agile-code-commit       → feat(dashboard): add error handling with auto-recovery
/agile-code-pr           → PR created
/agile-code-pr-review    → ✅ Approved
/agile-code-merge        → Squash merged to develop
/agile-story-dod         → ✅ DONE
/agile-story-accept      → ✅ ACCEPTED (3 points)

Step 13: Recalling the decision — /agile-memory-recall (this is new!)

During implementation, you might forget why you chose SSE. The framework remembers for you.

💬 You type

/agile-memory-recall Why did we choose SSE?

🔄 @sm: Recalled from .memory/episodic/decisions.md

## ADR-001: Use SSE for Real-Time Data Transport

### Decision
Use Server-Sent Events (SSE) over WebSockets or Long Polling.

### Key Rationale
- Scored 35/40 vs WebSockets 30/40
- Unidirectional fits our read-only dashboard
- Native auto-reconnect via EventSource
- Spike confirmed: 10 metrics/sec, 2ms latency

### Revisit If
- Need bidirectional communication
- Need binary data streaming

This is especially valuable on long-running projects where decisions made weeks ago need to be revisited. The ADR tells you not just what you chose, but why — and when to reconsider.

Step 14: Sprint ceremonies (abbreviated)

💬 You type

/agile-sprint-review

## Sprint Review Report

### Sprint Goal: "Live dashboard with auto-reconnecting SSE stream"
### Sprint Goal Met: βœ… Yes

| Story  | Title                       | Points | Status      |
|--------|-----------------------------|--------|-------------|
| US-001 | SSE metrics endpoint        | 3      | ✅ Accepted |
| US-002 | Live dashboard UI           | 5      | ✅ Accepted |
| US-003 | Error handling and recovery | 3      | ✅ Accepted |

### Velocity
- Committed: 11 points
- Completed: 11 points (100%)

/agile-sprint-retro       → What went well: spike de-risked the approach,
                             performance testing caught broadcast optimization
/agile-memory-learn       → Saved sprint velocity + spike effectiveness

Phase 4: Release

/agile-ship-changelog     → v1.0.0: SSE endpoint, live dashboard, error handling
/agile-ship-release      → release/v1.0.0 → main, tagged, back-merged
/agile-ship-deploy        → ✅ Deployed, smoke tests passing, dashboard live

What You Built

| Metric | Value |
|--------|-------|
| Stories completed | 3/3 |
| Story points delivered | 11/11 |
| Sprint goal met | ✅ Yes |
| Release version | v1.0.0 |
| Spikes conducted | 1 (SSE vs WebSockets vs Polling) |
| ADRs recorded | 1 (ADR-001) |
| Performance optimized | Broadcast pattern (200 connections, 3ms latency) |
| Roles involved | @arch, @dev, @lead, @po, @sm, @devops, @qa |

The Spike-to-Sprint Pattern

This project taught a fundamentally different workflow from Projects 01-02:

❓ Uncertainty: I don't know the approach
🧠 Brainstorm: /agile-explore-brainstorm
🧪 Spike: /agile-explore-spike
📝 Record: /agile-memory-remember
✅ Confidence: Now I know why
🏃 Sprint: Build with evidence
🔍 Recall: /agile-memory-recall
🚀 Ship: Deploy with confidence

New Commands vs Projects 01-02

| Command | What it does | When to use |
|---------|--------------|-------------|
| ✅ /agile-explore-spike | Time-boxed throwaway prototype to test a hypothesis | When you need proof that an approach works |
| ✅ /agile-code-perf | Performance testing under realistic load | Before merging code that handles concurrent users |
| ✅ /agile-memory-remember | Save a convention or decision as an ADR | After making an architectural decision with evidence |
| ✅ /agile-memory-recall | Retrieve saved context and decisions | When you need to remember why a past decision was made |

✅ Key Takeaway

When you're uncertain, investigate first. A 2-hour spike saves weeks of rework. The framework's explore commands turn "I don't know" into "Here's why we chose X, with evidence." The ADR pattern ensures you never lose the why behind a decision — even months later.

The spike-to-sprint pattern: uncertainty → brainstorm → spike → record → build with confidence.

🧠 Knowledge Check

When should you use /agile-explore-spike?

🧠 Knowledge Check

What happens to spike prototype code?