
Week 15 — Lecture Outline · Linear Regression & Inference

Introduction to Statistics · MATH 11 Fall 2026 · Prof. Rivera · Fictional sample

Course: Introduction to Statistics (MATH 11) · Silver Oak University (fictional sample) · Prof. Rivera
Objectives covered: Objective 8 — Fit and interpret a simple linear regression model, including inference for the slope.
SLOs touched: A (reason quantitatively from data) · B (communicate results to a non-technical audience)
Meeting pattern: 2 sessions × 75 min = 150 min. Segment minutes below total ~150; scale to your own pattern.

Week at a Glance


The week's big question	"Once two things move together, can we draw the line that turns the pattern into a prediction — and how do we know the line is real and not just noise?"
By the end of the week, students can…	(1) read a least-squares regression line ŷ = b₀ + b₁x and interpret its slope and intercept in context; (2) use the line to predict ŷ for a given x and compute a residual (observed − predicted); (3) interpret r² as the share of the variation in y the line explains; (4) explain inference for the slope — whether the slope is significantly different from 0 — by comparing a p-value to α, and read a residual plot.
Key vocabulary	response (y) vs. explanatory (x) variable, least-squares regression line, line of best fit, predicted value ŷ ("y-hat"), slope b₁, intercept b₀, prediction, extrapolation, residual (observed − predicted), residual plot, coefficient of determination r², variation explained, inference for the slope, null hypothesis (slope = 0), t-test for the slope, p-value, significance level α, statistically significant
Materials	slides (Deck 15), the week's readings + video links, a spreadsheet (Google Sheets or Excel) with `=SLOPE`, `=INTERCEPT`, `=RSQ`, one approved chatbot (Gemini / Claude / ChatGPT) for the AI-critique moment and the tutorial
Timing note	8 segments, ~150 min total. Session 1 = Segments 1–4 (~75). Session 2 = Segments 5–8 (~75).

Pre-computed throughout. Every number in this outline is worked out and friendly. The hero dataset's least-squares line is genuinely ŷ = 50 + 4x (the spreadsheet's =SLOPE/=INTERCEPT return exactly 4 and 50). All predictions, residuals, r², and the p-value-vs-α comparisons below are supplied — the materials hand you the regression output; the chatbot never fits a line from raw data on the fly.

Segment 1 — Hook & the Promise (8 min) · Session 1 opens

Hook. Put one line on the board: "For every extra hour a student studies, their exam score goes up about 4 points." Then ask: "If that's true, and a friend tells you they're going to study 7 hours for the final — what score would you bet on? And how sure are you?"
- Wait for guesses. Someone will do 7 × 4 = 28 and get stuck (28 of what?). Walk them to it: we also need a starting point — the score for someone who studied basically nothing.
- "That's the whole week. Back in Week 4 we learned to measure a relationship with the correlation r. This week we go one giant step further: we draw the actual line, use it to predict, and then — the grown-up move — we ask whether the line is real or whether we're fooling ourselves with noise."

The promise (write it on the board): "By the end of this week you can take any 'for every X, Y goes up by…' relationship and do four things: write its line, predict with it, measure how much it really explains, and decide whether the trend is statistically real."

Why it matters line (memory hook): "Correlation says they move together; regression draws the line that lets you bet on the next one — and inference tells you whether the line is worth betting on." (Callback to Week 4: correlation is a handshake, not a push — and a line still isn't a cause.)

Segment 2 — The Least-Squares Line: Slope & Intercept in Context (24 min)

Plain language first. When a scatterplot shows a linear pattern (Week 4: direction, form, strength), we can summarize it with one straight line — the least-squares regression line, the "line of best fit." We write it:

ŷ = b₀ + b₁x — read "y-hat equals b-naught plus b-one x."
- ŷ ("y-hat") is the predicted value of y the line gives for an x. The hat means predicted, not observed (callback to Week 1's p̂ — the hat means estimated, not the real thing).
- b₁ is the slope — how much ŷ changes for each one-unit increase in x.
- b₀ is the intercept — the predicted ŷ when x = 0.

Why "least-squares"? Of all the lines you could draw, this is the one that makes the sum of the squared vertical distances from the dots to the line (the squared residuals) as small as possible. (Conceptual only — we read the line, we don't derive it.)

The hero dataset (we'll reuse it all week). Six students' weekly study hours (x) and exam score (y):

Student	A	B	C	D	E	F
Hours (x)	1	2	3	4	5	6
Score (y)	54	59	61	66	69	75

Run the fit (Segment 8 shows the spreadsheet) and you get — supplied, exact:

ŷ = 50 + 4x (slope b₁ = 4, intercept b₀ = 50).
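For instructors who want to see where the supplied line comes from, here is a minimal Python sketch (an illustration, not part of the student workflow) using the standard textbook formulas b₁ = Sxy / Sxx and b₀ = ȳ − b₁x̄. It also checks the "least-squares" claim numerically by comparing the sum of squared residuals (SSE) of ŷ = 50 + 4x with a few nearby lines picked arbitrarily for the demo. The function names are hypothetical.

```python
# Hero dataset: weekly study hours (x) and exam scores (y) for students A-F.
hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]


def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx         # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared vertical distances (residuals) from the points to a line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares_line(hours, scores)
print(f"y-hat = {b0:g} + {b1:g}x")                             # y-hat = 50 + 4x
print("SSE, least-squares line:", sse(hours, scores, b0, b1))  # 4.0

# Arbitrary nearby lines (intercept, slope), chosen only for comparison.
nearby_lines = [(50, 3.5), (50, 4.5), (48, 4), (52, 4)]
for other_b0, other_b1 in nearby_lines:
    print(other_b0, other_b1, sse(hours, scores, other_b0, other_b1))
```

Each nudged line has a larger SSE (26.75 or 28) than the least-squares line's 4, which is the sense in which ŷ = 50 + 4x is the "line of best fit."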

Interpret the slope and intercept in context (this is the deliverable — say it in words, with units):
- Slope b₁ = 4: "For each additional hour studied, the model predicts the exam score rises by about 4 points." Units travel: it's 4 points per hour, not just "4."
- Intercept b₀ = 50: "A student who studied 0 hours is predicted to score about 50." Always check whether x = 0 is even sensible (here it's borderline: 0 hours lies just outside the 1–6 hour range of the data; we'll flag this as the extrapolation trap in Segment 4).

Memory hook (put it on a slide):

Slope = "per-one-x" change in ŷ. Intercept = ŷ when x = 0. ŷ wears a hat because it's predicted, not measured.

Land the key idea: the line is a summary and a prediction machine, stated in the real-world units of the problem. A slope without units and context ("the slope is 4") is only half an answer.

Segment 3 — Predict ŷ, and the Residual (22 min)

Plain language first. Once you have the line, prediction is just arithmetic: plug an x into ŷ = b₀ + b₁x.

Worked prediction (every step out loud, supplied numbers):

A new student plans to study 7 hours. Predict their score.
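ŷ = 50 + 4(7) = 50 + 28 = 78. "The model bets on about a 78." (Seven hours is one hour past the 1–6 hour data, a mild stretch we'll revisit as extrapolation in Segment 4.)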

Now the honest part — the residual. No real student lands exactly on the line. The residual measures the miss:

residual = observed y − predicted ŷ (always observed minus predicted).

Worked residual (supplied):

Student E studied 5 hours and actually scored 69.
Predicted: ŷ = 50 + 4(5) = 70. Residual = 69 − 70 = −1.
Read it: "E scored 1 point below what the line predicted." A negative residual means the dot sits below the line (the line over-predicted). A positive residual sits above the line.

A second one so the sign sticks:

Student F studied 6 hours and scored 75. Predicted ŷ = 50 + 4(6) = 74. Residual = 75 − 74 = +1 → F sits 1 point above the line.
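The predict-then-subtract routine fits in a few lines. This Python sketch hard-codes the hero line ŷ = 50 + 4x; `predict`, `residual`, and `describe` are hypothetical helper names, and the sign check mirrors the rule that positive residuals sit above the line and negative ones below.

```python
# The hero line from Segment 2: y-hat = 50 + 4x.
B0, B1 = 50, 4


def predict(hours):
    """Predicted exam score y-hat for a given number of study hours."""
    return B0 + B1 * hours


def residual(hours, observed):
    """Residual = observed - predicted (always in that order)."""
    return observed - predict(hours)


def describe(name, hours, observed):
    """One-line reading of a residual: its sign says above or below the line."""
    r = residual(hours, observed)
    if r > 0:
        where = f"{r} point(s) above the line"
    elif r < 0:
        where = f"{-r} point(s) below the line"
    else:
        where = "exactly on the line"
    return f"{name}: predicted {predict(hours)}, observed {observed}, residual {r:+g}, {where}"


print(predict(7))                    # 78
print(describe("Student E", 5, 69))  # ... residual -1, 1 point(s) below the line
print(describe("Student F", 6, 75))  # ... residual +1, 1 point(s) above the line
```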

Memory hook:

Residual = observed − predicted. Positive → above the line. Negative → below the line. Closer to 0 → the line nailed it.

Quick interaction (think-pair-share, ~6 min): give students the line ŷ = 50 + 4x and one point — "Student G studied 3 hours and scored 61." Ask for (a) the predicted score and (b) the residual. (Answer: ŷ = 50 + 4(3) = 62; residual = 61 − 62 = −1, so G is 1 point below the line.) Surface anyone who flips the subtraction (predicted − observed) — the convention is observed minus predicted, every time.

Segment 4 — Misconceptions + r² and Extrapolation (21 min) · Session 1 closes (~75)

First, r² — what the line explains. The coefficient of determination r² is the share of the variation in y that the regression line explains. It's just the correlation squared, and it lives between 0 and 1 (report it as a percent).

For the hero dataset, the spreadsheet's =RSQ returns r² ≈ 0.99, i.e. about 99% of the variation in exam scores is explained by study hours — an unusually tight fit (these are friendly demo numbers).
A more typical classroom example to teach the idea: if r = 0.9, then r² = 0.81 = 81% — "81% of the variation in y is explained by x, and the remaining 19% is due to everything else (other factors + scatter)."
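To see that r² really is "the share of variation explained," this sketch computes it two ways for the hero dataset: as 1 − SSE/SST (unexplained variation over total variation) and as the square of the correlation r. Both give about 0.986, the value quoted above as ≈ 0.99. The variable names are illustrative, not taken from any tool.

```python
import math

hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]
b0, b1 = 50, 4  # the least-squares line from Segment 2

n = len(scores)
y_bar = sum(scores) / n
# Total variation in y: squared distances of each score from the mean score.
sst = sum((y - y_bar) ** 2 for y in scores)
# Variation the line leaves unexplained: the squared residuals.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
r_squared = 1 - sse / sst

# Cross-check: in simple linear regression, r^2 equals the correlation r squared.
x_bar = sum(hours) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
r = s_xy / math.sqrt(s_xx * sst)

print(round(r_squared, 3), round(r ** 2, 3))  # 0.986 0.986
```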

Read r² in one honest sentence: "r² is the percent of the ups-and-downs in y that the line accounts for." Higher r² = points hug the line = predictions are tighter.

Name the misconceptions out loud, then cure each:

❌ "r² is the slope."
✅ Cure: completely different jobs. The slope is the per-unit change (4 points per hour, carries units); r² is a unitless share of variation between 0 and 1 (0.81 = 81%). A steep line can have a low r² (fuzzy cloud) and a gentle line a high r² (tight band). Slope = how much; r² = how well.
❌ "A high r² (or strong line) proves x causes y." — correlation ≠ causation, revisited.
✅ Cure: the line and r² only describe and predict; they never supply the causal arrow. Ask the Week-4 questions: plausible lurking variable? was anything randomly assigned? A regression on observational data is still a link, not a cause. (More study time may help — but motivation or prior prep could drive both.)
❌ "The line works for any x — just plug it in." — extrapolation.
✅ Cure: the line is trustworthy only inside the range of the data (here, ~1–6 hours). Predict for x = 40 hours and you get ŷ = 50 + 4(40) = 210 — a 210 on a 100-point exam, which is nonsense. Predicting outside the data is extrapolation, and it's where regression lies. Stay in the room the data was collected in.
❌ "The intercept is always meaningful."
✅ Cure: b₀ is only a real-world statement if x = 0 is sensible and near the data. "0 hours studied → predicted 50" is borderline; "predicted weight at height 0 inches" is meaningless. Interpret the intercept, but say when it's just a mathematical anchor.
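One way to make "stay in the room the data was collected in" concrete is to have the prediction flag extrapolation itself. This is a minimal Python sketch under one simple assumption: any x below the smallest or above the largest observed hours counts as extrapolation. `predict_with_check` is a hypothetical helper, not a spreadsheet or library function.

```python
# Hours actually observed in the hero dataset, and the fitted line y-hat = 50 + 4x.
HOURS_OBSERVED = [1, 2, 3, 4, 5, 6]
B0, B1 = 50, 4


def predict_with_check(hours, observed_x=HOURS_OBSERVED):
    """Return (y_hat, warning); warning is None only inside the observed x-range."""
    y_hat = B0 + B1 * hours
    lo, hi = min(observed_x), max(observed_x)
    if lo <= hours <= hi:
        return y_hat, None
    return y_hat, f"extrapolation: x = {hours} is outside the observed range {lo} to {hi}"


print(predict_with_check(3.5))  # (64.0, None)
print(predict_with_check(40))   # 210, flagged: impossible on a 100-point exam
```

A strict min/max rule also flags x = 7 (one hour past the data) and x = 0 (the intercept), so treat the warning as a prompt for judgment rather than an automatic veto.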

Memory hook: "Slope = how much, r² = how well, and neither one is a cause. Never drive the line off the edge of the map (extrapolation)."

Segment 5 — Inference for the Slope: Is It Really There? (26 min) · Session 2 opens

Hook back in: "Last session we fit a line to six students and got a slope of 4. But here's the unsettling question: if I'd grabbed a different six students, I'd get a slightly different slope. So is the true slope really positive — or could the real relationship be flat (slope 0), and our 4 is just luck of the draw?"

Plain language first — the question inference answers. Our slope b₁ = 4 came from a sample. Behind it is a true population slope (call it β₁) we can't see. Inference for the slope asks: is the slope significantly different from 0? — i.e., is there real evidence of a linear relationship in the population, or could the truth be a flat line?

Set it up as a hypothesis test (callback to Weeks 13–14):
- Null hypothesis H₀: β₁ = 0 (the true population slope is 0) — no linear relationship (y doesn't change with x).
- Alternative Hₐ: β₁ ≠ 0 — there is a linear relationship.
- The tool is a t-test for the slope. The regression output hands you a t-statistic and a p-value; you compare the p-value to your significance level α (usually α = 0.05), exactly like every test in Week 13.

The decision rule (the only rule you need today):

p < α → reject H₀ → the slope IS significantly different from 0 (real evidence of a linear relationship).
p ≥ α → fail to reject H₀ → the slope is NOT significantly different from 0 (the trend could be noise).
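The decision rule is mechanical enough to write as a function. A minimal Python sketch, assuming α defaults to 0.05; note that p exactly equal to α falls on the fail-to-reject side, matching p ≥ α above. `slope_decision` is a hypothetical name.

```python
def slope_decision(p_value, alpha=0.05):
    """Apply the decision rule for H0: slope = 0 by comparing p to alpha."""
    if p_value < alpha:
        return "reject H0: the slope is significantly different from 0"
    return "fail to reject H0: the slope is not significantly different from 0"


print(slope_decision(0.42))  # the height-vs-score counter-case: fail to reject
print(slope_decision(0.05))  # boundary: p = alpha also fails to reject
```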

Worked example (output supplied — we read it, we don't compute it):

The regression for our study-hours data reports slope b₁ = 4, t ≈ 16.7, p < 0.001 (about 0.00007). Use α = 0.05.
Compare: p < 0.001 < 0.05 = α → reject H₀ → the slope is statistically significant. In words: "There is strong evidence (p < 0.001) that exam score really does change with study hours — the positive slope is not just noise."
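Students only read t and p, but an instructor may want to check where they come from. This Python sketch uses the standard simple-regression formulas (residual standard error s = √(SSE / (n − 2)), SE(b₁) = s / √Sxx, t = b₁ / SE(b₁)) and, to stay within the standard library, gets the two-sided p-value by integrating the t density numerically with Simpson's rule. That integration stands in for the t-distribution function a spreadsheet or stats package would use, and every function name here is hypothetical. On the six hero students it gives t ≈ 16.7 with 4 degrees of freedom and p ≈ 0.00007, well below α = 0.05.

```python
import math

hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=20_000):
    """P(|T| >= |t_stat|) via Simpson's rule on [0, |t_stat|]; steps must be even."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    # The density is symmetric, so both tails together are 1 - 2 * area(0..|t|).
    return 1 - 2 * area_0_to_b


def slope_t_test(x, y):
    """Return (b1, SE(b1), t, two-sided p) for H0: slope = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error, n - 2 degrees of freedom
    se_b1 = s / math.sqrt(s_xx)   # standard error of the slope
    t_stat = (b1 - 0) / se_b1     # distance of b1 from the null value 0, in SEs
    return b1, se_b1, t_stat, two_sided_p(t_stat, n - 2)


b1, se_b1, t_stat, p = slope_t_test(hours, scores)
# Expected for the hero data: b1 = 4, SE = 0.239, t = 16.7, p = 0.00007
print(f"b1 = {b1:g}, SE = {se_b1:.3f}, t = {t_stat:.1f}, p = {p:.5f}")
```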

The counter-case (so "significant" means something):

A different study regresses exam score on a student's height and reports slope b₁ = 0.6, p = 0.42, α = 0.05.
Compare: p = 0.42 > 0.05 → fail to reject H₀ → the slope is not significantly different from 0. In words: "We don't have evidence that score changes with height — the apparent slope is consistent with a flat line (pure chance)."

Memory hook (put it on a slide):

H₀: slope = 0 (flat line, no relationship). p < α → slope is real (reject). p ≥ α → could be noise (fail to reject). "Low p, slope's legit."

Land the key idea: a non-zero slope in your sample isn't enough — the p-value vs. α tells you whether to believe the relationship beyond your sample. Significant ≠ large, and significant ≠ causal; it only means "probably not zero."

Segment 6 — Residual Analysis: Reading the Residual Plot (18 min)

Plain language: before you trust a line, you check its residuals. A residual plot graphs each x against its residual (observed − predicted). It's a stethoscope for the fit.

What you want to see — and what's a red flag:
- ✅ Healthy: residuals scattered randomly above and below the zero line, no pattern, roughly even spread. Translation: a straight line was the right choice.
- ❌ Curve (a U or arch) in the residuals: the relationship was really curved — a straight line is the wrong model, even if r looked okay. (Callback to Week 4: r ≈ 0 can hide a U-shaped relationship; here a pattern in the residuals exposes a bad straight-line fit.)
- ❌ Fanning out (residuals grow as x grows): the spread of y isn't constant; predictions get shakier at one end.
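To see why a curve in the residuals matters even when r looks strong, this sketch fits a straight line to hypothetical, perfectly curved data (y = x² for x = 1 to 6; not course data). The fit still reports r² ≈ 0.96, yet the residual signs run + − − − − + from left to right: the U-shape that says a line was the wrong model.

```python
# Hypothetical curved data (not from the course): y = x^2 for x = 1..6.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sum(e ** 2 for e in residuals) / sst

print(f"line: y-hat = {b0:.2f} + {b1:.2f}x, r^2 = {r_squared:.2f}")
# Residual signs from left to right; a run like this is the U-shape.
print("".join("+" if e > 0 else "-" for e in residuals))  # +----+
```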

Worked example (supplied residuals for the hero dataset, on ŷ = 50 + 4x):

Hours x	1	2	3	4	5	6
Predicted ŷ	54	58	62	66	70	74
Observed y	54	59	61	66	69	75
Residual	0	+1	−1	0	−1	+1

Plot those residuals (0, +1, −1, 0, −1, +1) against x: they bounce randomly around 0 with no shape — a clean residual plot. Verdict: the straight line is an appropriate fit. (Notice every residual is tiny, which is why r² was ~99%.)
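For a quick check without graphing software, this Python sketch recomputes the hero residuals and prints a rough text version of the residual plot. It assumes whole-number residuals (true here), since each printed row stands for one residual value; `ascii_residual_plot` is a hypothetical helper.

```python
hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]
# Residual = observed - predicted, using the hero line y-hat = 50 + 4x.
residuals = [y - (50 + 4 * x) for x, y in zip(hours, scores)]


def ascii_residual_plot(xs, res):
    """Text residual plot: one row per integer residual level, '*' marks a point."""
    lines = []
    for level in range(max(res), min(res) - 1, -1):
        row = "".join(" * " if r == level else " . " for r in res)
        label = f"{level:+d}" if level else " 0"
        lines.append(f"{label} |{row}")
    lines.append("    " + "".join(f"{x:^3}" for x in xs))  # x-axis labels
    return "\n".join(lines)


print(residuals)  # [0, 1, -1, 0, -1, 1]
print(ascii_residual_plot(hours, residuals))
```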

Memory hook: "A residual plot should look like a boring, patternless cloud around zero. If you can see a shape in it, your model has the wrong shape."

Quick interaction (~4 min): show two residual plots on a slide — one a random band around 0, one an obvious U-shape. Ask which model is trustworthy. (The random band; the U-shape says "you fit a line to a curve.")

Segment 7 — Putting It Together: One Full Read of a Regression (20 min)

Plain language: real statistical software (or a spreadsheet's regression tool) hands you a little output table. The whole week is learning to read it. Here is the supplied output for the study-hours regression — read every box in plain English:

Regression: exam score on study hours (n = 6)
- Slope (b₁) = 4 → +4 points of score per extra hour studied.
- Intercept (b₀) = 50 → predicted score at 0 hours (borderline extrapolation — flag it).
- r² = 0.99 → ~99% of the variation in scores is explained by hours (very tight fit).
- t ≈ 16.7, p < 0.001 → with α = 0.05, p < α, so the slope is statistically significant (the relationship is real, not noise).
- Residual plot → random scatter around 0, no pattern → a straight line is appropriate.

One-paragraph plain-language conclusion (model the SLO-B deliverable):

"In this sample, each extra hour of study is associated with about 4 more exam points, and study hours explain roughly 99% of the score differences. The relationship is statistically significant (p < 0.001), so it's very unlikely to be a fluke. Two cautions: this is observational data, so studying isn't proven to cause higher scores, and the line should only be used within about 1–6 hours of study — don't extrapolate to 40 hours."

Misconception sweep (rapid cures):
- ❌ "Significant means big / important." ✅ It means probably not zero — a tiny slope can be significant with enough data; a large slope can be non-significant with little.
- ❌ "r² near 1 means the line is true / causal." ✅ r² is about tightness of fit, not truth or cause. A spurious correlation can have a high r².
- ❌ "A small p-value proves causation." ✅ The p-value tests slope ≠ 0, never "x causes y." Causation still needs a randomized experiment, or plausible lurking variables ruled out.

Quick mini-debate (genuinely arguable, ~4 min): "A study of 200 cities finds that for every extra coffee shop per capita, the median home price rises by $3,000, with p < 0.001 and r² = 0.55. A realtor says, 'Build coffee shops to raise home values.'" Argue both sides; surface that the slope is significant and explains real variation, yet it's observational (lurking variable: neighborhood wealth/density drives both) — so no causal advice, and don't extrapolate to a town with no data.

Segment 8 — Technology Workflow + AI-Critique, Callback & Hand-off (12 min) · Session 2 closes (~75)

Technology workflow — fit and read a regression in a spreadsheet (exact steps):
1. Put the two variables in two columns — study hours in A2:A7, exam score in B2:B7.
2. Get the slope: in an empty cell type =SLOPE(B2:B7, A2:A7) → returns 4. (Order matters: =SLOPE(known_ys, known_xs).)
3. Get the intercept: =INTERCEPT(B2:B7, A2:A7) → returns 50. So the line is ŷ = 50 + 4x.
4. Get r²: =RSQ(B2:B7, A2:A7) → returns about 0.99. (Google Sheets and Excel are identical for all three.)
5. Predict a new x in a cell: =50 + 4*7 → 78. Residual for an observed point: =observed - (50 + 4*x), e.g. =69-(50+4*5) → −1.
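6. For the t and p-value of the slope, use the full regression tool (Excel: Data ▸ Data Analysis ▸ Regression, after enabling the Analysis ToolPak add-in; Sheets: an add-on, or =LINEST(B2:B7, A2:A7, TRUE, TRUE), which returns the slope's standard error, so t = slope ÷ SE and =T.DIST.2T(t, 4) gives the two-sided p with n − 2 = 4 degrees of freedom) — but in this course you read the supplied t and p; you don't compute them by hand.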

AI-critique moment (students verify, not consume):

Paste this to an approved chatbot, with the line and output supplied: "A regression gives ŷ = 50 + 4x for exam score on study hours, fit on students who studied 1–6 hours, with p < 0.001 and r² = 0.99. A student asks: 'So if I study 40 hours I'll score 210, right? And this proves studying causes higher grades?' Is that reasoning correct?"
Then check its answer against today's lesson. A careless model may happily plug in 40 and report 210, or call the relationship causal. The honest answer flags two errors: (1) extrapolation — x = 40 is far outside the 1–6 hour data, so ŷ = 210 is meaningless (and impossible on a 100-point exam); (2) correlation ≠ causation — a significant slope and high r² describe and predict, but this is observational, so studying isn't proven to cause the scores. Your job all semester: the tool drafts, you judge.

Callback + tease:
- Callback: "Week 4 we measured a relationship with r and warned that correlation isn't cause. This week we drew the line (slope, intercept), predicted with it (ŷ, residuals), said how much it explains (r²), and tested whether it's real (p vs. α) — and the causation warning still stands."
- Tease next week: "Week 16 is the final — cumulative across all eight objectives. We'll pull the whole arc together: from where data comes from (Week 1) to drawing the line and testing it (this week). Your study guide, exam-prep tutorial, and practice exam are in the Week 16 module."

Hand-off (the week's graded work):
- Lecture Tutorial 15 (AI tutor, share-link submission) — the least-squares line, slope/intercept in context, prediction & residuals, r², and inference for the slope.
- Quiz 15, Discussion 15 ("Read the headline's slope" — find a real "for every X, Y rises by…" claim and reason about the slope, r², and the extrapolation/causation pitfalls), and Assignment 15 (four worked problems). This is the last regular week — the final is next week.

Instructor FAQ — Common Stumbles

Student says / does	Quick cure
"The slope is 4." (no units, no context)	Half an answer. Make them say it in the problem's words: "4 points of exam score per additional hour studied." The slope is always a per-one-unit-of-x change in y, with units.
Computes the residual as predicted − observed.	The convention is observed − predicted, every time. Re-do it: observed 69, predicted 70 → 69 − 70 = −1 (the dot is below the line). Sign carries meaning.
"r² is the slope" / reports r² with units.	Two different things. Slope = per-unit change (carries units); r² = unitless share of variation, 0 to 1 (0.81 = 81% explained). Steep ≠ high r².
Plugs a far-out x into the line (e.g., 40 hours → 210).	Extrapolation. The line is only trustworthy inside the data's x-range (~1–6 hrs). Outside it, the prediction is unsupported — and here, impossible (210 on a 100-pt exam).
"A significant slope means studying causes higher scores."	Inference tests slope ≠ 0, not causation. Observational data → a link, not a cause (Week 4 returns). Hunt the lurking variable; check for random assignment.
Confuses "significant" with "large."	Significance = probably not zero (p < α). A tiny slope can be significant with lots of data; a big slope can be non-significant with little. Different questions.
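"The p-value is the probability that the slope is 0."	Not quite: p is the probability of getting a sample slope at least this far from 0 if the true slope were 0. Keep it operational at this level: p < α → reject H₀ (slope ≠ 0); p ≥ α → fail to reject. Don't over-philosophize the p-value; compare it to α and decide.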
Reads a U-shaped residual plot as "fine, points are near zero."	A pattern in the residual plot (a curve) means a straight line is the wrong model, regardless of how small the residuals are. You want a patternless cloud around 0.
Interprets the intercept when x = 0 is nonsense.	Interpret b₀ only when x = 0 is sensible and near the data. Otherwise call it a mathematical anchor, not a real-world prediction.

Scope flag

This outline stays within Objective 8 — simple linear regression and inference for the slope. We read regression output (slope, intercept, r², t, p) and interpret it; we do not derive the least-squares formulas, compute r by hand, or calculate the t-statistic's standard error — those are supplied. Multiple regression is out of scope (spine, Objective 8). The full output-table read in Segment 7 is added synthesis (it ties the week together for the final); cut it for a leaner 60-minute version. Keep the extrapolation-to-40-hours and the height-vs-score non-significant cases — they make "significant," "extrapolation," and "not a cause" stick.

~ Prof. Rivera's edition · Fall 2026 · built with thecoursemaker.com