Week 15 — Lecture Outline · Linear Regression & Inference
Course: Introduction to Statistics (MATH 11) · Silver Oak University (fictional sample) · Prof. Rivera
Objectives covered: Objective 8 — Fit and interpret a simple linear regression model, including inference for the slope.
SLOs touched: A (reason quantitatively from data) · B (communicate results to a non-technical audience)
Meeting pattern: 2 sessions × 75 min = 150 min. Segment minutes below total ~150; scale to your own pattern.
Week at a Glance
| Item | Details |
|---|---|
| The week's big question | "Once two things move together, can we draw the line that turns the pattern into a prediction, and how do we know the line is real and not just noise?" |
| By the end of the week, students can… | (1) read a least-squares regression line ŷ = b₀ + b₁x and interpret its slope and intercept in context; (2) use the line to predict ŷ for a given x and compute a residual (observed − predicted); (3) interpret r² as the share of the variation in y the line explains; (4) explain inference for the slope — whether the slope is significantly different from 0 — by comparing a p-value to α, and read a residual plot. |
| Key vocabulary | response (y) vs. explanatory (x) variable, least-squares regression line, line of best fit, predicted value ŷ ("y-hat"), slope b₁, intercept b₀, prediction, extrapolation, residual (observed − predicted), residual plot, coefficient of determination r², variation explained, inference for the slope, null hypothesis (slope = 0), t-test for the slope, p-value, significance level α, statistically significant |
| Materials | slides (Deck 15); the week's readings + video links; a spreadsheet (Google Sheets or Excel) with =SLOPE, =INTERCEPT, and =RSQ; one approved chatbot (Gemini / Claude / ChatGPT) for the AI-critique moment and the tutorial |
| Timing note | 8 segments, ~150 min total. Session 1 = Segments 1–4 (~75). Session 2 = Segments 5–8 (~75). |
Pre-computed throughout. Every number in this outline is worked out and friendly. The hero dataset's least-squares line is genuinely ŷ = 50 + 4x (the spreadsheet's =SLOPE and =INTERCEPT return exactly 4 and 50). All predictions, residuals, r², and the p-value-vs-α comparisons below are supplied: the materials hand you the regression output, and the chatbot never fits a line from raw data on the fly.
Segment 1 — Hook & the Promise (8 min) · Session 1 opens
Hook. Put one line on the board: "For every extra hour a student studies, their exam score goes up about 4 points." Then ask: "If that's true, and a friend tells you they're going to study 7 hours for the final — what score would you bet on? And how sure are you?"
- Wait for guesses. Someone will do 7 × 4 = 28 and get stuck (28 of what?). Walk them to it: we also need a starting point — the score for someone who studied basically nothing.
- "That's the whole week. Back in Week 4 we learned to measure a relationship with the correlation r. This week we go one giant step further: we draw the actual line, use it to predict, and then — the grown-up move — we ask whether the line is real or whether we're fooling ourselves with noise."
The promise (write it on the board): "By the end of this week you can take any 'for every X, Y goes up by…' relationship and do four things: write its line, predict with it, measure how much it really explains, and decide whether the trend is statistically real."
Why it matters line (memory hook): "Correlation says they move together; regression draws the line that lets you bet on the next one — and inference tells you whether the line is worth betting on." (Callback to Week 4: correlation is a handshake, not a push — and a line still isn't a cause.)
Segment 2 — The Least-Squares Line: Slope & Intercept in Context (24 min)
Plain language first. When a scatterplot shows a linear pattern (Week 4: direction, form, strength), we can summarize it with one straight line — the least-squares regression line, the "line of best fit." We write it:
ŷ = b₀ + b₁x — read "y-hat equals b-naught plus b-one x."
- ŷ ("y-hat") is the predicted value of y the line gives for an x. The hat means predicted, not observed (callback to Week 1's p̂ — the hat means estimated, not the real thing).
- b₁ is the slope — how much ŷ changes for each one-unit increase in x.
- b₀ is the intercept — the predicted ŷ when x = 0.
Why "least-squares"? Of all the lines you could draw, this is the one that makes the total squared vertical distance from the dots to the line as small as possible. (Conceptual only — we read the line, we don't derive it.)
The hero dataset (we'll reuse it all week). Six students' weekly study hours (x) and exam score (y):
| Student | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Hours (x) | 1 | 2 | 3 | 4 | 5 | 6 |
| Score (y) | 54 | 59 | 61 | 66 | 69 | 75 |
Run the fit (Segment 8 shows the spreadsheet) and you get — supplied, exact:
ŷ = 50 + 4x (slope b₁ = 4, intercept b₀ = 50).
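Optional instructor check: a minimal Python sketch that confirms the supplied line outside the spreadsheet. It assumes the standard textbook least-squares formulas (slope = Sxy / Sxx, intercept = ȳ − b₁x̄), which this outline does not derive; the function and variable names are illustrative only.

```python
# Hero dataset from the outline: weekly study hours (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]


def least_squares_line(xs, ys):
    """Return (intercept b0, slope b1) of the line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: summed cross-deviations divided by summed squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    # The least-squares line passes through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line(hours, scores)
print(f"y-hat = {b0:g} + {b1:g}x")  # y-hat = 50 + 4x
```

Students still only read the line; this is a behind-the-scenes confirmation that the friendly numbers are exact.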
Interpret the slope and intercept in context (this is the deliverable — say it in words, with units):
- Slope b₁ = 4: "For each additional hour studied, the model predicts the exam score rises by about 4 points." Units travel: it's 4 points per hour, not just "4."
- Intercept b₀ = 50: "A student who studied 0 hours is predicted to score about 50." Always check whether x = 0 is even sensible. Here it is borderline: 0 hours falls just outside the observed 1–6 hour range, so we'll flag it as the extrapolation trap in Segment 4.
Memory hook (put it on a slide):
Slope = "per-one-x" change in ŷ. Intercept = ŷ when x = 0. ŷ wears a hat because it's predicted, not measured.
Land the key idea: the line is a summary and a prediction machine, stated in the real-world units of the problem. A slope without units and context ("the slope is 4") is only half an answer.
Segment 3 — Predict ŷ, and the Residual (22 min)
Plain language first. Once you have the line, prediction is just arithmetic: plug an x into ŷ = b₀ + b₁x.
Worked prediction (every step out loud, supplied numbers):
A new student plans to study 7 hours. Predict their score.
ŷ = 50 + 4(7) = 50 + 28 = 78. "The model bets on about a 78."
Now the honest part — the residual. No real student lands exactly on the line. The residual measures the miss:
residual = observed y − predicted ŷ (always observed minus predicted).
Worked residual (supplied):
Student E studied 5 hours and actually scored 69.
Predicted: ŷ = 50 + 4(5) = 70. Residual = 69 − 70 = −1.
Read it: "E scored 1 point below what the line predicted." A negative residual means the dot sits below the line (the line over-predicted). A positive residual sits above the line.
A second one so the sign sticks:
Student F studied 6 hours and scored 75. Predicted ŷ = 50 + 4(6) = 74. Residual = 75 − 74 = +1 → F sits 1 point above the line.
Memory hook:
Residual = observed − predicted. Positive → above the line. Negative → below the line. Closer to 0 → the line nailed it.
Quick interaction (think-pair-share, ~6 min): give students the line ŷ = 50 + 4x and one point — "Student G studied 3 hours and scored 61." Ask for (a) the predicted score and (b) the residual. (Answer: ŷ = 50 + 4(3) = 62; residual = 61 − 62 = −1, so G is 1 point below the line.) Surface anyone who flips the subtraction (predicted − observed) — the convention is observed minus predicted, every time.
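The worked prediction, the two worked residuals, and the think-pair-share answer can be checked with a short Python sketch. The helper names `predict` and `residual` are hypothetical; the line ŷ = 50 + 4x and the data points are the supplied ones.

```python
B0, B1 = 50, 4  # supplied line from the outline: y-hat = 50 + 4x


def predict(x):
    """Predicted score (y-hat) for x hours of study."""
    return B0 + B1 * x


def residual(x, observed):
    """Observed minus predicted, always in that order."""
    return observed - predict(x)


print(predict(7))        # 78  (new student, 7 hours)
print(residual(5, 69))   # -1  (Student E, below the line)
print(residual(6, 75))   # 1   (Student F, above the line)
print(residual(3, 61))   # -1  (Student G, below the line)
```

Keeping the subtraction inside one function makes the sign convention hard to flip by accident.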
Segment 4 — Misconceptions + r² and Extrapolation (21 min) · Session 1 closes (~75)
First, r² — what the line explains. The coefficient of determination r² is the share of the variation in y that the regression line explains. It's just the correlation squared, and it lives between 0 and 1 (report it as a percent).
For the hero dataset, the spreadsheet's =RSQ returns r² ≈ 0.99, i.e. about 99% of the variation in exam scores is explained by study hours, an unusually tight fit (these are friendly demo numbers).
A more typical classroom example to teach the idea: if r = 0.9, then r² = 0.81 = 81% — "81% of the variation in y is explained by x, and the remaining 19% is due to everything else (other factors + scatter)."
Read r² in one honest sentence: "r² is the percent of the ups-and-downs in y that the line accounts for." Higher r² = points hug the line = predictions are tighter.
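To see where the 0.99 comes from, the sketch below computes r² for the hero dataset as 1 − SSE/SST: the unexplained share of variation, subtracted from 1. That identity is the standard one for a least-squares line and is an addition here, not something the outline derives; variable names are illustrative.

```python
hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]
B0, B1 = 50, 4  # supplied line: y-hat = 50 + 4x

y_bar = sum(scores) / len(scores)
predicted = [B0 + B1 * x for x in hours]

# SSE: variation the line leaves unexplained (squared residuals).
sse = sum((y - p) ** 2 for y, p in zip(scores, predicted))
# SST: total variation of the scores around their mean.
sst = sum((y - y_bar) ** 2 for y in scores)

r_squared = 1 - sse / sst
print(sse, sst)             # 4 284.0
print(round(r_squared, 3))  # 0.986
```

Only 4 of the 284 units of squared variation are left unexplained, which is why the fit rounds to 99%.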
Name the misconceptions out loud, then cure each:
- ❌ "r² is the slope."
✅ Cure: completely different jobs. The slope is the per-unit change (4 points per hour, carries units); r² is a unitless share of variation between 0 and 1 (0.81 = 81%). A steep line can have a low r² (fuzzy cloud) and a gentle line a high r² (tight band). Slope = how much; r² = how well.
- ❌ "A high r² (or strong line) proves x causes y." (correlation ≠ causation, revisited)
✅ Cure: the line and r² only describe and predict; they never supply the causal arrow. Ask the Week-4 questions: plausible lurking variable? was anything randomly assigned? A regression on observational data is still a link, not a cause. (More study time may help, but motivation or prior prep could drive both.)
- ❌ "The line works for any x, just plug it in." (extrapolation)
✅ Cure: the line is trustworthy only inside the range of the data (here, ~1–6 hours). Predict for x = 40 hours and you get ŷ = 50 + 4(40) = 210: a 210 on a 100-point exam, which is nonsense. Predicting outside the data is extrapolation, and it's where regression lies. Stay in the room the data was collected in.
- ❌ "The intercept is always meaningful."
✅ Cure: b₀ is only a real-world statement if x = 0 is sensible and near the data. "0 hours studied → predicted 50" is borderline; "predicted weight at height 0 inches" is meaningless. Interpret the intercept, but say when it's just a mathematical anchor.
Memory hook: "Slope = how much, r² = how well, and neither one is a cause. Never drive the line off the edge of the map (extrapolation)."
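One way to make the extrapolation trap concrete is a prediction helper that reports whether x lies inside the observed range. This is a minimal sketch under stated assumptions: the 1–6 hour bounds come from the hero dataset, and the name `predict_with_flag` is hypothetical.

```python
B0, B1 = 50, 4        # supplied line: y-hat = 50 + 4x
X_MIN, X_MAX = 1, 6   # range of study hours in the hero dataset


def predict_with_flag(x):
    """Return (y-hat, is_extrapolation) for x hours of study."""
    y_hat = B0 + B1 * x
    is_extrapolation = not (X_MIN <= x <= X_MAX)
    return y_hat, is_extrapolation


for x in (5, 40):
    y_hat, flagged = predict_with_flag(x)
    note = "EXTRAPOLATION, do not trust" if flagged else "inside the data range"
    print(f"x = {x}: y-hat = {y_hat} ({note})")
# x = 5: y-hat = 70 (inside the data range)
# x = 40: y-hat = 210 (EXTRAPOLATION, do not trust)
```

The arithmetic works for any x; the flag is what reminds you the answer may be meaningless.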
Segment 5 — Inference for the Slope: Is It Really There? (26 min) · Session 2 opens
Hook back in: "Last session we fit a line to six students and got a slope of 4. But here's the unsettling question: if I'd grabbed a different six students, I'd get a slightly different slope. So is the true slope really positive — or could the real relationship be flat (slope 0), and our 4 is just luck of the draw?"
Plain language first — the question inference answers. Our slope b₁ = 4 came from a sample. Behind it is a true population slope (call it β₁) we can't see. Inference for the slope asks: is the slope significantly different from 0? — i.e., is there real evidence of a linear relationship in the population, or could the truth be a flat line?
Set it up as a hypothesis test (callback to Weeks 13–14):
- Null hypothesis H₀: slope = 0 — no linear relationship (y doesn't change with x).
- Alternative Hₐ: slope ≠ 0 — there is a linear relationship.
- The tool is a t-test for the slope. The regression output hands you a t-statistic and a p-value; you compare the p-value to your significance level α (usually α = 0.05), exactly like every test in Week 13.
The decision rule (the only rule you need today):
p < α → reject H₀ → the slope IS significantly different from 0 (real evidence of a linear relationship).
p ≥ α → fail to reject H₀ → the slope is NOT significantly different from 0 (the trend could be noise).
Worked example (output supplied — we read it, we don't compute it):
The regression for our study-hours data reports slope b₁ = 4, t = 11.8, p = 0.001. Use α = 0.05.
Compare: p = 0.001 < 0.05 = α → reject H₀ → the slope is statistically significant. In words: "There is strong evidence (p = 0.001) that exam score really does change with study hours — the positive slope is not just noise."
The counter-case (so "significant" means something):
A different study regresses exam score on a student's height and reports slope b₁ = 0.6, p = 0.42, α = 0.05.
Compare: p = 0.42 > 0.05 → fail to reject H₀ → the slope is not significantly different from 0. In words: "We don't have evidence that score changes with height — the apparent slope is consistent with a flat line (pure chance)."
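The decision rule is mechanical enough to write as a few lines of Python. The sketch below applies it to the two supplied p-values; the function name `slope_decision` is hypothetical, and the p-values are read from the outline, not computed.

```python
def slope_decision(p_value, alpha=0.05):
    """Compare a supplied p-value to alpha for the test of H0: slope = 0."""
    if p_value < alpha:
        return "reject H0: the slope is significantly different from 0"
    return "fail to reject H0: the slope is not significantly different from 0"


print(slope_decision(0.001))  # study hours: reject H0
print(slope_decision(0.42))   # height: fail to reject H0
```

Note the boundary case: p exactly equal to α falls on the "fail to reject" side, matching the p ≥ α rule above.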
Memory hook (put it on a slide):
H₀: slope = 0 (flat line, no relationship). p < α → slope is real (reject). p ≥ α → could be noise (fail to reject). "Low p, slope's legit."
Land the key idea: a non-zero slope in your sample isn't enough — the p-value vs. α tells you whether to believe the relationship beyond your sample. Significant ≠ large, and significant ≠ causal; it only means "probably not zero."
Segment 6 — Residual Analysis: Reading the Residual Plot (18 min)
Plain language: before you trust a line, you check its residuals. A residual plot graphs each x against its residual (observed − predicted). It's a stethoscope for the fit.
What you want to see — and what's a red flag:
- ✅ Healthy: residuals scattered randomly above and below the zero line, no pattern, roughly even spread. Translation: a straight line was the right choice.
- ❌ Curve (a U or arch) in the residuals: the relationship was really curved — a straight line is the wrong model, even if r looked okay. (Callback to Week 4: r ≈ 0 can hide a U-shaped relationship; here a pattern in the residuals exposes a bad straight-line fit.)
- ❌ Fanning out (residuals grow as x grows): the spread of y isn't constant; predictions get shakier at one end.
Worked example (supplied residuals for the hero dataset, on ŷ = 50 + 4x):
| Hours x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Predicted ŷ | 54 | 58 | 62 | 66 | 70 | 74 |
| Observed y | 54 | 59 | 61 | 66 | 69 | 75 |
| Residual | 0 | +1 | −1 | 0 | −1 | +1 |
Plot those residuals (0, +1, −1, 0, −1, +1) against x: they bounce randomly around 0 with no shape — a clean residual plot. Verdict: the straight line is an appropriate fit. (Notice every residual is tiny, which is why r² was ~99%.)
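The residual row of the table can be reproduced, and roughly plotted as text, with the short sketch below. It assumes only the supplied line and data; the text plot is a crude stand-in for a real residual plot, not a replacement for one.

```python
hours = [1, 2, 3, 4, 5, 6]
scores = [54, 59, 61, 66, 69, 75]

# Residual = observed - predicted, using the supplied line y-hat = 50 + 4x.
residuals = [y - (50 + 4 * x) for x, y in zip(hours, scores)]
print(residuals)       # [0, 1, -1, 0, -1, 1]
print(sum(residuals))  # 0

# Crude text residual plot: one row per residual level, one column per x.
for level in sorted(set(residuals), reverse=True):
    marks = "".join("  * " if r == level else "    " for r in residuals)
    print(f"{level:+3d} |{marks}")
print("     " + "".join(f"{x:>3} " for x in hours) + " (hours)")
```

The residuals sum to 0, which is a property of any least-squares line fitted with an intercept, so a zero sum alone says nothing about whether the plot has a pattern.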
Memory hook: "A residual plot should look like a boring, patternless cloud around zero. If you can see a shape in it, your model has the wrong shape."
Quick interaction (~4 min): show two residual plots on a slide — one a random band around 0, one an obvious U-shape. Ask which model is trustworthy. (The random band; the U-shape says "you fit a line to a curve.")
Segment 7 — Putting It Together: One Full Read of a Regression (20 min)
Plain language: real statistical software (or a spreadsheet's regression tool) hands you a little output table. The whole week is learning to read it. Here is the supplied output for the study-hours regression — read every box in plain English:
Regression: exam score on study hours (n = 6)
- Slope (b₁) = 4 → +4 points of score per extra hour studied.
- Intercept (b₀) = 50 → predicted score at 0 hours (borderline extrapolation — flag it).
- r² = 0.99 → ~99% of the variation in scores is explained by hours (very tight fit).
- t = 11.8, p = 0.001 → with α = 0.05, p < α, so the slope is statistically significant (the relationship is real, not noise).
- Residual plot → random scatter around 0, no pattern → a straight line is appropriate.
One-paragraph plain-language conclusion (model the SLO-B deliverable):
"In this sample, each extra hour of study is associated with about 4 more exam points, and study hours explain roughly 99% of the score differences. The relationship is statistically significant (p = 0.001), so it's very unlikely to be a fluke. Two cautions: this is observational data, so studying isn't proven to cause higher scores, and the line should only be used within about 1–6 hours of study — don't extrapolate to 40 hours."
Misconception sweep (rapid cures):
- ❌ "Significant means big / important." ✅ It means probably not zero: a tiny slope can be significant with enough data, and a large slope can be non-significant with too little data.
- ❌ "r² near 1 means the line is true / causal." ✅ r² is about tightness of fit, not truth or cause. A spurious correlation can have a high r².
- ❌ "A small p-value proves causation." ✅ p-values test slope ≠ 0, never x causes y. Causation still needs an experiment or a ruled-out lurking variable.
Quick mini-debate (genuinely arguable, ~4 min): "A study of 200 cities finds that for every extra coffee shop per capita, the median home price rises by $3,000, with p < 0.001 and r² = 0.55. A realtor says, 'Build coffee shops to raise home values.'" Argue both sides; surface that the slope is significant and explains real variation, yet it's observational (lurking variable: neighborhood wealth/density drives both), so no causal advice, and don't extrapolate to a town with no data.
Segment 8 — Technology Workflow + AI-Critique, Callback & Hand-off (12 min) · Session 2 closes (~75)
Technology workflow — fit and read a regression in a spreadsheet (exact steps):
1. Put the two variables in two columns — study hours in A2:A7, exam score in B2:B7.
2. Get the slope: in an empty cell type =SLOPE(B2:B7, A2:A7) → returns 4. (Order matters: =SLOPE(known_ys, known_xs).)
3. Get the intercept: =INTERCEPT(B2:B7, A2:A7) → returns 50. So the line is ŷ = 50 + 4x.
4. Get r²: =RSQ(B2:B7, A2:A7) → returns about 0.99. (Google Sheets and Excel are identical for all three.)
5. Predict a new x in a cell: =50 + 4*7 → 78. Residual for an observed point: =observed - (50 + 4*x), e.g. =69-(50+4*5) → −1.
6. For the t and p-value of the slope, use the full regression tool (Excel: Data ▸ Data Analysis ▸ Regression; Sheets: an add-on or =LINEST for the advanced version) — but in this course you read the supplied t and p; you don't compute them by hand.
AI-critique moment (students verify, not consume):
Paste this to an approved chatbot, with the line and output supplied: "A regression gives ŷ = 50 + 4x for exam score on study hours, fit on students who studied 1–6 hours, with p = 0.001 and r² = 0.99. A student asks: 'So if I study 40 hours I'll score 210, right? And this proves studying causes higher grades?' Is that reasoning correct?"
Then check its answer against today's lesson. A careless model may happily plug in 40 and report 210, or call the relationship causal. The honest answer flags two errors: (1) extrapolation — x = 40 is far outside the 1–6 hour data, so ŷ = 210 is meaningless (and impossible on a 100-point exam); (2) correlation ≠ causation — a significant slope and high r² describe and predict, but this is observational, so studying isn't proven to cause the scores. Your job all semester: the tool drafts, you judge.
Callback + tease:
- Callback: "Week 4 we measured a relationship with r and warned that correlation isn't cause. This week we drew the line (slope, intercept), predicted with it (ŷ, residuals), said how much it explains (r²), and tested whether it's real (p vs. α) — and the causation warning still stands."
- Tease next week: "Week 16 is the final — cumulative across all eight objectives. We'll pull the whole arc together: from where data comes from (Week 1) to drawing the line and testing it (this week). Your study guide, exam-prep tutorial, and practice exam are in the Week 16 module."
Hand-off (the week's graded work):
- Lecture Tutorial 15 (AI tutor, share-link submission) — the least-squares line, slope/intercept in context, prediction & residuals, r², and inference for the slope.
- Quiz 15, Discussion 15 ("Read the headline's slope" — find a real "for every X, Y rises by…" claim and reason about the slope, r², and the extrapolation/causation pitfalls), and Assignment 15 (four worked problems). This is the last regular week — the final is next week.
Instructor FAQ — Common Stumbles
| Student says / does | Quick cure |
|---|---|
| "The slope is 4." (no units, no context) | Half an answer. Make them say it in the problem's words: "4 points of exam score per additional hour studied." The slope is always a per-one-unit-of-x change in y, with units. |
| Computes the residual as predicted − observed. | The convention is observed − predicted, every time. Re-do it: observed 69, predicted 70 → 69 − 70 = −1 (the dot is below the line). Sign carries meaning. |
| "r² is the slope" / reports r² with units. | Two different things. Slope = per-unit change (carries units); r² = unitless share of variation, 0 to 1 (0.81 = 81% explained). Steep ≠ high r². |
| Plugs a far-out x into the line (e.g., 40 hours → 210). | Extrapolation. The line is only trustworthy inside the data's x-range (~1–6 hrs). Outside it, the prediction is unsupported — and here, impossible (210 on a 100-pt exam). |
| "A significant slope means studying causes higher scores." | Inference tests slope ≠ 0, not causation. Observational data → a link, not a cause (Week 4 returns). Hunt the lurking variable; check for random assignment. |
| Confuses "significant" with "large." | Significance = probably not zero (p < α). A tiny slope can be significant with lots of data; a big slope can be non-significant with too little data. Different questions. |
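| "p = 0.001 is the probability the slope is 0." | Not quite: the p-value is the probability of getting a sample slope at least this far from 0 if the true slope really were 0. Then keep it operational at this level: p < α → reject H₀ (slope ≠ 0); p ≥ α → fail to reject. Don't over-philosophize the p-value; compare it to α and decide. |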
| Reads a U-shaped residual plot as "fine, points are near zero." | A pattern in the residual plot (a curve) means a straight line is the wrong model, regardless of how small the residuals are. You want a patternless cloud around 0. |
| Interprets the intercept when x = 0 is nonsense. | Interpret b₀ only when x = 0 is sensible and near the data. Otherwise call it a mathematical anchor, not a real-world prediction. |
Scope flag
This outline stays within Objective 8 — simple linear regression and inference for the slope. We read regression output (slope, intercept, r², t, p) and interpret it; we do not derive the least-squares formulas, compute r by hand, or calculate the t-statistic's standard error — those are supplied. Multiple regression is out of scope (spine, Objective 8). The full output-table read in Segment 7 is added synthesis (it ties the week together for the final); cut it for a leaner 60-minute version. Keep the extrapolation-to-40-hours and the height-vs-score non-significant cases — they make "significant," "extrapolation," and "not a cause" stick.
~ Prof. Rivera's edition · Fall 2026 · built with thecoursemaker.com