
Week 4 — Lecture Outline · Exploring Relationships

Introduction to Statistics · MATH 11 · Fall 2026 · Prof. Rivera · Fictional sample

Course: Introduction to Statistics (MATH 11) · Silver Oak University (fictional sample) · Prof. Rivera
Objectives covered: Objective 3 — Describe the relationship between two variables.
SLOs touched: A (reason quantitatively from data) · B (communicate results to a non-technical audience)
Meeting pattern: 2 sessions × 75 min = 150 min. Segment minutes below total ~150; scale to your own pattern.

Week at a Glance


The week's big question	"When two things move together, what is the data actually telling us — and what is it absolutely not telling us?"
By the end of the week, students can…	(1) read a scatterplot and describe its direction, form, and strength; (2) interpret a correlation coefficient r — what its sign and size mean, and the traps it hides; (3) build and read a two-way table, computing marginal and conditional proportions to compare groups; (4) name a lurking / confounding variable and explain in plain words why correlation ≠ causation.
Key vocabulary	explanatory (x) vs. response (y) variable, scatterplot, direction (positive/negative), form (linear/curved/none), strength, outlier, correlation coefficient r, r = +1 / 0 / −1, two-way (contingency) table, cell, marginal proportion, conditional proportion, lurking variable, confounding variable, common-response, causation
Materials	slides (Deck 4), the week's readings + video links, a spreadsheet (Google Sheets or Excel), one approved chatbot (Gemini / Claude / ChatGPT) for the AI-critique moment and the tutorial
Timing note	8 segments, ~150 min total. Session 1 = Segments 1–4 (~75). Session 2 = Segments 5–8 (~75).

Segment 1 — Hook & the Promise (8 min) · Session 1 opens

Hook. Put one sentence on the board and say nothing: "People who eat more ice cream are more likely to drown." Let it sit. Then: "That's a real, repeatable pattern. The numbers don't lie. So — should we ban ice cream at the beach?"
- Wait for the laugh, then the objection: it's summer. Hot days drive both the ice cream and the swimming (and the drowning). A hidden third thing is pulling both strings.
- "This week is about exactly that gap — the difference between two things moving together and one thing causing the other. It is the single most abused idea in all of statistics, and by Thursday you'll catch it in the wild."

The promise (write it on the board): "By the end of this week you can take any 'X is linked to Y' headline and do three things: picture the relationship, measure it with one number, and decide whether it's a real cause or a hidden third variable wearing a disguise."

Why-it-matters line (memory hook): "Correlation is a handshake, not a push — two things greeting each other isn't one shoving the other." (Callback to Week 1's coffee-and-grades; today we make it precise.)

Segment 2 — Scatterplots: Direction, Form, Strength (22 min)

Plain language first. When we have two quantitative variables measured on the same individuals, the picture that shows their relationship is a scatterplot — one dot per individual, placed by its two values.
- First decide which variable goes where. The explanatory variable (the one you think does the explaining) goes on the x-axis; the response variable (the one that responds) goes on the y-axis. Memory hook: "x explains, y responds."
- Then describe the cloud of dots with three words:
  - Direction — positive (dots rise left-to-right; as x goes up, y goes up) or negative (dots fall; as x goes up, y goes down).
  - Form — linear (dots follow a straight-line trend), curved, or no clear form.
  - Strength — how tightly the dots hug that trend: strong (tight band), moderate, or weak (big scatter).
- Always add: any outliers — points that fall far from the overall pattern.

Memory hook (put it on a slide):

D – F – S: Direction, Form, Strength — say all three, every time, plus "any outliers?"

One fully worked example (do every step out loud).

Data: six students' weekly study hours (x) and exam score (y):
(2, 65), (3, 70), (5, 75), (6, 80), (8, 88), (10, 95).
1. Axes: study hours explains the score, so hours on x, score on y. ("x explains, y responds.")
2. Direction: as study hours rise (2 → 10), scores rise (65 → 95). Dots go up left-to-right → positive.
3. Form: plotted, the six dots fall almost on a straight line → linear.
4. Strength: they hug that line very tightly, with no stray point → strong, no outliers.
Spoken description (this is the deliverable): "A strong, positive, linear relationship between study hours and exam score, with no outliers."

Land the key idea: the scatterplot is the honest first look. Never report a single number (next segment) before you've eyeballed the picture — the picture is what tells you the number is even appropriate.
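If no plotting tool is handy, a rough text sketch still gives the picture-first look. The Python below is a minimal illustration, not part of the course materials: `ascii_scatter` is a hypothetical helper that maps each (x, y) pair onto a character grid, with x increasing to the right and y increasing upward. The grid size (21 × 11) is an arbitrary choice, and the helper assumes the data contain more than one distinct x value and more than one distinct y value.

```python
def ascii_scatter(points, width=21, height=11):
    """Return a text scatterplot as a list of rows, top row first.

    Assumes at least two distinct x values and two distinct y values
    (otherwise the scaling below would divide by zero).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value onto the grid: column 0 = smallest x, row 0 = smallest y.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    # Flip the rows so larger y values print nearer the top, like a real y-axis,
    # then add a simple x-axis line underneath.
    return ["|" + "".join(r) for r in reversed(grid)] + ["+" + "-" * width]


study = [(2, 65), (3, 70), (5, 75), (6, 80), (8, 88), (10, 95)]  # (hours, score)
for line in ascii_scatter(study):
    print(line)
```

Reading the printed grid, the six stars climb steadily from the bottom-left corner to the top-right corner: positive, linear, strong, no outliers, the same D-F-S read as steps 2 to 4.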

Segment 3 — Correlation r: What It Does and Doesn't Mean (25 min)

Plain language first. Eyeballing "strong/moderate/weak" is subjective. Correlation r is the one number that measures the direction and strength of a linear relationship between two quantitative variables.
- The sign matches the direction: r > 0 positive, r < 0 negative.
- The size lives on a fixed scale: r is always between −1 and +1.
- r = +1 → a perfect straight line sloping up. r = −1 → a perfect line sloping down.
- r = 0 → no linear relationship.
- The closer |r| is to 1, the tighter the linear pattern. Rough field guide: |r| around 0.8–1.0 strong, 0.5–0.8 moderate, below ~0.5 weak.

One fully worked example (interpret, with the arithmetic shown).

For the six study-hours/score points above, the formula (multiply each student's x z-score by their y z-score, add up the products, and divide by n − 1, which makes it essentially an average of the products) gives:
r = +0.9974… → report r ≈ +0.997. (Keep the third decimal: rounded to two places it becomes 1.00, which would read as a perfect line.)
Read it in words: "A correlation of +0.997 means a very strong, positive, linear relationship — almost a perfect upward line. As study time rises, exam score rises in near-lockstep." (Matches the eyeball read from Segment 2 — the picture and the number agree.)

Name the four things r does NOT mean (cure each on the spot):
- ❌ "r measures any relationship." ✅ r measures only the linear part. A perfect U-shaped curve can have r = 0 even though the variables are tightly related — the picture would scream "relationship!" while r says zero. Always look at the scatterplot first.
- ❌ "r has units, like points or hours." ✅ r is a pure number, unitless. Swap hours for minutes and r doesn't change.
- ❌ "r = 0.6 means 60% of a straight line / twice as strong as 0.3." ✅ r is not a percent and is not linear in strength — 0.8 is much more than "twice" 0.4. (Just read sign + size; don't over-quantify.)
- ❌ "A big r proves X causes Y." ✅ r never proves causation — that's all of Segment 7. r is a measuring tape, not a verdict.

The cure that travels: "r is a thermometer for a straight line — it tells you the direction and how tight, and absolutely nothing else."

Segment 4 — Misconceptions + Quick Interaction (20 min) · Session 1 closes (~75)

Name the misconceptions out loud, then cure each:

❌ "A correlation near 0 means the two variables have nothing to do with each other."
✅ Cure: it means no linear relationship. They could still be perfectly related in a curve (think: hours of daylight over a year, or yield vs. fertilizer that helps then harms). r near 0 + a clear curve = a strong non-linear relationship. Picture first.
❌ "A strong correlation means a steep line."
✅ Cure: strength (how tight) and slope (how steep) are different things. A gentle line with dots glued to it has a higher r than a steep line in a fuzzy cloud. r is about scatter, not slope.
❌ "Switching which variable is x and which is y changes r."
✅ Cure: r is symmetric — corr(x, y) = corr(y, x). The number is identical; only the story about explaining changes.
❌ "r and percentages are the same kind of number."
✅ Cure: r lives on −1 to +1; it is not a probability or a percent. "r = 0.5" is not "50%."

Interaction — Think-Pair-Share (match the cloud to the number, ~10 min):
Show 6 small scatterplots on a slide (tight rising; loose rising; tight falling; loose falling; a clean U-curve; a shapeless blob). Students work solo for 30 sec, matching each plot to one of these r values: +0.95, +0.4, −0.9, −0.3, ≈0, ≈0. They then compare with a neighbor (1 min), and the class votes.
Debrief the two that always split the room: the U-curve (r ≈ 0 despite an obvious relationship — the trap of the week) and the loose rising cloud (a real but weak +0.4, not "no relationship").

Segment 5 — Two-Way Tables: Marginal & Conditional Proportions (25 min) · Session 2 opens

Hook back in: "Scatterplots and r are for two number variables. But what about two category variables — like 'exercises daily?' vs. 'high energy?' You can't plot a dot. You build a table."

Plain language first. A two-way table (contingency table) cross-classifies individuals by two categorical variables — one across the top, one down the side. Each inside box is a cell count. The bottom and right edges are the totals.

The worked example — build it, then read it three ways (do every step out loud):

Study: 200 students each answered two yes/no questions — Do you exercise daily? and Would you call your energy high?

	High energy	Low energy	Row total
Exercises daily	72	18	90
Does not exercise	33	77	110
Column total	105	95	200

Read 1 — a cell: 72 students both exercise daily and report high energy.

Read 2 — a marginal proportion (one variable, from a margin / total):
- P(exercises daily) = row total ÷ grand total = 90 / 200 = 0.45 = 45%.
- P(high energy) = column total ÷ grand total = 105 / 200 = 0.525 = 52.5%.
Marginals ignore the other variable — they live in the margins.

Read 3 — a conditional proportion (restrict to one group first, then divide): this is the one that compares groups.
- P(high energy given exercises) = 72 ÷ 90 (the exercisers' row total) = 0.80 = 80%.
- P(high energy given does not exercise) = 33 ÷ 110 = 0.30 = 30%.

The comparison is the punchline: 80% vs. 30% — a 50-percentage-point gap. Exercisers are far more likely to report high energy. That's an association between the two categorical variables.

Memory hook: "Marginal = the whole pie's slice (divide by the grand total). Conditional = the slice inside one group (divide by that group's total)." The word "given" is your flag: given tells you which total goes on the bottom.
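The three reads of the table translate directly into code. This Python sketch stores the Segment 5 counts in a nested dictionary (a layout chosen only for this example) and computes the marginal proportions by dividing by the grand total, the conditional proportions by dividing by one row's total, and, for contrast, the wrong-denominator figure that Segment 6 warns about.

```python
# The exercise/energy counts from Segment 5, stored as row -> {column: count}.
table = {
    "exercises daily": {"high energy": 72, "low energy": 18},
    "does not exercise": {"high energy": 33, "low energy": 77},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
grand_total = sum(row_totals.values())
col_total_high = sum(cells["high energy"] for cells in table.values())

# Marginal: one variable only, so divide by the grand total.
print(f"P(exercises daily) = {row_totals['exercises daily'] / grand_total:.3f}")
print(f"P(high energy)     = {col_total_high / grand_total:.3f}")

# Conditional: restrict to one row ("given"), then divide by THAT row's total.
for row, cells in table.items():
    p = cells["high energy"] / row_totals[row]
    print(f"P(high energy given {row}) = {p:.2f}")

# The Segment 6 trap: dividing the exercisers' cell by the grand total.
wrong = table["exercises daily"]["high energy"] / grand_total
print(f"wrong denominator: {wrong:.2f} (not a proportion of exercisers)")
```

It reproduces the hand calculations: 0.45 and 0.525 for the marginals, 0.80 versus 0.30 for the conditionals, and 0.36 for the mistaken 72 / 200.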

Segment 6 — The Comparison That Misleads, and the Honest One (18 min)

Plain language: the number that answers a "does it differ by group?" question is almost always a conditional proportion — and the most common mistake is dividing by the wrong total.

Misconception + cure (the heart of two-way tables):
- ❌ "72 out of 200 exercisers have high energy, so 36% of exercisers are energetic."
✅ Cure: 200 is the grand total, not the exercisers' total. To talk about exercisers, restrict to their row first — there are only 90 of them — then 72 / 90 = 80%. The word after "given" picks the denominator.

Why it still isn't proof (planting Segment 7):

The 80%-vs-30% gap is real. But did exercising cause the energy? Or do energetic people simply find it easier to exercise (the arrow runs the other way)? Or does something else — say, good sleep — produce both the exercise and the energy?
Hold the tension: a two-way table can reveal a strong association just like r can — and just like r, it cannot tell you the direction of the arrow.

Quick interaction (think-pair-share, ~5 min): give students this mini-table — of 80 bike owners, 28 commute by bike — and ask only: "What proportion of bike owners commute by bike?" (Answer: 28 / 80 = 0.35 = 35%, a conditional proportion — owners is the denominator, not some grand total.) Surface anyone who reached for a different bottom number.

Segment 7 — Lurking Variables & Correlation ≠ Causation (20 min)

Plain language first. A lurking variable is a third variable, not among the two you're looking at, that affects the relationship and can create a misleading association. When that third variable drives both of your variables, it's a confounding / common-response variable — and it manufactures a correlation with no causal arrow between your two.

Bring the hook home (worked example):

Ice cream sales and drowning deaths rise and fall together across the year — a strong positive correlation. But ice cream doesn't cause drowning. The lurking variable is hot weather / summer: heat drives both ice-cream buying and swimming (hence drowning). Remove summer and the link evaporates.
Picture: you want an arrow ice cream → drowning, but a hidden "temperature" node has arrows to both. The handshake looked like a push.

A second, campus-flavored case (so it isn't a one-off):

Headline: "Students who use a paid tutoring app have higher GPAs." (Observational.) A lurking variable — conscientiousness / family resources — could drive both buying the app and studying hard. The app may help, may not; the data alone can't separate the arrow from the lurker.

The two-question cure (say it, then drill it):
1. "Could a third variable explain both?" (Hunt the lurker.)
2. "Was anything actually randomly assigned?" If nothing was assigned → observational → you have a link, not a cause (callback to Week 1's observational-vs-experiment).

Memory hook: "When X and Y move together, always ask who else is in the room." (The lurking variable is the person you didn't notice.)

Misconception + cure:
- ❌ "They found a strong correlation, so X causes Y."
✅ Cure: name a plausible lurking variable and ask whether anything was randomized. A correlation — however strong, however large the r — is a starting question, not an answer.

Quick mini-debate (genuinely arguable, ~4 min): "Cities with more police officers have more crime. Should a city cut its police force to lower crime?" Argue both sides; surface the lurking variable (population / city size drives both more officers and more total crime).

Segment 8 — Technology Workflow + AI-Critique, Callback & Hand-off (12 min) · Session 2 closes (~75)

Technology workflow — scatterplot + correlation in a spreadsheet (exact steps):
1. Put your two variables in two columns — say study hours in A2:A7 and exam score in B2:B7.
2. Select A1:B7 (headers in A1 and B1) and insert a scatter chart (Google Sheets: Insert ▸ Chart, then set Chart type to "Scatter chart"; Excel: Insert ▸ Scatter (X, Y) ▸ Scatter). Eyeball it first — direction, form, strength.
3. In an empty cell type =CORREL(A2:A7, B2:B7) → it returns 0.9974… ≈ +0.997. (Excel and Google Sheets are identical here; =PEARSON(...) gives the same number.)
4. Sanity-check the number against the picture: a tight upward line should give an r near +1. If the cell and the cloud disagree, trust the cloud and recheck your ranges.

AI-critique moment (students verify, not consume):

Paste this to an approved chatbot: "A study finds a correlation of r = 0.8 between the number of firefighters sent to a fire and the amount of damage. Does sending fewer firefighters reduce damage? Explain."
Then check its reasoning. A careless model may "explain" the correlation as if firefighters cause damage. The honest answer names the lurking variable — the size of the fire — which drives both more firefighters and more damage. Your job all semester: the tool drafts, you judge. Have students catch the model's mistake (or confirm its reasoning) before moving on, exactly as they do in the weekly Lecture Tutorial.

Callback + tease:
- Callback: "Week 1 said correlation isn't causation. This week we made it precise: we can picture a relationship, measure it with r, tabulate it with conditional proportions — and we still can't draw the arrow without an experiment or a ruled-out lurker."
- Tease next week: "We've spent four weeks describing data. Week 5 turns the page to probability — the rules of chance that everything we measure is built on. From 'what happened' to 'what's likely.'"

Hand-off (the week's graded work):
- Lecture Tutorial 4 (AI tutor, share-link submission) — scatterplots (D-F-S), correlation r, two-way tables, lurking variables.
- Quiz 4, Discussion 4 ("Linked, or caused?" — find a real "X linked to Y" headline and reason about a lurking variable), and Assignment 4 (four worked problems).

Instructor FAQ — Common Stumbles

Student says / does	Quick cure
"The correlation is 0, so the two variables aren't related."	Only the linear part is 0. Show a clean U-shaped scatter with r ≈ 0 — obviously related, just not in a straight line. Picture before number.
Confuses strength with slope ("steeper = stronger r").	Strength = how tight the dots hug the line; slope = how steep. A gentle, tight line beats a steep, fuzzy one. r measures scatter, not slope.
Divides a two-way cell by the grand total when the question says "of the exercisers…".	The word "given"/"of" names the group — divide by that group's row or column total, not 200. Restrict first, then divide.
Mixes up marginal and conditional.	Marginal = divide by the grand total (it lives in the margin). Conditional = divide by one group's total. "Given" → conditional.
Reports r with units ("r = 0.99 points").	r is unitless — a pure number on −1 to +1. Changing units (hours → minutes) doesn't move it.
Reads r = 0.6 as "60%."	r is not a percent and not linear in strength. Just say sign + size: "moderate-to-strong positive." Don't translate it to a percentage.
Sees a strong r and says "X causes Y."	Run the two questions: plausible lurking variable? and was anything randomly assigned? No assignment → a link, not a cause.
"Ice cream and drowning are correlated, so one causes the other."	Name the lurking variable (summer heat) that drives both. The correlation is real; the causal arrow is not.
Swaps x and y and expects a different r.	r is symmetric — same number either way. Only the explanatory/response story changes, not the measurement.

Scope flag

This outline stays within Objective 3. The z-score formula for r and the least-squares line are deliberately deferred — this week is about describing and interpreting a relationship (direction/form/strength, the meaning of r, conditional proportions, and lurking variables), not computing r by hand or fitting a regression line. Simple linear regression and inference for the slope arrive in Week 15. Keep the ice-cream and firefighter cases — they make "correlation ≠ causation" unforgettable; cut the tutoring-app case (Segment 7's second example) if you want a leaner version for 60-minute sessions.

~ Prof. Rivera's edition · Fall 2026 · built with thecoursemaker.com