
Week 14 — Lecture Outline · Tests for Means & Proportions

Introduction to Statistics · MATH 11 Fall 2026 · Prof. Rivera · Fictional sample

Course: Introduction to Statistics (MATH 11) · Silver Oak University (fictional sample) · Prof. Rivera
Objectives covered: Objective 7 — Conduct and interpret hypothesis tests for means and proportions. (Week 13 built the logic; this week we turn the crank — the formulas that actually produce the test statistic and the p-value.)
SLOs touched: A (reason quantitatively from data) · B (communicate results to a non-technical audience)
Meeting pattern: 2 sessions × 75 min = 150 min. Segment minutes below total ~150; scale to your own pattern.

Campus note: this is a normal two-session week — we meet Tuesday Dec 1 and Thursday Dec 3, and the week's graded work is due Sunday Dec 6 (Discussion initial post Fri Dec 4, replies Sun Dec 6). Session 1 = Segments 1–4 (the one-sample t-test, end to end). Session 2 = Segments 5–8 (the one-proportion z-test, the two-sample idea, choosing the right test, and the tech/AI close).


Week at a Glance

The week's big question: "Last week we learned the logic of a test. Now, where does the p-value actually come from? How do we compute the evidence for a claim about a mean or a proportion, and how do we pick the right test for the question?"
By the end of the week, students can:
(1) run a one-sample t-test for a mean end to end: state H₀/Hₐ → compute t = (x̄ − μ₀)/(s/√n) → compare the given p-value to α → conclude in context;
(2) run a one-proportion z-test: state → compute z = (p̂ − p₀)/√(p₀(1−p₀)/n) → compare p to α → conclude;
(3) explain, conceptually, what a two-sample comparison asks ("are these two group means different?") and interpret a supplied result;
(4) choose the right test from the question: mean → t; proportion → z; one group vs. a fixed number → one-sample; two groups vs. each other → two-sample.
Key vocabulary: one-sample t-test, test statistic, t = (x̄ − μ₀)/(s/√n), degrees of freedom (df = n − 1), one-proportion z-test, z = (p̂ − p₀)/√(p₀(1−p₀)/n), null value (μ₀, p₀), standard error, two-sample comparison (difference of means), one-sided vs. two-sided, p-value, significance level (α), reject / fail to reject H₀, conclusion in context
Materials: slides (Deck 14), the week's readings + video links, a spreadsheet (Google Sheets or Excel); for the AI-critique moment, one approved chatbot (Gemini / Claude / ChatGPT). The numbers are friendly and the p-values are handed to you this week. The work is choosing the test, setting it up, computing the statistic, then comparing and interpreting, not finding tail areas by hand.
Timing note: 8 segments, ~150 min. Session 1 = Segments 1–4 (~75 min; the full one-sample t pipeline). Session 2 = Segments 5–8 (~75 min; proportions, two-sample idea, choosing the test, tech + AI).

The numbers we use all week (pre-computed — never compute on the fly)

Rule for this course (continued from Week 13): every value here is given and kept friendly, and the worked problems are engineered to land on round test statistics (each one comes out to t = 2.00 or z = 2.00) so the arithmetic is clean. The p-values below are supplied; students compare them to α and never look up a tail area by hand. The three critical values we need are embedded right here.

Embedded critical values (the only ones we use):

| Situation | Critical value | Reject when |
| --- | --- | --- |
| One-sided test, α = 0.05, z (large-sample proportion) | z* = 1.645 | test z > 1.645 |
| Two-sided test, α = 0.05, z | z* = ±1.96 | test z > 1.96 or test z < −1.96 |
| One-sided t, α = 0.05, df = 24 | t* = 1.711 | test t > 1.711 |
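
For instructors who want to see where the two z cutoffs come from, here is a minimal sketch using only Python's standard library. The variable names are illustrative, not part of the course materials, and students are still meant to read the values from the table rather than compute them. The t cutoff (1.711) is not reproduced here because the standard library has no t-distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

alpha = 0.05

# One-sided test: the whole 5% sits in the upper tail.
z_one_sided = std_normal.inv_cdf(1 - alpha)

# Two-sided test: 5% is split into 2.5% per tail.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

print(round(z_one_sided, 3))  # 1.645
print(round(z_two_sided, 2))  # 1.96
```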

The pre-decided examples (use these exact values; do not invent others):

| Test | Setup (friendly numbers) | Test statistic | Supplied p vs. α = 0.05 | Decision |
| --- | --- | --- | --- | --- |
| One-sample t (Segment 3) | x̄ = 64, μ₀ = 60, s = 10, n = 25; Hₐ: μ > 60 (one-sided) | t = (64−60)/(10/√25) = 4/2 = 2.00 (df = 24) | p ≈ 0.028 < 0.05 | reject H₀ |
| One-proportion z (Segment 5) | p̂ = 0.60, p₀ = 0.50, n = 100; Hₐ: p > 0.50 (one-sided) | z = (0.60−0.50)/√(0.5·0.5/100) = 0.10/0.05 = 2.00 | p ≈ 0.023 < 0.05 | reject H₀ |
| One-sample t (flip) | same setup but p ≈ 0.20 supplied | t small | p ≈ 0.20 > 0.05 | fail to reject H₀ |
| Two-sample (interpret only) (Segment 6) | two groups; we are handed p = 0.02 for "the means differ" at α = 0.05 | (not computed by hand) | 0.02 < 0.05 | reject H₀ → the two means differ |

The decision rule, unchanged from Week 13 and kept all week: p ≤ α → reject H₀; p > α → fail to reject H₀. Small p = surprising data = evidence against H₀.
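
The decision rule can be written as a two-line function. This is only a sketch in Python; the function name `decide` and its return strings are illustrative choices, mirroring the spreadsheet IF formula used in Segment 8.

```python
def decide(p_value, alpha=0.05):
    """Apply the course rule: p <= alpha rejects H0, otherwise fail to reject."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.028))  # reject H0
print(decide(0.20))   # fail to reject H0
```

Note that a p-value exactly equal to α lands on the "reject" side, as the rule states.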


Segment 1 — Hook & the Promise (10 min) · Session 1 (Tue Dec 1) opens

Hook. Put last week's headline back on the board: "A new study app raised the class average."
- "Last week I kept saying 'technology hands us p = 0.03.' All week, somebody's been wondering: where does that number actually come from? This week is the answer. We're going to build the evidence — turn a sample mean or a sample proportion into a single number called a test statistic, and that number is what produces the p-value."
- "Here's the whole week in one breath. Every test we do follows the exact same four beats from Week 13 — State → Compute → Compare → Conclude — only now Step 2, Compute, is a real formula instead of a number that fell from the sky. Two formulas this week. One for means (a t), one for proportions (a z). That's it."

The promise (write it on the board): "By the end of this week you can take a claim about a mean or a proportion, decide which test it needs, plug friendly numbers into one formula, get a test statistic, compare its p-value to α, and write the conclusion as a sentence about the real world."

Why it matters line (memory hook): "Same machine as last week. We're just installing the engine that produces the p-value — a t for means, a z for proportions."


Segment 2 — The One-Sample t-Test: What It Is and Why a t (16 min)

Plain language first — what question this test answers.

A one-sample t-test asks one thing: "Is the mean of my one group far enough from a specific claimed number (μ₀) to be more than chance wobble?" You use it whenever the claim is about an average measured against a fixed value — "the average wait time is more than 5 minutes," "the mean score is different from 70."

The test statistic, in plain words before symbols. We need a single number that says how many standard errors our sample mean sits away from the claimed mean. That number is t:

t = (x̄ − μ₀) / (s / √n)
- x̄ = the sample mean (what we measured). μ₀ = the claimed/null mean (the number in H₀).
- s / √n = the standard error of the mean — how much a sample mean of this size typically wobbles.
- So t is just "(how far our mean is from the claim) measured in standard-error units." A big |t| means our mean is far from μ₀ in wobble-units → surprising → evidence against H₀.

Why t and not z (one honest sentence). When we don't know the true population spread and must estimate it with the sample's s, the extra uncertainty makes the distribution a little wider than the normal — that wider curve is the t-distribution, pinned down by its degrees of freedom, df = n − 1. (That's the only reason it's a t and not a z; we don't derive it.)

Memory hook (put it on a slide):

t = (your mean − the claim) ÷ (the wobble). Mean question → t. df = n − 1.

Land the idea: the formula isn't new magic — it's the Week-13 logic with a number attached. t measures surprise; the p-value turns that surprise into a probability; α is the line.


Segment 3 — One-Sample t-Test, Fully Worked: State → Compute → Compare → Conclude (24 min) · Session 1 core

This is the centerpiece of the week — walk all four steps out loud, slowly, with the friendly numbers.

Scenario. A study-skills coach claims a new method raises students' average daily study time above the campus norm of 60 minutes. We sample n = 25 students who used the method: their mean is x̄ = 64 minutes with sample standard deviation s = 10 minutes. Test at α = 0.05.

Step 1 — STATE the hypotheses (about the parameter μ, never the sample x̄).
- H₀: μ = 60 (the method does nothing; the true mean is still the 60-minute norm).
- Hₐ: μ > 60 (one-sided — the claim is specifically higher).

Step 2 — COMPUTE the test statistic (plug in the friendly numbers; this is the new skill).

Standard error = s/√n = 10/√25 = 10/5 = 2.
t = (x̄ − μ₀)/(s/√n) = (64 − 60)/2 = 4/2 = 2.00. df = n − 1 = 24.
Read it in words: "our sample mean sits 2 standard errors above the claimed mean — that's fairly far out."
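
The same Step 2 arithmetic as a short Python sketch, for anyone who prefers code to a spreadsheet. The function name `one_sample_t` is hypothetical; the inputs are the scenario's friendly numbers.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return (standard error, t statistic, degrees of freedom)."""
    se = s / math.sqrt(n)   # standard error of the mean
    t = (xbar - mu0) / se   # distance from the claim, in standard-error units
    return se, t, n - 1

se, t, df = one_sample_t(xbar=64, mu0=60, s=10, n=25)
print(se, t, df)  # 2.0 2.0 24
```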

Step 3 — COMPARE the p-value to α. Technology hands us p ≈ 0.028 (the chance of a sample mean this high, or higher, if the true mean were really 60). (Cross-check with the embedded critical value: t = 2.00 > t* = 1.711, the one-sided α = 0.05, df = 24 cutoff — same verdict, two views.)

0.028 ≤ 0.05 → reject H₀.
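
For instructors curious about what "technology" is doing behind the supplied p-value, the sketch below approximates the upper-tail area of a t-distribution with the standard library alone. Numerical integration with Simpson's rule is an assumption made here for illustration; it is not the course's method, and real software uses more refined routines. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # half the curve lies above 0

p = t_upper_tail(2.00, 24)
print(round(p, 3))                         # close to the supplied p of about 0.028
print(round(t_upper_tail(1.711, 24), 3))   # close to alpha = 0.05 at the cutoff t*
```

The second line is the cross-check in reverse: the tail area beyond the critical value t* = 1.711 is roughly α itself.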

Step 4 — CONCLUDE in context (the step students skip — make them write the sentence).

"At the 0.05 level, we have statistically significant evidence that the new method raises students' average daily study time above 60 minutes." — Not "we proved the method works"; not just "reject H₀."

Now flip ONE number to show the other verdict. Suppose instead the data had come out milder and technology handed us p ≈ 0.20. Reading: a sample mean like this would happen ~20% of the time even if the method did nothing — not surprising. 0.20 > 0.05 → fail to reject H₀. In context: "We don't have enough evidence to conclude the method raises study time." Not "the method definitely doesn't work" — just "this study didn't show it." (Same "not guilty ≠ innocent" point from Week 13.)

Quick interaction — "Set it up" (think-pair-share, ~4 min): put a fresh stem on a slide — "a clinic claims its average patient wait is under 15 minutes; a sample of 25 gives x̄ = 13, s = 5" — and have pairs write only H₀, Hₐ, and the standard-error denominator (5/√25 = 1), without finishing. The skill being drilled is set-up, not arithmetic.


Segment 4 — Misconceptions on the t-Test + Cure Each (16 min) · Session 1 closes (~75)

Name the misconceptions out loud, then cure each:

  • "Hypotheses are about x̄ — I'll write H₀: x̄ = 60."
    Cure: the hypotheses are always about the population parameter μ, never the sample mean x̄. x̄ is the evidence; μ is the claim. (Same rule as Week 13 — it just bites harder now that x̄ is sitting right there in the formula.)
  • "A bigger t-statistic means a bigger / more important effect."
    Cure: |t| measures surprise relative to wobble, not effect size. A large t can come from a tiny mean difference if n is large (small standard error). "Significant ≠ big," Week 13 — still true.
  • "I forgot the √n — t = (x̄ − μ₀)/s."
    Cure: the denominator is the standard error s/√n, not s. Dividing by √n is what turns a single sample's spread into the sample mean's wobble. Skipping it inflates the standard error and shrinks t every time.
  • "One-sided or two-sided? I'll just guess."
    Cure: read the claim. "Above / more than / less than" → one-sided (>, <). "Different from / changed" (no direction) → two-sided (≠). When genuinely unsure, two-sided is the safe, honest default.
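
Two of these cures are easy to demonstrate with numbers. The sketch below is in Python; the large-sample case (a 0.5-minute difference with n = 1600) is a hypothetical example invented for illustration and is not one of the week's pre-decided examples.

```python
import math

def t_stat(xbar, mu0, s, n):
    """One-sample t statistic with the correct s / sqrt(n) denominator."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Misconception: dropping sqrt(n) divides by s instead of s / sqrt(n).
wrong_t = (64 - 60) / 10
right_t = t_stat(64, 60, 10, 25)
print(wrong_t, right_t)  # 0.4 2.0

# Misconception: a bigger t means a bigger effect.
# Hypothetical: a tiny 0.5-minute difference, but a very large sample.
big_n_t = t_stat(60.5, 60, 10, 1600)
print(big_n_t)  # 2.0
```

The hypothetical 0.5-minute difference produces the same t = 2.00 as the 4-minute difference in Segment 3, which is why the statistic cannot be read as an effect size.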

Memory hook: "Hypotheses about μ, denominator is s/√n, and the claim's wording picks the sides."


Segment 5 — The One-Proportion z-Test: When the Question Is About a Percentage (24 min) · Session 2 (Thu Dec 3) opens

Hook back in: "Session 1 was about means — a t. But tons of real claims aren't about an average at all; they're about a proportion: 'more than half of voters support it,' 'the defect rate is above 2%,' 'the pass rate changed.' Same machine, a sibling formula — a z."

Plain language first — what question this test answers.

A one-proportion z-test asks: "Is my sample proportion p̂ far enough from a claimed proportion p₀ to be more than chance?" Use it whenever the data are a percentage / share / yes-no rate of one group measured against a fixed value.

The test statistic.

z = (p̂ − p₀) / √( p₀(1 − p₀) / n )
- p̂ = the sample proportion (measured). p₀ = the claimed/null proportion (the number in H₀).
- √( p₀(1 − p₀)/n ) = the standard error of a proportion — built from p₀ (the null value), because we compute everything assuming H₀ is true.
- So z is again "(how far our share is from the claim) measured in standard-error units." Same idea as t, just the proportion's standard-error formula. (It's a z, not a t, because for a large enough sample the sample proportion is approximately normal — no separate spread to estimate.)

One fully worked example — all four steps (mirror Segment 3 exactly).

Scenario. A campaign claims more than half of a district's voters support a measure. A poll of n = 100 finds p̂ = 0.60 in favor. Test at α = 0.05.
1. STATE. H₀: p = 0.50 (support is no more than half — exactly half under H₀). Hₐ: p > 0.50 (one-sided — the claim is more than half).
2. COMPUTE. Standard error = √(p₀(1−p₀)/n) = √(0.50 · 0.50 / 100) = √(0.25/100) = √0.0025 = 0.05. z = (0.60 − 0.50)/0.05 = 0.10/0.05 = 2.00. "Our sample share sits 2 standard errors above 0.50."
3. COMPARE. Technology hands us p ≈ 0.023. (Cross-check: z = 2.00 > z* = 1.645, the one-sided α = 0.05 cutoff.) 0.023 ≤ 0.05 → reject H₀.
4. CONCLUDE in context. "At the 0.05 level, we have statistically significant evidence that more than half of the district's voters support the measure."
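
The four steps above can be checked with a short Python sketch using only the standard library. The function name `one_proportion_z` is illustrative; the p-value line uses the normal curve's upper tail, which matches the one-sided Hₐ: p > 0.50.

```python
import math
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """Return (standard error under H0, z statistic)."""
    se = math.sqrt(p0 * (1 - p0) / n)  # built from p0, the null value
    z = (p_hat - p0) / se
    return se, z

se, z = one_proportion_z(p_hat=0.60, p0=0.50, n=100)
p_value = 1 - NormalDist().cdf(z)  # upper-tail area for a one-sided test

print(round(se, 4), round(z, 2), round(p_value, 3))  # 0.05 2.0 0.023
```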

Misconception + cure (proportion-specific):
- ❌ "Use p̂ inside the standard-error square root."
Cure: for the test, the standard error uses p₀ (the null value), because the whole p-value is computed assuming H₀ is true. (That's the one spot a proportion test differs from a proportion confidence interval from Week 12, which uses p̂ — flag it, don't dwell.)
- ❌ "My percentage is 60, so p̂ = 60."
Cure: proportions go in as decimals — p̂ = 0.60, p₀ = 0.50 — or the formula breaks. 60% → 0.60.
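
Both proportion-specific mistakes show up clearly in a quick check. This is a sketch in Python with illustrative variable names; the inputs are the poll numbers from the worked example.

```python
import math

p_hat, p0, n = 0.60, 0.50, 100

# Correct for a test: the standard error is built from p0.
se_null = math.sqrt(p0 * (1 - p0) / n)
# Mistake: using p_hat (that is the confidence-interval version).
se_sample = math.sqrt(p_hat * (1 - p_hat) / n)

print(round((p_hat - p0) / se_null, 2))    # 2.0
print(round((p_hat - p0) / se_sample, 2))  # 2.04

# Mistake: entering percents. 50 * (1 - 50) is negative, so the
# square root has no real value and Python raises ValueError.
percent_failed = False
try:
    math.sqrt(50 * (1 - 50) / 100)
except ValueError:
    percent_failed = True
print(percent_failed)  # True
```

With these friendly numbers the p̂ version only nudges z from 2.00 to about 2.04, so the error is easy to miss; the percent version fails outright.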

Memory hook: "Proportion question → z. Standard error uses p₀, not p̂. Decimals, not percents."


Segment 6 — The Two-Sample Idea: Comparing Two Group Means (Interpret, Don't Derive) (16 min)

Plain language first — a different question.

So far every test compared one group to a fixed number (μ₀ or p₀). But the most common question in real research is a comparison of two groups to each other: "Does Group A's mean differ from Group B's mean?" — treatment vs. placebo, version A vs. version B, before vs. after. That's a two-sample comparison (a two-sample t-test for means).

What's the same, what's different (keep it conceptual — we interpret, we don't compute the formula by hand this week).
- Same machine. It still produces a test statistic and a p-value, and we still compare p to α and conclude in context — identical four beats.
- Different hypotheses. Now H₀ is "the two means are equal" (μ_A = μ_B, i.e., their difference is zero) and Hₐ is "they're different" (μ_A ≠ μ_B), or one-sided ("A is higher than B").
- Different "distance." Instead of (one mean − a number), the statistic is built from (mean of A − mean of B) over a standard error of that difference. Same shape (distance ÷ wobble), now with two groups instead of one. (The exact denominator is a Week-15-and-beyond detail; this week we read the verdict.)

One fully worked interpretation (we are handed the result — this is the skill).

Scenario. A company A/B-tests two checkout-page designs. Design A's mean order value is $54; Design B's is $50. A two-sample test of "the means differ" at α = 0.05 is run and technology hands us p = 0.02.
- State (for them): H₀: the two designs have the same mean order value (μ_A = μ_B). Hₐ: they differ (μ_A ≠ μ_B).
- Compare: 0.02 ≤ 0.05 → reject H₀.
- Conclude in context: "At the 0.05 level, there is statistically significant evidence that the two checkout designs produce different mean order values," and, reading the direction from the data, Design A's mean is higher. Not "we proved A is better"; the test supports a real difference, and the size ($4) is a separate, practical-significance question.

Misconception + cure (the week's most important one — choosing one- vs. two-sample):
- ❌ "I'll use a one-sample t to compare the two groups."
Cure: a one-sample test compares one group to a fixed number; a two-sample test compares two groups to each other. If the question names two sets of data (treatment and control), it's two-sample. If it names one set vs. a claimed value (μ₀ = 60), it's one-sample. Count the groups in the question.

Memory hook: "One group vs. a number → one-sample. Two groups vs. each other → two-sample. Count the groups."


Segment 7 — Choosing the Right Test: the Decision the Whole Week Rests On (16 min)

Frame it: "Computing a t or a z is the easy part — the part students actually miss on the exam is picking the right test in the first place. Here is the entire decision in two questions."

The two-question chooser (put it on a slide — this is the keeper):

Question 1 — Mean or proportion?
- The data are an average / measurement (minutes, dollars, scores) → it's a t-test (means).
- The data are a percentage / share / yes-no rate → it's a z-test (proportions).

Question 2 — One group or two?
- One group compared to a fixed claimed number (μ₀ or p₀) → one-sample.
- Two groups compared to each other → two-sample.

The little table students copy:

| What the question is about | … vs. a fixed number | … vs. another group |
| --- | --- | --- |
| A mean (average, measurement) | one-sample t-test | two-sample comparison (t) |
| A proportion (share, rate, %) | one-proportion z-test | (two-proportion test, beyond this week) |
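
The two-question chooser is just a lookup, which the following Python sketch makes explicit. The function name `choose_test` and the argument values ("mean", "proportion", 1, 2) are hypothetical labels chosen for illustration.

```python
def choose_test(measure, groups):
    """Look up the test from the two chooser questions.

    measure: "mean" or "proportion" (Question 1)
    groups:  1 for one group vs. a fixed number, 2 for two groups (Question 2)
    """
    table = {
        ("mean", 1): "one-sample t-test",
        ("mean", 2): "two-sample comparison (t)",
        ("proportion", 1): "one-proportion z-test",
        ("proportion", 2): "two-proportion test (beyond this week)",
    }
    if (measure, groups) not in table:
        raise ValueError("measure must be 'mean' or 'proportion'; groups must be 1 or 2")
    return table[(measure, groups)]

print(choose_test("mean", 1))        # one-sample t-test
print(choose_test("proportion", 1))  # one-proportion z-test
print(choose_test("mean", 2))        # two-sample comparison (t)
```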

Rapid-fire classification (think-pair-share, ~6 min): put 5 stems on a slide; students name which test (and one- vs. two-sided), solo (30 sec) → neighbor (1 min) → class votes. Suggested:
1. "Is the mean wait time more than 5 minutes?" (one sample of waits) → one-sample t, one-sided.
2. "Do more than 30% of users click the ad?" (one sample, a rate) → one-proportion z, one-sided.
3. "Does the new drug lower mean blood pressure compared to a placebo group?" → two-sample (t), comparing two groups.
4. "Has the mean test score changed from the old average of 70?" → one-sample t, two-sided ("changed").
5. "Is the defect rate different from the historical 2%?" (a rate vs. a number) → one-proportion z, two-sided.
Debrief the two that always split the room: #3 (it's two-sample because two groups are compared, not a group vs. a number) and #2 vs. #1 (rate → z, average → t).


Segment 8 — Technology Workflow + AI-Critique, Callback & Hand-off (12 min) · Session 2 closes (~75)

Technology workflow — compute a test statistic in a spreadsheet (exact steps):
1. One-sample t. In B1 put xbar = 64; B2 mu0 = 60; B3 s = 10; B4 n = 25. In B5 type the standard error: =B3/SQRT(B4) (→ 2). In B6 type the statistic: =(B1-B2)/B5 (→ 2.00).
2. One-proportion z. In D1 phat = 0.60; D2 p0 = 0.50; D3 n = 100. In D4 the standard error: =SQRT(D2*(1-D2)/D3) (→ 0.05). In D5 the statistic: =(D1-D2)/D4 (→ 2.00).
3. The decision (last week's formula, reused). Put the supplied p-value in F1 (e.g. 0.028), α in F2 (0.05), and in F3: =IF(F1<=F2,"Reject H0","Fail to reject H0"). Change F1 to 0.20 and watch the verdict flip. (Google Sheets and Excel are identical.)

AI-critique moment (students verify, not consume — the signature habit):

Paste to an approved chatbot: "I ran a one-sample t-test: x̄ = 64, μ₀ = 60, s = 10, n = 25, Hₐ: μ > 60, and p ≈ 0.028 with α = 0.05. Compute the t-statistic and state the conclusion."
Then audit the answer against this week's rules. Two failure modes to hunt: (a) the chatbot drops the √n and reports the wrong standard error / t (check: s/√n must be 10/5 = 2, so t = 2.00); and (b) it slides into a forbidden Week-13 sentence — "p ≈ 0.028 means a 2.8% chance the null is true" (wrong) — or calls a significant result "large/important." Make students catch and rewrite: "t = (64−60)/(10/√25) = 2.00 on df = 24; since p ≈ 0.028 ≤ 0.05 we reject H₀ and conclude, at the 0.05 level, that the mean study time is above 60 — not that there's a 2.8% chance H₀ is true, and not that the effect is necessarily large." The tool drafts; you judge. This is exactly how the weekly Lecture Tutorial works.

Callback + tease:
- Callback: "Week 13 we built the logic — H₀/Hₐ, the p-value, compare to α, conclude. This week we installed the engines that produce the p-value: a t for one mean, a z for one proportion, and the two-sample idea for comparing two groups. The four beats never changed — State → Compute → Compare → Conclude."
- Tease next week: "Week 15 is the last new tool: linear regression — fitting a line to a scatterplot and running a hypothesis test on its slope ('is there really a relationship, or just noise?'). Same testing machine, one more setting."

Hand-off (the week's graded work):
- Lecture Tutorial 14 (AI tutor, share-link submission) — the one-sample t-test, the one-proportion z-test, the two-sample idea, and choosing the right test.
- Quiz 14, Discussion 14 (find a real "two groups compared" claim — an A/B test, a drug-vs-placebo trial, a before/after study — and reason about the test used and whether the conclusion is justified), and Assignment 14 — all due Sun Dec 6 (Discussion initial post Fri Dec 4).


Instructor FAQ — Common Stumbles

| Student says / does | Quick cure |
| --- | --- |
| Can't decide which test to run. | Two questions: (1) mean (average/measurement) → t; proportion (share/rate/%) → z. (2) one group vs. a number → one-sample; two groups vs. each other → two-sample. Count the groups. |
| Writes hypotheses about x̄ or p̂ (the sample). | Hypotheses are about the population parameter (μ, p), the unknown truth, never the sample statistic you measured. x̄ and p̂ are the evidence. |
| Drops the √n in the t denominator. | The denominator is the standard error s/√n, not s. Dividing by √n converts one sample's spread into the sample mean's wobble. Here: 10/√25 = 2. |
| Uses p̂ in the proportion standard error. | For a proportion test, use p₀ inside √(p₀(1−p₀)/n), because everything is computed assuming H₀ is true. (Confidence intervals use p̂; tests use p₀.) |
| Enters a percent, not a decimal. | 60% goes in as 0.60, 50% as 0.50. With percent values, p₀(1 − p₀) goes negative (50 × (1 − 50) = −2450), so the square root in the standard error is undefined and the formula breaks. |
| Confuses one-sample and two-sample. | One-sample = one group vs. a fixed claimed number; two-sample = two groups compared to each other. If the stem names two datasets, it's two-sample. |
| Picks one- vs. two-sided at random. | "Above / more than / less than" → one-sided (>, <); "different / changed" → two-sided (≠). Unsure → two-sided (the honest default). |
| Reports only "reject H₀" with no context. | Finish the sentence: "At the 0.05 level we have significant evidence that [the real-world claim about the mean/proportion]." The number isn't the conclusion. |
| Thinks a bigger t or z means a bigger effect. | The sizes of t and z (their absolute values) measure surprise relative to wobble, not effect size; a large sample can make a tiny effect's statistic large. Significant ≠ big (Week 13). |
| "p ≈ 0.028 means a 2.8% chance H₀ is true." | No. The p-value is computed assuming H₀ is true; it's "how surprising the data are if H₀ holds." (Carried straight from Week 13's #1 misread.) |

Scope flag

This outline stays within Objective 7, now at the mechanics-and-interpretation level for one-sample means and proportions. The two-sample work is kept strictly conceptual (state the hypotheses, interpret a supplied p-value) per the course's depth decision that "two-sample work is kept to interpretation, not derivations" — we do not compute a two-sample standard error by hand. The embedded critical values (t* = 1.711, z* = 1.645 / ±1.96) and the engineer-to-t = 2.00 / z = 2.00 design are deliberate so the arithmetic never gets in the way of the reasoning. Trim the two-sample interpretation for a leaner 60-minute version; the must-keeps are the full one-sample t pipeline (Segment 3), the one-proportion z pipeline (Segment 5), and the choose-the-right-test chooser (Segment 7).

~ Prof. Rivera's edition · Fall 2026 · built with thecoursemaker.com