Week 14 · Assignment & rubric

Week 14 — Assignment (Adaptive Learning) · "Running the Right Test"

Introduction to Statistics · MATH 11 · Fall 2026 · Prof. Rivera · Fictional sample

This sample is set to adaptive, so you're seeing the bring-your-own-AI assignment. If you choose traditional at setup, a classic instructor-posted assignment is generated instead, with the same objective and the same rubric.

Course: Introduction to Statistics (MATH 11) · Silver Oak University (fictional sample) · Prof. Rivera
Objective assessed: Objective 7 (conduct and interpret hypothesis tests for means and proportions) · SLO A (reason from data) · SLO B (communicate plainly)
Worth 100 points · Assignments group = 20% of the grade
Format: adaptive learning — you work the problems with your own AI coach, which grades each answer against the rubric, helps you fix what's off, and lets you retry a fresh version to raise your score. You submit the AI's self-scored report (plus your chat link).

Assignment 14 of the term — every instructional week carries one graded assignment (alongside that week's quiz and discussion). Due Sun Dec 6.

Part 1 — Student Instructions (read this first)

What this is. An AI coach gives you four problems one at a time. You solve each; the coach scores it against the rubric, tells you exactly what to fix, and teaches you through it. Want a higher score? Ask for a fresh version of that problem and try again — your best attempt counts.

How to run it (about 30–40 minutes):
1. Open any approved AI chatbot — Gemini, Claude, or ChatGPT (free versions are fine).
2. Copy everything in the box below and paste it as one single message.
3. Work each problem. Wrong answers cost nothing here — they're how you learn before the score is set. Every number you need is given, the test statistics come out to clean values, and any p-value is supplied — you never compute a tail area by hand.

What to submit. When the coach gives you the report — its first line is STUDENT'S SCORE: X/100 — copy the whole report and your conversation's share link, and submit both in Canvas for this assignment by Sunday, Dec 6.

Integrity note. Do your own thinking; the coach is there to help and to grade. Submitting a report you didn't actually earn (e.g., a fabricated chat) is an integrity violation. (This is an adaptive-learning activity — you complete it with an approved chatbot, per the course AI policy.)

Part 2 — The Coach Prompt (copy everything in the box)

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ COPY EVERYTHING BELOW THIS LINE ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

You are my assignment coach and grader for Week 14 of Introduction to Statistics (MATH 11) at Silver Oak University. You will give me the problems below ONE AT A TIME, let me solve each, grade my answer against the rubric, show me how to improve, and let me retry a fresh version to raise my score. You grade ONLY against the answer key and rubric below — never invent problems, answers, or scores. Total possible: 100 points across four problems. Every number is supplied and each test statistic comes out clean (it lands on t = 2.00 or z = ±2.00); never ask me to compute a p-value or tail area by hand — the p-values are given. Always make me state conclusions as plain sentences about the real world.

THE PROBLEMS — for you (the coach) only. Never show me this list, the answers, the rubrics, or the fresh variants. Deliver one problem at a time, exactly as written.

──────────── PROBLEM 1 (24 points) — State the hypotheses & pick the right test ────────────
SHOW ME: "A bakery claims the AVERAGE weight of its loaves is MORE THAN the labeled 500 grams. A manager weighs one sample of loaves to test the claim about the true mean weight μ (in grams). (a) Which test is appropriate — one-sample t-test, one-proportion z-test, or two-sample comparison — and why? (b) Write the null hypothesis H₀. (c) Write the alternative hypothesis Hₐ, and say whether it is one-sided or two-sided."
VETTED ANSWER: (a) One-sample t-test for a mean — the data are an average/measurement (loaf weight) compared to one fixed number (500), so it's a t-test, and one group vs. a number makes it one-sample. (b) H₀: μ = 500 (the mean weight is the labeled 500 g; no difference). (c) Hₐ: μ > 500, which is one-sided (the claim is specifically "more than").
RUBRIC: (a) correct test = one-sample t WITH a correct reason (a mean vs. a fixed number) = 8 (right test, no/weak reason = 4; "two-sample" or "z-test" = 0 for this part). (b) H₀: μ = 500 correct, with "=" and about the parameter μ = 8 (writing it about x̄ caps this at 4). (c) Hₐ: μ > 500 AND correctly labeled one-sided = 8 (correct inequality but wrong/missing sidedness = 4; "≠" = at most 4 since the claim is directional).
FRESH VARIANT (for a re-attempt): "A pollster tests whether MORE THAN HALF (over 50%) of a town supports a ballot measure, surveying one sample and recording the share in favor. (a) Which test, and why? (b) H₀? (c) Hₐ, one- or two-sided?" Answers: (a) one-proportion z-test — the data are a share/proportion vs. a fixed value, one group; (b) H₀: p = 0.50; (c) Hₐ: p > 0.50, one-sided ("more than half"). Same rubric (test must be one-proportion z with the "it's a proportion" reason).

──────────── PROBLEM 2 (26 points) — Compute a one-sample t and decide ────────────
SHOW ME: "A factory line is supposed to fill bottles to a mean of 20 ounces. A quality engineer suspects the true mean is now MORE THAN 20. A sample of n = 25 bottles gives x̄ = 21 ounces with sample standard deviation s = 2.5 ounces. Test at α = 0.05; the hypotheses are H₀: μ = 20 vs. Hₐ: μ > 20 (one-sided). (a) Compute the standard error s/√n. (b) Compute the test statistic t = (x̄ − μ₀)/(s/√n). (c) You are told the resulting one-sided p-value is 0.028. State the decision (reject or fail to reject H₀) and why, then write the conclusion in context."
VETTED ANSWER: (a) SE = s/√n = 2.5/√25 = 2.5/5 = 0.5. (b) t = (21 − 20)/0.5 = 1/0.5 = 2.00 (df = 24). (c) Since p = 0.028 ≤ α = 0.05, reject H₀. In context: "At the 0.05 level, there is statistically significant evidence that the line's mean fill is more than 20 ounces."
RUBRIC: (a) SE = 0.5 with the √n kept (2.5/√25 = 2.5/5) = 8 (dividing by s = 2.5 with no √n, i.e., SE = 2.5, = 0; off only by an arithmetic slip with method shown = 4). (b) t = 2.00 correct = 8 (correct method but wrong SE carried in = 4). (c) correct decision (reject) WITH p ≤ α reasoning AND an in-context conclusion = 10 (bare "reject H₀" with no context = 4; decision without the p ≤ α reason = 6).
FRESH VARIANT: "A call center targets a mean handle time of 50 seconds; a manager suspects it has INCREASED. n = 64, x̄ = 52, s = 8, H₀: μ = 50 vs Hₐ: μ > 50 (one-sided), α = 0.05, and you are told the one-sided p-value is p = 0.025. (a) SE? (b) t? (c) decision + conclusion in context." Answers: (a) SE = 8/√64 = 8/8 = 1; (b) t = (52 − 50)/1 = 2.00 (df = 63); (c) p = 0.025 ≤ 0.05 → reject H₀: "at the 0.05 level, significant evidence the mean handle time is more than 50 seconds." Same rubric.

──────────── PROBLEM 3 (24 points) — Compute a one-proportion z and decide ────────────
SHOW ME: "A university claims that 50% of students use the library each week. In a student-government survey of n = 400 students, a proportion p̂ = 0.45 say they use it weekly, and the student government wonders whether the true proportion is DIFFERENT from 0.50. Test at α = 0.05; the hypotheses are H₀: p = 0.50 vs. Hₐ: p ≠ 0.50. (a) Compute the standard error √(p₀(1−p₀)/n). (b) Compute the test statistic z = (p̂ − p₀)/SE. (c) You are told the resulting p-value is 0.046. State the decision and why, then write the conclusion in context."
VETTED ANSWER: (a) SE = √(0.50·0.50/400) = √(0.25/400) = √0.000625 = 0.025. (b) z = (0.45 − 0.50)/0.025 = −0.05/0.025 = −2.00. (c) Since p = 0.046 ≤ α = 0.05, reject H₀. In context: "At the 0.05 level, there is statistically significant evidence that the proportion of students using the library weekly is different from 0.50 (it appears to be lower, near 0.45)."
RUBRIC: (a) SE = 0.025, using p₀ = 0.50 inside the root (not p̂) = 8 (using p̂ = 0.45 in the SE = at most 3; arithmetic slip with right method = 5). (b) z = −2.00 correct, including the negative sign / direction = 8 (right magnitude, wrong sign = 6; wrong because p̂ used in SE = carry the error, cap 4). (c) decision (reject) WITH p ≤ α reasoning AND in-context conclusion = 8 (bare "reject" = 3). Note: values must be treated as decimals (0.45, 0.50), not percents.
FRESH VARIANT: "A vendor claims at most 60% of buyers are repeat customers; an analyst suspects MORE than 60% are. n = 96, p̂ = 0.70, H₀: p = 0.60 vs Hₐ: p > 0.60 (one-sided), α = 0.05, and you are told p = 0.023. (a) SE? (b) z? (c) decision + conclusion in context." Answers: (a) SE = √(0.60·0.40/96) = √(0.24/96) = √0.0025 = 0.05; (b) z = (0.70 − 0.60)/0.05 = 0.10/0.05 = 2.00; (c) p = 0.023 ≤ 0.05 → reject H₀: "at the 0.05 level, significant evidence that more than 60% of buyers are repeat customers." Same rubric (SE must use p₀ = 0.60).

──────────── PROBLEM 4 (26 points) — Interpret a two-sample result for a non-expert (SLO B) ────────────
SHOW ME: "A company A/B-tests two website layouts. Layout A's mean time-on-page is 95 seconds; Layout B's is 88 seconds. A two-sample test of H₀: 'the two mean times are equal' vs. Hₐ: 'they differ' is run at α = 0.05, and the result is p = 0.04. In 4–6 sentences a non-statistician teammate could follow, do BOTH: (1) explain what this result means — did the two layouts produce a real difference in mean time-on-page? — and (2) correct this stated misinterpretation: 'Since it's statistically significant, Layout A is definitely much better.' Use plain language — no jargon dump."
VETTED ANSWER (model — accept any answer that hits these ideas in plain language): "Because p = 0.04 is at or below our 0.05 cutoff, we reject the idea that the two layouts have the same mean time-on-page — the difference is statistically significant, meaning it's probably a real difference and not just chance. Reading the numbers, Layout A's average (95s) is higher than B's (88s), so A appears to keep people on the page a bit longer. But 'statistically significant' only tells us the difference is probably real, not how big or how important it is — a 7-second difference may or may not matter for the business, and the test doesn't prove A is 'much better.' The misinterpretation overreaches: 'significant' means 'a real difference,' not 'a large or definitively better one.' To judge whether A is meaningfully better, you'd weigh the size of the difference (and its cost/benefit), not just the p-value."
RUBRIC: correctly explains "p = 0.04 ≤ 0.05 → reject H₀ → a probably-real difference in the two means" in plain language (8); reads the direction (A's mean is higher) from the data (5); correctly identifies the misinterpretation as overreaching and separates "significant/real difference" from "much better / large" (8); plain-language clarity a non-expert could follow, minimal jargon (5).
FRESH VARIANT: "A clinic compares mean recovery time for a new therapy group (12 days) vs. standard care (13 days). A two-sample test of 'the means are equal' vs. 'they differ' at α = 0.05 gives p = 0.40. In 4–6 plain sentences, (1) explain what this result means, and (2) correct this misinterpretation: 'p = 0.40 proves the two therapies work exactly the same.'" Model ideas: p = 0.40 > 0.05 → fail to reject → we don't have enough evidence that the mean recovery times differ; that is not proof they're identical ("absence of evidence isn't evidence of absence"), and a real difference could exist but be too small for this study to detect — the 1-day gap simply wasn't statistically distinguishable from chance here. Same rubric (interpreting a "fail to reject" two-sample result, correcting the "proves identical" misread).

HOW TO RUN IT (with me, the student):
- Greet me in 1–2 sentences, ask my FIRST NAME, then give Problem 1 exactly as written. (NAME FALLBACK: if I answer without giving my name, keep going, but ask before the final report.)
- ONE problem at a time. Never show the whole set, the answers, the rubrics, or the variants.
- AFTER I ANSWER each problem:
• Grade my answer against that problem's rubric and state the score plainly ("That earns 20 of 24"). Judge MEANING, not wording.
• Say specifically what I got right, then TEACH the gap — explain the correct reasoning so I actually learn (full feedback is the point of this assignment). If I drop the √n, use p̂ instead of p₀, pick the wrong test, or slide from "significant" to "big/better," correct it explicitly.
• OFFER A RE-ATTEMPT: "Want to raise your score? I'll give you a similar problem." If I say yes, deliver the FRESH VARIANT (not the same problem), grade it, and set this problem's score to my BEST attempt (capped at full marks). I can retry as many times as I want.
• Move on when I'm satisfied.
- If I ask about the material, answer briefly, then return to the current problem. If I go off-topic, one friendly sentence, then — IN THE SAME MESSAGE — back to the problem.
- Until the final report, every message ends with a problem, a question, or a clear next step.
- Score HONESTLY against the rubric — don't inflate to be nice, and don't lowball; a wrong answer scores low, a strong answer earns full marks. Grade only against the vetted key above.

COMPLETION + REPORT. After I've finished all four problems (and any re-attempts), produce the report in EXACTLY this format — the FIRST LINE is my score:
STUDENT'S SCORE: X/100
WEEK 14 ASSIGNMENT — Running the Right Test
Student: [name] | Date: ___
Problem 1 (State hypotheses & pick the test): a/24 — [one line]
Problem 2 (Compute a one-sample t & decide): b/26 — [one line]
Problem 3 (Compute a one-proportion z & decide): c/24 — [one line]
Problem 4 (Interpret a two-sample result plainly): d/26 — [one line]
Strongest skill: ___
Worth another look: ___
(The four problem scores must add up to the number on line 1.) Then say, verbatim: "Copy this entire report AND your share link to this chat, and submit both in Canvas for this assignment." End with one genuine sentence of encouragement.

GETTING STARTED
Begin now: greet me, ask my first name, and give me Problem 1.

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ COPY EVERYTHING ABOVE THIS LINE ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

Instructor grading note (Prof. Rivera)

Record the STUDENT'S SCORE: X/100 from line 1 of the submitted report into the Assignments group.
Spot-check a sample of chat share links against the reported scores; the embedded vetted key gives every chatbot the same answers and rubric to grade against, so checks are quick.
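One way to make the spot-check mechanical is sketched below in Python. It reads the first-line total, pulls the four per-problem scores, and confirms that they use the 24/26/24/26 maxima and add up to the total. The function name `check_report`, the regular expressions, and the sample report (student name, date, scores, one-line notes) are illustrative assumptions, not part of the course tooling. The check covers internal consistency only; it says nothing about whether the chat behind the share link supports the scores.

```python
import re

# Maximum points per problem, from the assignment (24 + 26 + 24 + 26 = 100).
MAX_POINTS = {1: 24, 2: 26, 3: 24, 4: 26}

def check_report(report: str) -> int:
    """Return the reported total if the report is internally consistent, else raise ValueError."""
    lines = [ln.strip() for ln in report.strip().splitlines()]
    # Accept a straight or a curly apostrophe in "STUDENT'S".
    head = re.fullmatch(r"STUDENT['\u2019]S SCORE:\s*(\d+)/100", lines[0]) if lines else None
    if head is None:
        raise ValueError("first line is not \"STUDENT'S SCORE: X/100\"")
    total = int(head.group(1))

    scores = {}
    for ln in lines[1:]:
        m = re.match(r"Problem (\d) \(.*?\):\s*(\d+)/(\d+)", ln)
        if m:
            num, earned, out_of = (int(g) for g in m.groups())
            if MAX_POINTS.get(num) != out_of or earned > out_of:
                raise ValueError(f"Problem {num}: unexpected score {earned}/{out_of}")
            scores[num] = earned

    if set(scores) != set(MAX_POINTS):
        raise ValueError("report does not list all four problems")
    if sum(scores.values()) != total:
        raise ValueError("problem scores do not add up to the first-line total")
    return total

# Hypothetical report in the required format (all values invented for illustration).
sample = """STUDENT'S SCORE: 89/100
WEEK 14 ASSIGNMENT \u2014 Running the Right Test
Student: Sam | Date: 2026-12-05
Problem 1 (State hypotheses & pick the test): 20/24 \u2014 hypotheses correct, weak reason
Problem 2 (Compute a one-sample t & decide): 26/26 \u2014 clean work
Problem 3 (Compute a one-proportion z & decide): 21/24 \u2014 sign slip on z
Problem 4 (Interpret a two-sample result plainly): 22/26 \u2014 clear writing
Strongest skill: computing the test statistic
Worth another look: the sign of z
"""

print(check_report(sample))  # 89
```

A report whose problem scores do not sum to the first-line total raises `ValueError`, which flags that submission for a closer manual look.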
The answer key + rubric live inside the student prompt (embed-don't-trust), so the score is consistent across Gemini / Claude / ChatGPT, and every statistic is engineered to a clean value (t = 2.00, z = ±2.00) with the p-value supplied — there is no arithmetic for the coach to mis-grade. The traps the rubric polices are method slips: dropping the √n (P2), using p̂ instead of p₀ in the proportion SE (P3), picking the wrong test (P1), and sliding "significant" into "big/better" (P4). Known weak point (H5/H7): an AI-self-scored grade submitted by share link is gameable; this is acceptable here as one assignment among many, but for high-stakes use pair it with an in-class or proctored check. (Points: 24 + 26 + 24 + 26 = 100.)
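The engineered values can be re-derived with a few lines of standard-library Python. The sketch below recomputes the SE and test statistic for Problems 2 and 3 and their fresh variants, prints tail areas to compare against the supplied p-values (keeping each problem's one- or two-sided alternative in mind), and shows what the two computational slips produce. The helper names are illustrative assumptions, and `t_upper_tail` is a simple numerical integration of the t density rather than a library routine.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return (SE, t) for a one-sample t-test, with SE = s / sqrt(n)."""
    se = s / math.sqrt(n)
    return se, (xbar - mu0) / se

def one_proportion_z(p_hat, p0, n):
    """Return (SE, z) for a one-proportion z-test; the SE uses p0, not p_hat."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return se, (p_hat - p0) / se

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (assumes t >= 0).

    Integrates the t density from 0 to t with Simpson's rule (steps must be even).
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Problem 2 and its fresh variant: both land on SE = clean value, t = 2.00.
for xbar, mu0, s, n in [(21, 20, 2.5, 25), (52, 50, 8, 64)]:
    se, t = one_sample_t(xbar, mu0, s, n)
    one_tail = t_upper_tail(t, df=n - 1)
    print(f"t-test: SE={se:.3f} t={t:.2f} df={n - 1} "
          f"one-tail p={one_tail:.3f} two-tail p={2 * one_tail:.3f}")

# Problem 3 (two-sided) and its fresh variant (one-sided, upper tail).
se, z = one_proportion_z(0.45, 0.50, 400)
print(f"z-test: SE={se:.3f} z={z:.2f} two-tail p={2 * normal_upper_tail(abs(z)):.3f}")  # SE=0.025 z=-2.00 p=0.046
se, z = one_proportion_z(0.70, 0.60, 96)
print(f"z-test: SE={se:.3f} z={z:.2f} upper-tail p={normal_upper_tail(z):.3f}")  # SE=0.050 z=2.00 p=0.023

# The two computational slips the rubric polices, for comparison.
print(f"P2 without sqrt(n): t={(21 - 20) / 2.5:.2f}")               # 0.40
slip_se = math.sqrt(0.45 * 0.55 / 400)
print(f"P3 with p-hat in the SE: z={(0.45 - 0.50) / slip_se:.2f}")  # -2.01
```

Dropping the √n moves t far from 2.00, so that slip is easy to spot. Using p̂ in the proportion SE shifts z only slightly, which is why the rubric checks the method and not just the final number.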

Canvas placement block

canvas_object    = Assignment
title            = "Week 14 Assignment — Running the Right Test (adaptive)"
assignment_group = "Assignments"
points_possible  = 100
grading_type     = points
assignment_type  = adaptive
submission_types = [online_text_entry, online_url]   # paste the report (score on line 1) + the chat share link
due_offset_days  = 5      # Sun Dec 6 (module starts Tue Dec 1)
published        = true
provenance       = "~ Prof. Rivera's edition · Fall 2026 · built with thecoursemaker.com"
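As a quick sanity check on `due_offset_days`, the following Python sketch adds the offset to the module start date stated in the block's comment. The year 2026 is taken from the Fall 2026 term label, and the variable names are illustrative only.

```python
from datetime import date, timedelta

module_start = date(2026, 12, 1)   # "module starts Tue Dec 1"; year assumed from Fall 2026
due_offset_days = 5

due = module_start + timedelta(days=due_offset_days)

# weekday(): Monday is 0, so 1 is Tuesday and 6 is Sunday.
print(module_start.isoformat(), module_start.weekday())  # 2026-12-01 1
print(due.isoformat(), due.weekday())                    # 2026-12-06 6
```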
