Week 2 — Lecture Outline · Summarizing Data

Course: Introduction to Statistics (MATH 11) · Fall 2026 · Silver Oak University (fictional sample) · Prof. Rivera
Objectives covered: Objective 2 — Summarize and display univariate data — its shape, center, and spread. (This week opens Objective 2 with the display side: tables and pictures of one variable.)
SLOs touched: A (reason quantitatively from data) · B (communicate results to a non-technical audience)
Meeting pattern: 2 sessions × 75 min = 150 min. Segment minutes below total ~150; scale to your own pattern.

Week at a Glance


The week's big question	"How do you turn a pile of raw numbers into a single picture that tells the truth about them — and how can that same picture lie?"
By the end of the week, students can…	(1) build a frequency and relative-frequency table from raw data, choosing sensible classes; (2) build and read a histogram, and say what each axis and bar means; (3) name a distribution's shape — symmetric, skewed left, skewed right, uniform, or bimodal — and read center and spread off the picture; (4) spot an outlier and explain how it distorts a summary (especially the mean) and a graph.
Key vocabulary	raw data, distribution, frequency, relative frequency, cumulative frequency, frequency table, class / bin / interval, class width, histogram, bar chart (vs. histogram), shape, symmetric, skewed left (negative), skewed right (positive), uniform, bimodal, peak / mode of a graph, tail, center, spread, outlier, resistant vs. sensitive measure
Materials	slides (Deck 2), the week's readings + video links, a spreadsheet (Google Sheets or Excel) for the frequency-table and histogram demos, one approved chatbot (Gemini / Claude / ChatGPT) for the AI-critique moment, the tutorial, and the graded Assignment 2
Timing note	8 segments, ~150 min total. Session 1 = Segments 1–4 (~75). Session 2 = Segments 5–8 (~75).

Segment 1 — Hook & the Promise (8 min) · Session 1 opens

Hook. Put a wall of raw numbers on the screen — the unsorted commute times of 30 students (in minutes): 5, 8, 12, 15, 7, 22, 18, 9, 14, 33, 11, 6, 19, 24, 16, 3, 27, 13, 21, 8, 17, 10, 29, 14, 20, 4, 12, 35, 9, 18. Ask: "In five seconds, tell me the typical commute and whether anyone's is unusual." Wait. Nobody can.
- "Your eyes can't do anything with thirty loose numbers. Last week we learned how to get trustworthy data. This week we do the very first thing anyone does with it: turn the pile into a picture so a human can actually see it."

The promise (write it on the board): "By the end of this week you can take any list of numbers and, in two moves — a table, then a histogram — see its shape, its center, its spread, and the outliers that would otherwise fool you."

Why it matters line (memory hook): "You don't understand a variable until you've seen its shape. A summary you can't picture is a summary you can't trust."

Segment 2 — From Pile to Table: Frequency & Relative Frequency (22 min)

Plain language first.
- The whole list of values for one variable is its distribution — what values occur, and how often. Everything this week is a way to see a distribution.
- A frequency is just a count: how many times a value (or a range of values) shows up.
- A frequency table groups the data into classes (also called bins or intervals) and counts how many fall in each. For numbers with lots of distinct values, we group into equal-width ranges; the class width is how wide each range is.
- A relative frequency turns the count into a proportion of the whole: relative frequency = frequency ÷ total. It answers "what fraction (or %) landed here?" and lets you compare groups of different sizes. The relative frequencies of all classes add to 1 (100%).
- A cumulative frequency is a running total — "how many are at or below this class" — useful for "how many commute under 20 minutes?" questions.

Memory hook (put it on a slide):

Frequency = how many. Relative frequency = what share. Counts tell you size; shares let you compare.

One fully worked example (build the table out loud, every count).

Take the 30 commute times above. Choose a class width of 10, starting at 0. That gives four classes: 0–9, 10–19, 20–29, 30–39. Now tally (sorting first helps): sorted, the data are 3, 4, 5, 6, 7, 8, 8, 9, 9 | 10, 11, 12, 12, 13, 14, 14, 15, 16, 17, 18, 18, 19 | 20, 21, 22, 24, 27, 29 | 33, 35.

Class (min)	Frequency	Relative frequency	Cumulative frequency
0–9	9	9/30 = 0.300 = 30%	9
10–19	13	13/30 = 0.433 = 43.3%	22
20–29	6	6/30 = 0.200 = 20%	28
30–39	2	2/30 = 0.067 = 6.7%	30
Total	30	1.00 = 100%	n/a

Check the table is honest: frequencies sum to 30 (the whole sample); relative frequencies sum to 1.00. If either doesn't, you miscounted.

Read it: the most common commute is 10–19 minutes (the biggest count, 13). Almost three-quarters of students (30% + 43.3% = 73.3%) commute under 20 minutes. Only 2 students commute 30+ minutes — already a hint of a couple of unusually long commutes.

Land the key idea: the table is the bridge. We went from thirty numbers nobody could read to four rows anyone can. The histogram (next) is just this table, drawn.
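
Optional code check. For instructors who want to confirm the table without a spreadsheet, here is a minimal Python sketch that rebuilds it from the raw list. The function name `frequency_table` and its parameters are illustrative, not part of the course materials, and the sketch assumes whole-number data at or above the starting value.

```python
from itertools import accumulate

# Commute times (minutes) for the 30 students in the worked example.
commutes = [5, 8, 12, 15, 7, 22, 18, 9, 14, 33, 11, 6, 19, 24, 16,
            3, 27, 13, 21, 8, 17, 10, 29, 14, 20, 4, 12, 35, 9, 18]

def frequency_table(values, width=10, start=0):
    """Return rows of (class label, frequency, relative frequency, cumulative)."""
    # Assumption: values are whole numbers >= start.
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1    # e.g. 14 lands in class index 1 (10-19)
    total = len(values)
    running = list(accumulate(counts))       # running totals for the cumulative column
    rows = []
    for i, count in enumerate(counts):
        low = start + i * width
        label = f"{low}-{low + width - 1}"   # whole-number classes: 0-9, 10-19, ...
        rows.append((label, count, count / total, running[i]))
    return rows

table = frequency_table(commutes)
for label, freq, rel, cum in table:
    print(f"{label:>6} {freq:>3} {rel:6.3f} {cum:>3}")
```

Run as written, the four rows should reproduce the frequencies 9, 13, 6, 2 and the running totals 9, 22, 28, 30 from the worked example.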

Segment 3 — Building & Reading a Histogram (25 min)

Plain language first. A histogram is the picture of a frequency table for quantitative data.
- The horizontal axis is the variable, broken into the same classes as the table (the bins sit side by side, touching, because the number line is continuous).
- The vertical axis is the frequency (or relative frequency) — how tall each bar is.
- Each bar's height = the count in that class. Taller bar = more data there.

The one distinction students must nail — histogram vs. bar chart:
- A bar chart is for categorical data (major, blood type). Its bars have gaps between them and the order can be rearranged — they're separate labels.
- A histogram is for quantitative data. Its bars touch (no gaps) and the order is fixed by the number line — you cannot reorder 10–19 and 20–29.

Memory hook: Bars touch → histogram (numbers). Bars apart → bar chart (categories). "Gaps mean groups; touching means a number line."

One fully worked example (sketch it from the Segment-2 table):

Draw four touching bars over 0–9, 10–19, 20–29, 30–39 with heights 9, 13, 6, 2. Reading it without any arithmetic:
- Where's the peak? Over 10–19 — the tallest bar (the modal class). That's the most common commute.
- Which way does it lean? The bars get short to the right (a long thin tail of 20–29 and 30–39). The bulk is on the left, the tail trails right.
- Anything off on its own? The 30–39 bar is tiny and sits at the far end of the tail, a flag to check for outliers (Segment 6).
The picture says in one second what the raw list never could: most commutes are short, a few are long, the typical one is in the teens.
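
Optional code sketch. If a whiteboard sketch is not practical, the same four bars can be printed as text. This is a rough stand-in for a real histogram, turned on its side so each row is one class; `text_histogram` is a made-up helper name, and the labels and heights are copied from the Segment 2 table.

```python
# Class labels and frequencies from the Segment 2 table.
classes = ["0-9", "10-19", "20-29", "30-39"]
heights = [9, 13, 6, 2]

def text_histogram(labels, counts, mark="#"):
    """Return one line per class; the bar length equals the class frequency."""
    return [f"{label:>6} | {mark * count} {count}"
            for label, count in zip(labels, counts)]

for line in text_histogram(classes, heights):
    print(line)

# The modal class is the class with the longest bar.
modal_class = classes[heights.index(max(heights))]
print("Peak (modal class):", modal_class)
```

The longest row is the 10–19 class, and the rows shorten toward 30–39, which is the right-hand tail described above.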

Two build-decisions to name (and the trade-off):
- Too few bins (say, two): you hide the shape — everything looks like one lump.
- Too many bins (say, one per minute): you get a spiky comb and the shape disappears into noise.
- Rule of thumb: aim for roughly 5–15 classes (up to about 20 for large samples) and pick a width that gives a readable shape. With only 30 values, our four-class commute table sits just under that range and still shows the shape.
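
Optional code sketch. To let students see the bin trade-off on the commute data, the sketch below recounts the same 30 values at several class widths. `bin_counts` is an illustrative helper, the widths 20, 10, 5, and 1 are arbitrary demonstration choices, and the code assumes whole-number data with classes starting at 0.

```python
commutes = [5, 8, 12, 15, 7, 22, 18, 9, 14, 33, 11, 6, 19, 24, 16,
            3, 27, 13, 21, 8, 17, 10, 29, 14, 20, 4, 12, 35, 9, 18]

def bin_counts(values, width, start=0):
    """Count whole-number values per class of the given width, starting at `start`."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return counts

for width in (20, 10, 5, 1):
    counts = bin_counts(commutes, width)
    print(f"width {width:>2}: {len(counts):>2} classes -> {counts}")
```

At width 20 the data collapse into two lumps (22 and 8). At width 1 there are 36 classes and no bar is taller than 2, which is the spiky comb. Widths 10 and 5 both show the peak and the right-hand tail.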

Segment 4 — Misconceptions + Quick Interaction (20 min) · Session 1 closes (~75)

Name the misconceptions out loud, then cure each:

❌ "A histogram and a bar chart are the same thing."
✅ Cure: the axis tells you. A histogram's horizontal axis is a number line (quantitative), so bars touch; a bar chart's axis is categories (nominal/ordinal), so bars have gaps. Touching vs. apart isn't decoration — it's the data type.
❌ "The bars should always go down from left to right." (or "be even").
✅ Cure: a histogram's bar order is fixed by the number line, and the heights can do anything — peak in the middle, climb, fall, or have two humps. The heights tell the story; you never rearrange bars to make them tidy.
❌ "Relative frequency is a different shape than frequency."
✅ Cure: switching from counts to proportions just relabels the vertical axis (divide every height by the total). The bars keep the same relative heights, so the shape is identical — relative frequency only lets you compare groups of different sizes.
❌ "Skewed left means the graph points / leans to the left."
✅ Cure: skew is named for the tail, not the hump. Skewed left = the long thin tail stretches left (toward small values), even though the tall bulk sits on the right. (We'll drill this in Segment 5 — it's the week's #1 trap.)
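
Optional code check for the relative-frequency cure above. This small sketch compares every bar with the tallest bar, once using counts and once using shares. It uses exact fractions so the comparison is not blurred by rounding; the variable names are illustrative.

```python
from fractions import Fraction

heights = [9, 13, 6, 2]          # frequencies from the commute table
total = sum(heights)
relative = [Fraction(h, total) for h in heights]   # relative frequencies

# Express each bar as a fraction of the tallest bar, before and after
# dividing by the total. If the two lists match, the shape is unchanged.
shape_from_counts = [Fraction(h, max(heights)) for h in heights]
shape_from_shares = [r / max(relative) for r in relative]

print([round(float(r), 3) for r in relative])
print("same shape:", shape_from_counts == shape_from_shares)
```

Dividing every height by the same total rescales the vertical axis only, so the two lists are identical.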

Interaction — Think-Pair-Share (histogram or bar chart?, ~10 min):
Put 6 variables on a slide; for each, students decide histogram or bar chart solo (30 sec), compare with a neighbor (1 min), then class votes (fist = bar chart, open hand = histogram). Suggested items: students' exam scores · students' declared majors · daily high temperatures for a month · favorite streaming service · ages of people in line · eye color.
(Answers: histogram · bar chart · histogram · bar chart · histogram · bar chart.)
Debrief the principle out loud: quantitative → histogram (bars touch); categorical → bar chart (bars apart).

Segment 5 — Shapes of Distributions (25 min) · Session 2 opens

Hook back in: "Last session we built the picture. Today we learn to read its shape in one word — because shape is the first thing that decides which summary number you're even allowed to use."

Plain language first — the five shapes to recognize:
- Symmetric: the left and right halves are rough mirror images, with a single hump in the middle. (The famous bell curve is the best-known symmetric shape.)
- Skewed right (positive skew): the tall side is on the left and a long thin tail stretches right (toward big values). Think incomes or house prices: most are modest, a few are huge.
- Skewed left (negative skew): the mirror image, with the tall side on the right and a long thin tail stretching left (toward small values). Think scores on an easy exam: most high, a few low.
- Uniform: all bars are about the same height, with no real peak. Think rolling a fair die many times.
- Bimodal: two separate humps. Often a sign that two groups are mixed together (e.g., the heights of everyone in a room, men and women combined).

Memory hook (say it twice): "Skew is named for the tail, not the lump. The tail points the way you're skewed." Right tail → skewed right; left tail → skewed left.

One fully worked example (read shape, then read center & spread off it):

Show the commute histogram (heights 9, 13, 6, 2 over 0–9 … 30–39).
- Shape: single peak at 10–19, bars shrinking to the right with a thin tail at 30–39 → skewed right.
- Center (off the picture): the bulk sits in the teens, so a typical commute is about 10–19 minutes — you can see the center without computing it. (Next week we'll put exact numbers on "center.")
- Spread (off the picture): values run from about 3 to 35 minutes, so the spread (range) is roughly 32 minutes — most are bunched low, with a few stretching the range out.
One picture gave us shape, a typical value, and how spread out things are — the three things we always want.

Why shape matters (tease Week 3): in a skewed distribution the long tail drags the mean toward it, so the mean and median split apart, and which one is "the typical value" depends on the shape. Quick rule to plant now: in a right-skewed picture the mean is usually above the median; in a left-skewed picture it is usually below; in a symmetric picture the two are about equal. This is a tendency, not a guarantee. We check it with exact numbers next week; today, just see it in the shape.
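
Optional preview check. For instructors who keep the Week 3 tease, this sketch computes the mean and median of the commute data with Python's standard `statistics` module, so the class can see the gap the right tail creates. It is a single illustration of the rule of thumb, not a proof of it.

```python
from statistics import mean, median

commutes = [5, 8, 12, 15, 7, 22, 18, 9, 14, 33, 11, 6, 19, 24, 16,
            3, 27, 13, 21, 8, 17, 10, 29, 14, 20, 4, 12, 35, 9, 18]

commute_mean = mean(commutes)      # sum of all 30 values divided by 30
commute_median = median(commutes)  # average of the 15th and 16th sorted values

print(f"mean = {commute_mean:.1f} min, median = {commute_median} min")
print("mean above median:", commute_mean > commute_median)
```

For this list the mean (15.3) sits above the median (14), which matches the right-skewed shape read off the histogram.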

Segment 6 — Outliers, and How They Distort the Picture (20 min)

Plain language: an outlier is a value that sits far away from the rest of the data — unusually large or small. Outliers matter because they can quietly distort both your summary numbers and your graph.

Two distortions to name:
- It drags the mean. The mean adds every value and divides — so one extreme value pulls it hard. The median (the middle value) barely budges, because it only cares about position, not size. This is the difference between a sensitive measure (mean) and a resistant measure (median).
- It wrecks the graph. A single far-out value forces the horizontal axis to stretch to reach it, squashing all the real data into a few bins on the left — the histogram loses its shape and looks misleadingly empty.

Memory hook: "The mean chases the outlier; the median holds its ground." (And: one wild value can flatten a whole histogram.)
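
Optional code sketch of the "flattened histogram" effect. It uses the nine experience values from the worked example that follows and splits the range from smallest to largest into five equal-width bins, once with the 120 and once without. The helper `equal_width_counts` and the choice of five bins are illustrative assumptions; real charting tools choose their own bin edges.

```python
def equal_width_counts(values, n_bins):
    """Split the range min..max into n_bins equal-width classes and count values."""
    # Assumption: the data contain at least two distinct values (width > 0).
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - low) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return counts

experience = [31, 33, 34, 35, 36, 37, 38, 40, 120]

with_outlier = equal_width_counts(experience, 5)
without_outlier = equal_width_counts(experience[:-1], 5)  # drop the trailing 120

print("with 120:   ", with_outlier)
print("without 120:", without_outlier)
```

With the 120 included, all eight plausible values land in the first bin and the middle bins are empty. Without it, the same five bins show a small, roughly symmetric hump.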

One fully worked example (numbers pre-computed — watch the mean move):

Nine employees' years of experience: 31, 33, 34, 35, 36, 37, 38, 40, 120. (That 120 is a data-entry slip: nobody has 120 years of experience.)
- With the 120: mean = 44.9 years, median = 36 years.
- Without the 120: mean = 35.5 years, median = 35.5 years.
- Read the damage: dropping one bad value moves the mean by ~9.4 years (44.9 → 35.5) but the median by only half a year (36 → 35.5). The mean chased the outlier; the median held. Reporting "average experience: 45 years" would badly misdescribe a team where every other value is 40 years or less.
- A quick flag for "far away": a common rule calls a value an outlier if it's more than 1.5 × IQR beyond the quartiles. Here the upper cutoff works out to about 47, so 120 is flagged as an outlier. (We build the IQR formally next week; today the point is just that 120 is obviously, and detectably, far out.)
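
Optional code check of the pre-computed numbers above, using Python's standard `statistics` module. One assumption to flag: quartile conventions differ between textbooks and tools. `statistics.quantiles` defaults to its "exclusive" method, which reproduces the cutoff of about 47 quoted here; other conventions give a somewhat different cutoff.

```python
from statistics import mean, median, quantiles

experience = [31, 33, 34, 35, 36, 37, 38, 40, 120]
cleaned = [x for x in experience if x != 120]   # same data without the slip

print(f"with 120:    mean = {mean(experience):.1f}, median = {median(experience)}")
print(f"without 120: mean = {mean(cleaned):.1f}, median = {median(cleaned)}")

# 1.5 x IQR fences, using the default ("exclusive") quartile method.
q1, _, q3 = quantiles(experience, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in experience if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, upper fence = {upper_fence}")
print("flagged as outliers:", flagged)
```

Passing `method="inclusive"` to `quantiles` lowers the upper fence to 44 for this list. Either way, 120 is the only value flagged.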

Misconception + cure:
- ❌ "An outlier is just a wrong number — delete it."
✅ Cure: sometimes it's an error (the 120), but sometimes it's the most important real value in the data (the one fraud transaction, the one superstar). Investigate before you delete. The job is to notice it and report its effect, not to silently erase it.

Quick mini-debate (~4 min): "A real-estate site reports the average home price in a neighborhood as $1.2 million, but the median is $420,000. Which number should a normal buyer trust, and why is the gap so big?" Surface: a few mansions (right-skew outliers) drag the mean up; the median better describes a typical home. Tie back to Segment 5's shape rule.

Segment 7 — Putting It Together: Describe a Distribution in One Breath (12 min)

Plain language first. Statisticians describe any distribution with the same four-word checklist:
1. Shape — symmetric, skewed (which way), uniform, or bimodal?
2. Center — where's the typical value?
3. Spread — how stretched out are the values (smallest to largest)?
4. Outliers — any values sitting far from the rest?

Memory hook: "Shape, Center, Spread, Outliers" — S-C-S-O, in that order, every time. (Some texts say "shape, center, spread, and unusual features" — same idea.)

One fully worked example (narrate the commute histogram in one breath):

"The commute times are skewed right (shape): most students commute under 20 minutes, with the peak at 10–19 and a thin tail of longer commutes. The center is around 10–19 minutes (a typical commute). The spread runs from about 3 to 35 minutes. There are a couple of possible outliers at 33 and 35 minutes worth a second look."
That short paragraph is a complete, honest summary, and it's exactly what a histogram is for. This is also how you'll write the plain-language interpretation on Assignment 2.

Land it (SLO B): a good description names all four, in order, in words a non-statistician could follow — no formulas required to see a distribution.

Segment 8 — Technology Workflow + AI-Critique, Callback & Hand-off (18 min) · Session 2 closes (~75)

Technology workflow — build a histogram in a spreadsheet (exact steps):
1. Put your data in column A (say A2:A31 for 30 values).
2. Google Sheets: select the data → Insert ▸ Chart → in the Chart editor set Chart type ▸ Histogram. Open Customize ▸ Histogram to set the bucket size (class width) — set it to 10 to match our table.
3. Excel: select the data → Insert ▸ Insert Statistic Chart ▸ Histogram; then right-click the horizontal axis → Format Axis ▸ Bin width = 10 (or set "Number of bins").
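4. Read it against your table: when the bins start at 0, the bar heights should match your frequencies (9, 13, 6, 2). If the tool starts its first bin somewhere else (say, at the smallest value, 3), the counts shift even at width 10. Change the bucket/bin width and watch the shape change: too wide hides the peak, too narrow makes a spiky comb. The bin width is a choice, and it changes the story.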
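
Optional troubleshooting sketch. If a student's chart does not show 9, 13, 6, 2 at width 10, the usual cause is different bin edges, not a miscount. The sketch below recounts the commute data with classes starting at 0 and then at 3. The start value 3 is a hypothetical example (the smallest commute); exactly where a given spreadsheet places its edges is an assumption you should check in that tool.

```python
commutes = [5, 8, 12, 15, 7, 22, 18, 9, 14, 33, 11, 6, 19, 24, 16,
            3, 27, 13, 21, 8, 17, 10, 29, 14, 20, 4, 12, 35, 9, 18]

def bin_counts(values, width, start):
    """Count values in classes [start, start + width), [start + width, ...), etc."""
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

from_zero = bin_counts(commutes, width=10, start=0)    # classes 0-9, 10-19, ...
from_three = bin_counts(commutes, width=10, start=3)   # classes 3-12, 13-22, ...

print("bins start at 0:", from_zero)
print("bins start at 3:", from_three)
```

Same data and same width, but the heights change from 9, 13, 6, 2 to 13, 12, 3, 2 when the first class starts at 3. Both pictures still show a right-hand tail.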

AI-critique moment (students verify, not consume):

Paste this to an approved chatbot: "Here are 12 numbers: 4, 5, 5, 6, 6, 6, 7, 7, 8, 9, 10, 25. Describe the shape of their distribution and tell me the typical value."
Then check its work against what we learned. Chatbots often (a) call this symmetric when the lone 25 makes it clearly skewed right with an outlier, and (b) report the mean (~8.2) as "typical" when the outlier has dragged it above 9 of the 12 values; the median (6.5) is the honest center here. Your job all semester: the tool drafts, you judge. Catch the model: name the outlier, name the skew, and pick the resistant center. This is exactly how the weekly Lecture Tutorial and Assignment 2 work: you check the machine, you don't trust it.
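
Optional answer-key check. A short sketch for verifying the numbers in the critique above before class, using only the standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

values = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9, 10, 25]

m = mean(values)
med = median(values)                          # average of the 6th and 7th values
below_mean = sum(1 for v in values if v < m)  # how many values the mean sits above

print(f"mean = {m:.2f}, median = {med}")
print(f"{below_mean} of {len(values)} values lie below the mean")

# Drop the lone 25 to see how far it pulled the mean.
without_25 = [v for v in values if v != 25]
print(f"mean without 25 = {mean(without_25):.2f}")
```

The mean is about 8.17 and sits above 9 of the 12 values, while the median is 6.5. Removing the 25 brings the mean down to about 6.64, close to the median.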

Callback + tease:
- Callback: "Last week: who was measured and how. This week: the first thing we do with good data — give it a shape we can see, and refuse to be fooled by an outlier."
- Tease next week: "We've been eyeballing center and spread off a picture. Week 3 puts exact numbers on them — mean, median, mode, and the spread measures (range, IQR, standard deviation) — and proves why the median is the bodyguard against outliers we met today."

Hand-off (the week's work):
- Readings + videos (frequency tables, histograms, shape, outliers) — before Thursday.
- Lecture Tutorial 2 (AI tutor, share-link submission) — frequency/relative-frequency tables, histograms, shape, outliers.
- Practice exercises (ungraded reps) — the quick companion to the tutorial.
- The week's graded set — Quiz 2, Discussion 2, and Assignment 2 (adaptive, 100 pts). Assignment 2: build/read a frequency table, describe a histogram's shape, judge an outlier's effect, and write a plain-language interpretation; submit the AI's self-scored report + the chat share link.

Instructor FAQ — Common Stumbles

Student says / does	Quick cure
"Is this a histogram or a bar chart?"	Look at the horizontal axis. Numbers (a number line) → histogram, bars touch. Categories → bar chart, bars have gaps. The data type decides, not the look.
Draws a histogram with gaps between the bars.	Quantitative bins sit on a continuous number line, so they touch. Gaps are for separate categories. Close the gaps.
"Skewed left means it points left, right?"	Skew is named for the tail, not the lump. Skewed left = long thin tail to the left (small values); the tall bulk is on the right. Find the tail, then name the direction.
Thinks switching to relative frequency changes the shape.	It only relabels the vertical axis (every height ÷ total). Same relative heights → same shape. It just lets you compare different-sized groups.
Uses too few or too many bins and "loses" the shape.	Aim for ~5–15 classes. Too few = one lump (shape hidden); too many = a spiky comb (shape lost in noise). Adjust the width until the shape reads.
"The outlier is wrong — just delete it."	Investigate first. It may be an error (a typo) or the most important real value (one fraud, one superstar). Notice it, report its effect; don't silently erase it.
Reports the mean for an obviously skewed/outlier-laden data set.	The mean chases the outlier; in skewed data the median is the honest "typical value." Match the summary to the shape.
Says a distribution with two humps is "just bumpy."	Two clear humps = bimodal, often two groups mixed together. Name it — it's usually telling you to split the data.
Confuses frequency and relative frequency.	Frequency = how many (a count). Relative frequency = what share (count ÷ total, a proportion that sums to 1). Counts size; shares compare.

Scope flag

This outline opens Objective 2 with the display half — tables and graphs of one variable, and reading shape/center/spread/outliers qualitatively off a picture. The exact center-and-spread computations (mean, median, mode, range, IQR, standard deviation, five-number summary) and the formal 1.5 × IQR outlier rule are Week 3 — previewed here only enough to motivate "why shape matters," and flagged as added context. Cut the Week-3 previews (the mean-vs-median rule in Segment 5, the IQR cutoff in Segment 6) if you want a leaner 60-minute version.
