Week 2 — Lecture Outline · How AI Actually Works (Conceptually) & Its Limits

Using Artificial Intelligence · AI 101 Fall 2026 · Prof. Quinn Fictional sample

Course: Using Artificial Intelligence (AI 101) · Silver Oak University (fictional sample) · Prof. Quinn
Objective covered: Objective 1 — Explain what generative AI and large language models are, how they work conceptually, and their core capabilities and limits — including why AI can be "confidently wrong."
SLOs touched: A (produce high-quality results with AI through strong prompting) · B (evaluate and use AI critically)
Meeting pattern: 2 sessions × 75 min = 150 min. The segment minutes below sum to 126 (62 in Session 1, 64 in Session 2), which leaves about 12 minutes of slack per session; scale to your own pattern.

Week at a Glance


The week's big question	"What is actually happening inside the model — and why does that explain the mistakes it makes?"
By the end of the week, students can…	(1) explain what a token is and how LLMs process text token by token; (2) define the context window and say what happens when a conversation exceeds it; (3) explain hallucination at a plain-language conceptual level and name at least two real shapes it takes; (4) distinguish AI from search as different tools with different failure modes; (5) describe the Turing test (Turing, 1950) accurately, including what it does and does not prove.
Key vocabulary	token, context window, training cutoff, hallucination, fabrication, search engine vs. generative AI, Turing test, "Computing Machinery and Intelligence" (1950), capabilities vs. limits
Materials	slides (Deck 2), the week's readings + video links, one approved assistant (ChatGPT / Claude / Gemini / Copilot) for the live demos, the AI-critique moment, and the tutorial
Timing note	8 segments, 126 scheduled minutes of the ~150 available. Session 1 = Segments 1–4 (62 min). Session 2 = Segments 5–8 (64 min).

Segment 1 — Hook & the Question (8 min) · Session 1 opens

Hook. Show a real AI answer that is confident, fluent, and factually wrong. Something with a plausible-sounding citation that doesn't exist, a made-up statistic, or an invented detail about a person or event — drawn from your own testing that week. Pause. "What just happened? The tool is not lying. It's not even trying to guess. It's doing exactly what it's always doing — predicting the next most likely token — and this time those tokens happen to be false. Today we make that make sense."

The promise: "By Thursday you'll be able to explain — in plain language, to anyone — why AI is so confidently wrong sometimes. And that explanation is the single most useful thing in this course. It changes what you trust and what you check."

Memory hook: "It's predicting likely text — not looking up true facts."

Segment 2 — Tokens: The Model's Unit of Thought (18 min)

Plain language first. When an LLM processes your message, it doesn't read words the way you do. It breaks everything into tokens — small chunks that are sometimes a word, sometimes part of a word, sometimes punctuation. "Unbelievable" might be three tokens. "AI" might be one. The exact split varies by model, but the key idea is: the model works one token at a time, predicting the next from everything before it.
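
Optional instructor demo. The sketch below is a toy, not how any production tokenizer works: real tokenizers learn their vocabulary from data, while `TOY_VOCAB` and `toy_tokenize` are made-up names with a hand-picked vocabulary. It only illustrates the idea that a word can be split into several chunks.

```python
# Toy illustration only: TOY_VOCAB is a hypothetical, hand-picked vocabulary.
# Real tokenizers learn tens of thousands of pieces from training data.
TOY_VOCAB = {"un", "believ", "able", "AI", "token", "s", " ", "!"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary piece matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary piece matched here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(toy_tokenize("AI"))            # ['AI']
```

With this toy vocabulary, "unbelievable" splits into three pieces and "AI" stays whole. A real tokenizer may split either word differently.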

Why this matters for you as a user:
- Very long words, rare words, and unusual names cost more tokens — the model processes them in pieces.
- A token is roughly 3–4 characters on average in English. A 1,000-word document is roughly 1,300–1,500 tokens. (Exact counts vary by model — don't quote a specific number as universal.)
- This is not the same as "the model only understands one word at a time." It sees everything in its context window simultaneously when generating each next token.
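
Optional instructor demo. The rules of thumb above can be turned into a rough estimator. This is a sketch under stated assumptions: the default of 4 characters per token and the 1.3 to 1.5 tokens-per-word range come from the approximations in this outline, not from any specific model, and the function names are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Rough English-only estimate: character count divided by ~4 (an assumption)."""
    return round(len(text) / chars_per_token)


def estimate_token_range(word_count):
    """Low/high estimate using the 1.3 to 1.5 tokens-per-word rule of thumb.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    low = word_count * 13 // 10
    high = word_count * 15 // 10
    return low, high


print(estimate_token_range(1000))  # (1300, 1500)
```

For an exact count, use the tokenizer of the specific model; these numbers are only for building intuition.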

The training connection. The model learned to predict tokens by processing enormous amounts of text — books, websites, articles, code. It learned patterns. "After these words, what comes next? Statistically, probably this." That statistical learning is astonishingly powerful — and it's also why the model has no way to verify whether what comes next is true. It produces what's likely.
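
Optional instructor demo. The sketch below is a deliberately tiny stand-in for the idea of "predict what usually comes next." It counts word pairs in a made-up training text; real LLMs use neural networks over tokens, not word-pair counts, and `TRAINING_TEXT`, `train_bigrams`, and `predict_next` are hypothetical names. The point it shows: the predictor returns the most frequent continuation, whether or not it is true.

```python
from collections import Counter, defaultdict

# Made-up training text. The false statement appears more often than the true one.
TRAINING_TEXT = (
    "the moon is made of cheese . "
    "the moon is made of cheese . "
    "the moon is made of rock ."
)


def train_bigrams(text):
    """Count which word follows each word in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower (the likely word, not the true one)."""
    followers = follows.get(word)
    if not followers:
        return None  # the word never appeared with a follower in training
    return followers.most_common(1)[0][0]


model = train_bigrams(TRAINING_TEXT)
print(predict_next(model, "of"))  # cheese
```

The predictor answers "cheese" because that continuation is more common in its training text. Nothing in the mechanism checks the claim against the world.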

Live demonstration (show it; narrate): Paste a paragraph into a free tokenizer tool, e.g., the OpenAI tokenizer at https://platform.openai.com/tokenizer (verify it's live before class), and show the color-coded token breakdown. Note how different words chunk. Students see the abstraction made concrete.

Memory hook: "Tokens are the atoms the model thinks in — and it predicts, it doesn't know."

Misconception to clear early:
- ❌ "The model looks words up one at a time in a dictionary." ✅ Cure: there's no lookup. It learned statistical patterns from training text, and tokens are its basic unit: not meaning, just chunks.

Segment 3 — The Context Window (18 min)

Plain language first. Every LLM has a context window — the maximum amount of text it can "see" at once during a conversation: your messages, its replies, any documents you've pasted, any instructions. When your conversation exceeds the context window, earlier content starts to fall out. The model isn't reading a transcript; it's working with whatever fits in its current window.

Three things to understand about the context window:

- It's a hard limit. Once content falls outside the window, the model no longer has access to it. It isn't "forgetting" in the human sense; that text is simply no longer available to it.
- A bigger context window ≠ smarter or more truthful. A larger window holds more text (more of your long document, more of the conversation history), but it doesn't change the underlying mechanism. The model still predicts likely tokens. It can be confidently wrong with a 200,000-token context just as easily as with a 4,000-token one.
- The training cutoff is a separate, related limit. The model's training data has a cutoff date, and events after that date are simply not in the patterns it learned. The model has no built-in sense of what it is missing, so it may confidently generate text about post-cutoff topics anyway, because it's producing what's statistically likely, not what's currently true.
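
Optional instructor demo. The sketch below shows one simple way a chat application could handle a conversation that outgrows the window: keep only the most recent messages that fit. This is an assumption for illustration; actual products may truncate, summarize, or refuse instead. Token counts are approximated by counting words, and `fit_to_window` is a hypothetical name.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the window.

    Token cost is approximated by whitespace-separated words (an assumption).
    """
    kept = []
    used = 0
    # Walk backwards from the newest message and stop when the budget is spent.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "my name is Dana",
    "what is a token",
    "tell me more please",
    "what is my name",
]
print(fit_to_window(conversation, max_tokens=8))
# ['tell me more please', 'what is my name']
```

In this example the message containing the name no longer fits, so a model working from the kept messages would have no way to answer the final question correctly.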

The Studio preview. This week you'll deliberately probe this limit in Studio 2 — start a long conversation or paste a very long text, then ask about something from early in the conversation and see what happens.

Interaction — Quick Think (2 min): "True or false: if I buy the $20/month plan, which has a bigger context window, the model will give me more accurate answers." (False — bigger window = more text fits; accuracy is a separate property of the model.)

Memory hook: "The context window is a sliding glass: only what's in the frame is visible."

Segment 4 — Hallucination: Why AI Is "Confidently Wrong" (18 min) · Session 1 closes (~75)

Plain language first. The field uses the word hallucination to describe AI output that is confident, fluent, and wrong — invented statistics, citations to books that don't exist, fake court cases, fabricated quotes. It's a colorful word and also a slightly misleading one (which you'll debate in Discussion 2).

Why does it happen? Connect to the mechanism. The model generates what comes next based on patterns. If you ask it for a citation about a topic it was trained on, it produces text that looks like a citation (author, year, journal, page range) because that's the pattern for how citations appear. It has no way to verify whether that citation is real. It's not lying, and it's not hallucinating in the neurological sense. It's doing what it always does: predicting likely text.

Name the shapes (students need to recognize these all term):
- Invented citations — a plausible-looking journal article, book, or website that doesn't exist.
- Fabricated statistics — "studies show X%" where no such study exists.
- Fake case law — invented court case names, docket numbers, rulings. (This has happened in real legal filings.)
- Wrong arithmetic — the model is not a calculator; it predicts what numbers usually follow.
- Fabricated quotes — words attributed to a real person that they never said.
- Outdated information — confident answers about post-cutoff events.

The connection to capabilities vs. limits: The model's capabilities — fluent language, wide coverage, speed — are real. But they don't imply verified truth. Know when you're relying on the capability vs. when you need a fact. That's the distinction this course trains.

Quick interaction — classify (3 min): Put three "AI answers" on a slide (one correct, one hallucinated citation, one outdated fact). Pairs decide which is which and why. Debrief: what clue tipped you off?

Memory hook: "Confident + fluent ≠ true. The machine predicts; it doesn't verify."

Misconceptions + cures:
- ❌ "AI never makes things up." ✅ Cure: every LLM hallucinates — shapes and frequency vary, but none is immune. Verification is always required for factual claims.
- ❌ "A bigger context window fixes hallucination." ✅ Cure: the window is about capacity, not truthfulness. The mechanism (predict likely tokens) is the same regardless of window size.
- ❌ "AI is plugged into a live, verified fact database." ✅ Cure: by default it isn't. Without a search tool attached, its knowledge is frozen at the training cutoff; within that, it learned patterns, not verified facts.

Segment 5 — Search Engines vs. AI: Different Tools, Different Jobs (18 min) · Session 2 opens

Hook back in: "Last session: the tokens, the context window, why hallucination happens. Today: how that changes what you reach for when."

Plain language first — the core distinction:
| | Search engine | Generative AI (LLM) |
|---|---|---|
| What it does | Finds and ranks existing pages on the internet | Generates new text based on learned patterns |
| What you get | Links to real documents, up to date | Freshly written text that may or may not be accurate |
| Good for | Finding a specific thing that exists; current news; official sources | Drafting, explaining, transforming, brainstorming; first drafts |
| Failure mode | Not finding the right page; returning SEO spam | Confident, plausible, wrong output (hallucination) |
| Verification | The source is a real page you can read | The claim is the AI's own generation — you must check it |

The hybrid tools. Modern tools blur this line — ChatGPT with search on, Perplexity, Claude with web access. These can cite live sources, which reduces (but does not eliminate) hallucination risk. The distinction still matters: which mode is active? Is there a citation to check?

Worked example (live): Put the same question to a search engine and to a chatbot within the same 2 minutes: "What is the current federal minimum wage?" A search engine returns official government pages; a chatbot may give a confident but outdated number. Point made: different tool, different verification requirement.

Memory hook: "Search finds; AI writes. Both can be wrong. Both need a check."

Segment 6 — The Turing Test (18 min)

Plain language first. In 1950, the mathematician Alan Turing published "Computing Machinery and Intelligence" in the journal Mind — one of the most famous papers in computing history. In it, he proposed a test (which he called the "imitation game"): could a machine carry on a written conversation well enough that a human evaluator couldn't tell it was a machine? If so, Turing suggested, that was a meaningful benchmark for machine "intelligence."

What it is and isn't:
- The Turing test is a behavioral test — it tests performance, not inner experience, understanding, or consciousness. Passing it means a human couldn't tell the difference from a text transcript; it says nothing about whether there is genuine comprehension behind the answers.
- It is not a test for AGI, consciousness, or "real" understanding. Philosophers and scientists have debated what passing it would show ever since 1950 (e.g., John Searle's "Chinese Room" argument, 1980).
- Today's context: large language models can pass many versions of the Turing test. That makes the test less useful as a forward-looking benchmark, but it remains historically important to understand.

Two misconceptions to cure:
- ❌ "The Turing test proves a machine is conscious." ✅ Cure: it's a behavioral test for human-indistinguishable conversation — not a test of consciousness or understanding.
- ❌ "AI passing the Turing test means AI is 'smarter' than humans." ✅ Cure: it means AI can generate text a human evaluator can't distinguish from human text in a controlled exchange — that's a specific capability claim, not a general superiority claim.

Why this week: the Turing test connects directly to hallucination. An AI can pass a Turing test precisely because it generates fluent, confident, human-like text — and that same property is what makes it produce confident hallucinations. The capability and the failure mode have the same root.

Memory hook: "Turing tested behavior, not consciousness — 1950, a question still open."

Note on sources: Alan Turing's 1950 paper "Computing Machinery and Intelligence" was published in Mind, volume 59, issue 236. It is widely available online and no quote is needed here — the paper's ideas are the content.

Segment 7 — Capabilities vs. Limits: A Working Map (12 min)

The map students need going forward. Draw two columns on the board (or on a slide):

| Genuine capabilities | Real limits |
|---|---|
| Fluent, wide-ranging language generation | No access to verified facts or live data (without tools) |
| Explaining, summarizing, transforming, brainstorming | Context window: only sees what fits |
| Wide coverage of training-era topics | Training cutoff: no knowledge of post-cutoff events |
| Pattern recognition across writing styles and formats | Statistical patterns, not logical reasoning (can fail on math, logic, novel situations) |
| Adapting to your requests over a conversation | Hallucination: generates plausible but unverified output |

Key principle for the whole course: match the tool to the job. Use AI for generation, exploration, drafting, and transformation. Verify any specific fact, number, citation, or real-world claim before trusting it.

Quick interaction: match the task to the tool (3 min). Show four tasks; pairs quickly say "search," "AI," "both + check," or "neither." E.g., "find the official IRS tax deadline" (search); "draft an email explaining why I'll be late" (AI); "get a statistics citation for my paper" (both + check: verify independently, and don't trust AI citations alone).

Segment 8 — Technology Workflow + AI-Critique, Callback & Hand-off (16 min) · Session 2 closes (~75)

Technology workflow — the verify-the-AI habit for factual claims:
1. Generate with AI — get the draft, the explanation, the summary.
2. Identify the checkable claims — any specific fact, number, date, citation, name, or quote.
3. Check each claim independently — official site, reputable source, second tool.
4. Flag and fix the hallucinated ones; keep the accurate ones.
5. Track the pattern — over time, you'll learn where AI hallucinates in your field and prompt accordingly.
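
Optional instructor demo. Step 2 of the workflow above (identify the checkable claims) can be partly sketched in code. This is a minimal illustration under loud assumptions: the patterns in `CHECKABLE_PATTERNS` are hypothetical, will miss many claims, and will over-flag others. The sample draft and its citation are invented for the demo. The script only points a human at what to check; it verifies nothing.

```python
import re

# Hypothetical patterns for illustration only; not a complete claim detector.
CHECKABLE_PATTERNS = {
    "year": re.compile(r"\b(?:1[89]|20)\d{2}\b"),
    "percentage": re.compile(r"\b\d+(?:\.\d+)?\s?%"),
    "quote": re.compile(r'"[^"]+"'),
    "citation": re.compile(r"\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)"),
}


def flag_checkable_claims(draft):
    """Return (kind, matched text) pairs that a human should verify independently."""
    flagged = []
    for kind, pattern in CHECKABLE_PATTERNS.items():
        for match in pattern.finditer(draft):
            flagged.append((kind, match.group(0)))
    return flagged


# Invented sample sentence; the citation does not refer to a real study.
sample = "A study (Rivera, 2021) found that 42% of teens agreed."
for kind, text in flag_checkable_claims(sample):
    print(kind, "->", text)
```

Every flagged item still goes through steps 3 and 4 by hand: look it up in an official or reputable source, then keep it or fix it.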

AI-critique moment (the course's central discipline — Week 2 form):

Ask an approved assistant: "Give me three recent academic citations about the effect of social media on teen mental health." It will produce confident-looking citations. Now: pick one and try to find it. Use Google Scholar or a library database. Report what you find. The model very likely produced a citation that looks real but isn't — or got the author, year, or title wrong. This is the hallucination lesson, live. You'll run this same move — but more deliberately — in Studio 2.

Callback + tease:
- Callback: "Week 1 gave us the mental model: AI predicts text, not truth; iterate, verify, stay the judge. Week 2 opened the hood: tokens are the unit, the context window is the limit, training cutoff and hallucination are the failure modes — and they all have the same root cause. Now you know why."
- Tease next week: "Next week we get practical: three prompting skills that let you direct the AI more precisely — have a conversation (and spot when it's just agreeing with you), provide content (paste your own documents), and use emphasis (Markdown, XML tags, capitalization) to control what it pays attention to."

Hand-off (the week's graded work):
- Lecture Tutorial 2 — tokens, context window, hallucination, Turing test.
- Quiz 2 (no AI) — 10 items; watch for the confusions we named.
- Discussion 2 — "Is 'Hallucination' the Right Word? / Diagnose a Confident Mistake."
- Assignment 2 — "Inside the Black Box (Conceptually)."
- AI Build Studio 2 — "Probe the Limits" — demonstrate a real context-window limit and catch a real hallucination. 50 points; start early.

Instructor FAQ — Common Stumbles

| Student says / does | Quick cure |
|---|---|
| "A bigger context window would have caught that mistake." | Bigger window = more text fits; it doesn't change the predict-likely-tokens mechanism. Accuracy and capacity are separate properties. |
| Thinks "training cutoff" = "context window." | Training cutoff is when the training data ends (a date in the past). Context window is how much of the current conversation the model can see (a size limit). Two different limits. |
| "The AI is lying to me." | It's not; it has no intent. It's generating what comes next statistically. The output can be wrong without anyone lying. |
| "Passing the Turing test means AI has feelings / is conscious." | The test is behavioral (text conversation only). Passing it tells us about output, not inner experience. Philosophers disagree on what, if anything, follows. |
| Trusts AI citations without checking. | "Try to find that paper right now." Finding it solidifies the habit; not finding it makes the lesson unforgettable. |
| "If AI is so wrong, why use it?" | The capabilities (drafting, explaining, exploring, transforming) are real and useful. The limits (hallucination, context, cutoff) are manageable if you know them. Both are true at once. |
| Conflates search engine with AI. | Search finds existing pages (linked to sources). AI writes new text (generated, not sourced). Different tool, different job, different failure mode. |

Scope flag

This outline covers Objective 1 at the "open the hood" level: tokens, context window, training cutoff, hallucination, search vs. AI, the Turing test, and a capabilities-vs-limits map. It deliberately avoids: the mathematical details of transformer architecture; specific token counts by model (volatile); price comparisons. Real products (ChatGPT, Claude, Gemini, Copilot, Perplexity) are named factually where relevant; the instructor and institution are fictional.

~ Prof. Quinn's edition · Fall 2026 · built with thecoursemaker.com