Week 7 Quiz — Multimodal AI: Voice, Audio, Images & Documents

Course: Using Artificial Intelligence (AI 101) · Fall 2026 · Silver Oak University (fictional sample) · Prof. Quinn
Covers: multimodal AI definition · voice prompting (Skill 8) · record → transcribe → analyze workflow (Skill 9) · tool → modality matching · transcription errors and summary fabrications · image creation vs. image analysis
Format: 10 auto-graded items (multiple-choice, multiple-answer, matching, true/false) · 10 points (1 each) · allowed attempts: 1 · No AI on this quiz.

This is the human-readable quiz with its vetted answer key and one-line feedback. The import-ready Classic QTI 1.2 version is in F-quiz-week-07-qti.xml, generated by a validated Python script; the file parses with 10 items, and every single-answer item has exactly one correct option. Reminder: AI is not permitted on quizzes. This quiz checks that you understand the Week 7 ideas.

Questions, key, and feedback

Q1 (MC). The term "multimodal AI" refers to AI systems that —
- A. can only process text typed in a chat box
- B. can process and/or generate more than one type of data, such as text, audio, images, and documents ✅
- C. use multiple servers to respond faster
- D. can only be used on mobile devices
Feedback: Multimodal means multiple modes of data — text, voice, images, documents. The major chatbots (ChatGPT, Claude, Gemini, Copilot) are increasingly multimodal; they are not text-only.

Q2 (True/False). True or False: Modern AI chatbots such as ChatGPT, Claude, and Gemini are text-only tools and cannot process images or audio in any form.
- True
- False ✅
Feedback: False. These assistants now accept image uploads and, in many of their apps, voice or audio input. "Chatbots are text-only" is one of the classic Week 7 misconceptions.

Q3 (Matching). Match each task to the best tool type for completing it.
| Task | Best tool type |
|---|---|
| Convert a voice recording of a meeting into a text file | Audio transcription tool (e.g., a Whisper-class model or a phone recorder app with built-in transcription) |
| Generate a new illustration from a written description | Text-to-image generation tool (e.g., DALL·E, Midjourney, Adobe Firefly) |
| Extract the line items from a photo of a receipt | Multimodal chatbot with image upload (e.g., ChatGPT, Claude, Gemini) |
| Ask questions about the contents of an uploaded PDF | Multimodal chatbot with document upload (e.g., ChatGPT, Claude, Gemini) |
Feedback: Each task runs in a specific direction, and the tool must match it: audio → text (transcription tool); text → image (generation tool); image → text (multimodal chatbot with vision); document → text/answers (multimodal chatbot with document upload).

Q4 (MC). When you speak a prompt to an AI assistant using voice mode, what happens FIRST before the AI processes your words?
- A. The AI listens to your audio and understands it directly with no intermediate step
- B. Your speech is converted to text through a transcription step, which can introduce errors ✅
- C. The AI searches the internet for audio matching your voice
- D. The AI compresses your audio file and stores it permanently
Feedback: Voice mode has two steps: (1) speech → text (transcription, can have errors); (2) text → AI response. Errors in step 1 flow into step 2. Always check the displayed transcript.

Q5 (MC). In the record → transcribe → analyze workflow, what is the CORRECT sequence?
- A. Analyze with AI first, then transcribe the audio, then record it
- B. Transcribe first, then record, then analyze with AI
- C. Record the audio, then convert it to text with a transcription tool, then paste the text into an AI assistant ✅
- D. Record the audio and paste it directly into an AI chatbot without a transcription step
Feedback: The order is fixed: record → transcribe → analyze. You can't transcribe before you record, and the AI needs the text transcript — it cannot process a raw audio file in most free-tier workflows.

Q6 (Multiple answer: select all that apply). Which TWO of the following are steps where an AI tool itself can introduce errors into the record → transcribe → analyze workflow?
- A. During the transcription step, when audio is converted to text (mis-heard words, dropped phrases) ✅
- B. During the recording step, before any AI is involved
- C. During the AI's analysis step, when the AI may add details, smooth contradictions, or invent conclusions not in the transcript ✅
- D. During the playback step, when you listen to the recording again
- E. During the file-naming step, when you save the recording
Feedback: The two error-entry points are A (transcription errors — the text may not match the audio) and C (summary fabrications — the AI may add plausible but invented details). Recording quality affects transcription but is not itself an AI error-entry point; B, D, and E are not AI steps.

Q7 (MC). A student uses voice mode to ask an AI for help. The AI's response seems off — it answered a slightly different question than the one they meant to ask. What is the BEST first fix?
- A. Switch to a different AI assistant, because voice mode is broken on this one
- B. Check the transcribed text that the AI received — it likely mis-heard a word — then rephrase and speak again more clearly ✅
- C. Give up on voice mode and always type instead
- D. Ask the AI to try harder to understand the original audio file
Feedback: This is the "what's the prompting fix?" scenario. The problem is almost certainly a transcription error in step 1. Check what the AI actually received as text, fix the mis-heard word, and re-prompt clearly. Abandoning voice mode (C) or blaming the tool without diagnosing (A/D) misses the fix.

Q8 (True/False). True or False: AI-powered transcription tools reliably produce a perfect, error-free text from any audio recording.
- True
- False ✅
Feedback: False. Transcription accuracy varies with audio quality, accents, background noise, crosstalk, and technical vocabulary. Never assume transcription is perfect — always review the transcript before passing it to an AI for analysis.

Q9 (MC). A student says: "I uploaded a photo to DALL·E to find out what's in it." What is wrong with this statement?
- A. Nothing — DALL·E is the best tool for analyzing photos
- B. DALL·E is a text-to-image generation tool (text in → image out); it is not designed to analyze or describe photos (image in → text out) ✅
- C. DALL·E only works with black-and-white images
- D. DALL·E can analyze images, but only if they were generated by DALL·E itself
Feedback: Direction matters. DALL·E creates images from text — it doesn't analyze or describe photos. For image analysis (image → text), you'd use a multimodal chatbot with vision, such as ChatGPT (with vision enabled), Claude, or Gemini. Image creation and image analysis are different tasks requiring different tool types.

Q10 (MC). Which statement best describes what an AI assistant is doing when it "analyzes" an image you upload?
- A. It perceives the image exactly as a human would, with the same depth, context, and emotional response
- B. It looks up the image in a database to find matching photos and their descriptions
- C. It performs pattern recognition on the pixel data and generates a description — which can be confident and wrong ✅
- D. It reads the image's hidden metadata to identify everything in it accurately
Feedback: AI image analysis is pattern recognition on pixel data, not human-style vision. The AI generates a plausible description — which can be confident and wrong. It has no awareness of context, history, or meaning beyond what the pixels suggest. Never use AI image analysis as the sole source of truth for anything important.

Answer key (quick reference)

| Q | Answer | Q | Answer |
|---|---|---|---|
| 1 | B (process multiple types of data) | 6 | A, C (transcription step; AI analysis step) |
| 2 | False (chatbots are not text-only) | 7 | B (check the transcript first) |
| 3 | Matching (see table above) | 8 | False (transcription is not always accurate) |
| 4 | B (speech → text first, then AI processes text) | 9 | B (DALL·E generates images; it doesn't analyze them) |
| 5 | C (record → transcribe → analyze) | 10 | C (pattern recognition; can be confident and wrong) |

Blueprint & item-bank note

| # | Type | Concept | Objective |
|---|---|---|---|
| 1 | MC | Multimodal AI definition | 3 |
| 2 | True/False | Chatbots are not text-only | 3 |
| 3 | Matching | Task → best tool type (modality matching) | 3 |
| 4 | MC | Voice prompting two-step process | 3 |
| 5 | MC | Record → transcribe → analyze order | 3 |
| 6 | Multiple answer | Two error-entry points in the workflow | 3 |
| 7 | MC | "What's the prompting fix?" (voice mode off-answer) | 3 |
| 8 | True/False | Transcription is not always accurate | 3 |
| 9 | MC | Image creation vs. image analysis (direction) | 3 |
| 10 | MC | AI image analysis = pattern recognition, not human perception | 3 |

All 10 items are tagged course=AI101 · week=7 · objective=3 and deposited into the item bank for future per-term regenerations. Distractors target the classic misconceptions: "chatbots are text-only"; "transcription is always accurate"; "an AI 'reading' an image truly sees like a human"; confusing image creation (text → image) with image analysis (image → text).
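
The tagging and type mix described above can be tallied mechanically. The following is a minimal sketch, not the actual item-bank code: the `BankItem` record, its field names, and the helper functions are hypothetical and only illustrate one way to carry the course/week/objective tags and count items per type.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BankItem:
    """Hypothetical item-bank record; the field names are assumptions."""
    number: int
    item_type: str
    course: str = "AI101"
    week: int = 7
    objective: int = 3


# Item types copied from the blueprint table, in question order (Q1..Q10).
BLUEPRINT_TYPES = [
    "MC", "True/False", "Matching", "MC", "MC",
    "Multiple answer", "MC", "True/False", "MC", "MC",
]


def build_bank(types):
    """Create one tagged record per blueprint row, numbered from 1."""
    return [BankItem(number=i, item_type=t) for i, t in enumerate(types, start=1)]


def type_counts(items):
    """Count how many items of each type the bank contains."""
    return Counter(item.item_type for item in items)


bank = build_bank(BLUEPRINT_TYPES)
counts = type_counts(bank)
for item_type, n in sorted(counts.items()):
    print(f"{item_type}: {n}")
```

Under these assumptions, the tally gives a quick cross-check of the blueprint against the structure line in the quality gate.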

Quality gate (self-checked)

Structure: 10 items, 1 point each; types = 6 MC + 2 true/false + 1 matching + 1 multiple-answer.
Single-answer integrity: every MC and true/false has exactly one correct option; the matching item pairs one-to-one; the multiple-answer item keys A and C.
Product-accuracy gate: PASS. Tools named factually (ChatGPT, Claude, Gemini for multimodal; DALL·E, Midjourney, Adobe Firefly for image generation; Whisper for transcription); no invented features, no price/version claims; conceptual claims (multimodal definition, transcription two-step, workflow order, error-entry points, image analysis as pattern recognition) are accurate and non-controversial.
QTI parse confirmation: F-quiz-week-07-qti.xml parses as imsqti_xmlv1p2 with 10 items; generated by the validated build_qti.py script.
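
A parse confirmation of this kind can be reproduced with the Python standard library. The sketch below is not the build_qti.py script, which is not shown here. It assumes the usual QTI 1.2 layout (a `questestinterop` root containing `item` elements) and runs on a small inline sample rather than the real file. The function names and the sample XML are illustrative, and the namespace URI in the sample is an assumption that the counter ignores.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_qti_items(xml_text):
    """Parse QTI 1.2 text and return the number of <item> elements."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "questestinterop":
        raise ValueError("root element is not <questestinterop>")
    return sum(1 for el in root.iter() if local_name(el.tag) == "item")


# Tiny illustrative document; the real quiz file would contain 10 items.
SAMPLE = """<questestinterop xmlns="http://www.imsglobal.org/xsd/ims_qtiasiv1p2">
  <assessment ident="demo" title="Demo quiz">
    <section ident="root_section">
      <item ident="q1" title="Q1"/>
      <item ident="q2" title="Q2"/>
    </section>
  </assessment>
</questestinterop>"""

print(count_qti_items(SAMPLE))
```

To check the real file, the same function could be given the text of F-quiz-week-07-qti.xml; the expected count from the gate above is 10.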

Canvas placement block

canvas_object    = Quizzes::Quiz
title            = "Week 7 Quiz — Multimodal AI: Voice, Audio, Images and Documents"
assignment_group = "Quizzes"
points_possible  = 10
grading_type     = points
available_from_offset_days = 42
due_offset_days  = 48
published        = true
allowed_attempts = 1
shuffle_answers  = true
ai_permitted     = false
provenance       = "~ Prof. Quinn's edition · Fall 2026 · built with thecoursemaker.com"
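
The two offset fields can be resolved to calendar dates once a term start date is known. This is a minimal sketch under two assumptions that the block above does not state: that the offsets count days from the first day of term, and that the term starts on 2026-08-24 (a placeholder, not the actual Fall 2026 start date). The `resolve_window` helper is hypothetical.

```python
from datetime import date, timedelta

# Placeholder term start; replace with the real first day of term.
TERM_START = date(2026, 8, 24)


def resolve_window(term_start, available_offset_days, due_offset_days):
    """Turn day offsets from the term start into (available, due) dates."""
    if due_offset_days < available_offset_days:
        raise ValueError("due date would fall before the availability date")
    available = term_start + timedelta(days=available_offset_days)
    due = term_start + timedelta(days=due_offset_days)
    return available, due


# Offsets taken from the placement block: 42 and 48 days.
available_on, due_on = resolve_window(TERM_START, 42, 48)
print(available_on.isoformat(), due_on.isoformat())
```

With the placeholder start date, the quiz would open on 2026-10-05 and be due on 2026-10-11, a six-day window at the start of Week 7.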

The import-ready Classic QTI version (F-quiz-week-07-qti.xml) ships inside the course's .imscc package; on import, the quiz lands in the Canvas gradebook.

~ Prof. Quinn's edition · Fall 2026 · built with thecoursemaker.com