Sensory Testing Methods
Sources: Coffee Sensory and Cupping Handbook by Fernández-Alduenda & Giuliano (SCA, 2021)
Sensory science uses three distinct categories of tests, each answering a different question. They should not be mixed in a single session, as they activate different cognitive modes in the taster.
| Category | Question | Who answers it | Example tools |
|---|---|---|---|
| Difference testing | “Are these two coffees detectably different?” | Trained tasters or cuppers | Triangulation, 3-AFC |
| Affective testing | “How much do people like this?” | Consumers or cuppers | 9-point hedonic scale, JAR scales |
| Descriptive analysis | “What are this coffee’s sensory characteristics?” | Trained descriptive panel | CATA, full descriptive panel |
Difference Testing
Purpose
To determine whether a sensory difference exists between two (or more) coffee samples. A difference test does not identify what is different or whether the difference is desirable; it shows only that the difference is detectable.
Triangulation test
The most widely used difference test in coffee. A taster is presented with three cups — two of the same coffee (A, A) and one different (B) — and asked to identify the “odd” cup.
Setting up a triangulation:
- Pre-grind each coffee as one batch (and, where possible, use one brew) so that the “same” samples are truly the same
- Serve at the same temperature; use identical cups; control visual cues (cup crust, color)
- Randomize the position of the odd cup (AAB, ABA, BAA, BBA, BAB, ABB all used equally)
- Minimum 6 tasters per session for reliable results
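The balanced use of the six serving orders in the list above can be sketched in Python. This is a minimal illustration, not a procedure from the handbook: the function name `assign_orders` and its `seed` parameter are hypothetical, and dealing the orders in shuffled blocks of six is one assumed way to keep them equally used.

```python
import random

# The six possible serving orders for a triangulation with coffees A and B.
ORDERS = ["AAB", "ABA", "BAA", "BBA", "BAB", "ABB"]


def assign_orders(n_tasters, seed=None):
    """Return one serving order per taster, using the six orders as equally as possible.

    Orders are dealt in shuffled blocks of six, so every complete block
    contains each order exactly once (hypothetical helper, for illustration).
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_tasters:
        block = ORDERS[:]          # copy so the module-level list is not mutated
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_tasters]


if __name__ == "__main__":
    for taster, order in enumerate(assign_orders(12, seed=1), start=1):
        print(f"Taster {taster}: {order}")
```

With a taster count that is a multiple of six, every order is served the same number of times; otherwise the last block is simply cut short.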
Interpreting triangulation results: The δ’ (delta-prime) statistic measures the sensory magnitude of the difference between two coffees. At δ’ = 1.75, about 10/18 tasters pass; at δ’ = 2.98, about 14/18 pass.
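Whether a given number of correct picks is more than chance can be checked with a one-sided binomial test, because a taster who is only guessing picks the odd cup one time in three. The sketch below is illustrative rather than taken from the handbook; the function names and the 5% significance level are assumptions.

```python
from math import comb


def triangle_p_value(n_correct, n_tasters, p_guess=1 / 3):
    """Probability of at least n_correct correct picks if every taster guesses."""
    return sum(
        comb(n_tasters, k) * p_guess**k * (1 - p_guess) ** (n_tasters - k)
        for k in range(n_correct, n_tasters + 1)
    )


def min_correct(n_tasters, alpha=0.05):
    """Smallest number of correct picks whose p-value is at or below alpha."""
    for k in range(n_tasters + 1):
        if triangle_p_value(k, n_tasters) <= alpha:
            return k
    return None  # no count is significant for this panel size and alpha


if __name__ == "__main__":
    print(triangle_p_value(10, 18))  # 10 correct picks out of 18 tasters
    print(min_correct(18))
```

Under these assumptions, 10 correct out of 18 gives a p-value of about 0.043, so it is the smallest count that is significant at the 5% level for a panel of 18.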
Controlling difficulty (δ’): The level of difficulty of a triangulation can be deliberately adjusted by blending two very different coffees (C. arabica and C. canephora) at different ratios. A 20:80 arabica:canephora blend vs. pure arabica creates a very high difficulty level (low δ’); a 5:95 blend vs. pure arabica creates a lower difficulty level (higher δ’).
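The link between δ’ and the share of tasters who pass can be explored with a small simulation. The sketch assumes the standard Thurstonian model of the triangle test, which is not spelled out in these notes: each cup produces a normally distributed sensation with unit variance, the odd coffee is shifted by δ’, and the taster judges the two most similar cups to be the pair. The function name, trial count and seed are hypothetical.

```python
import random


def simulate_triangle(delta, n_trials=100_000, seed=0):
    """Estimate the proportion of correct triangulations for a given delta-prime."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        a1 = rng.gauss(0.0, 1.0)    # first cup of the duplicated coffee
        a2 = rng.gauss(0.0, 1.0)    # second cup of the duplicated coffee
        b = rng.gauss(delta, 1.0)   # odd cup, shifted by delta-prime
        d_aa = abs(a1 - a2)
        # The pick is correct when the two identical cups are the closest pair.
        if d_aa < abs(a1 - b) and d_aa < abs(a2 - b):
            correct += 1
    return correct / n_trials


if __name__ == "__main__":
    for delta in (0.0, 1.75, 2.98):
        print(delta, simulate_triangle(delta))
```

At δ’ = 0 the estimate sits near the guessing rate of one in three, and the estimates at 1.75 and 2.98 can be compared with the 10/18 and 14/18 pass rates quoted above.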
Applications:
- Q Grader exams: candidates must pass 5/6 triangulations at a given difficulty level
- Taster training: assess and develop individual sensory discrimination ability
- Processing experiments: test whether a change in fermentation time, drying method, etc. produces a detectable difference
3-AFC (3-Alternative Forced Choice)
Much more powerful than triangulation for detecting differences in a specific attribute. A taster is given three cups and asked to identify which has the highest intensity of a named attribute (e.g., “Which cup has the highest acidity?”).
3-AFC is approximately twice as powerful as triangulation for the same attribute and sample — meaning you need fewer tasters to detect the same level of difference. However, the attribute must be clearly defined and tasters must understand it.
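The advantage of 3-AFC can be illustrated by computing its expected proportion of correct answers for a given δ’. The sketch assumes the usual Thurstonian model (two cups with sensations drawn from N(0, 1), one cup from N(δ’, 1), and a taster who picks the strongest sensation), which is an assumption of this example and not a formula from the handbook. The function names and the integration grid are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_correct_3afc(delta, steps=4000):
    """P(the stronger cup gives the largest sensation), by trapezoid integration.

    Integrates normal_cdf(z)**2 * normal_pdf(z - delta) over z, i.e. the chance
    that both weaker cups fall below the sensation z of the stronger cup.
    """
    lo, hi = -8.0, 8.0 + delta
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_cdf(z) ** 2 * normal_pdf(z - delta)
    return total * h


if __name__ == "__main__":
    for delta in (0.0, 1.0, 1.75):
        print(delta, p_correct_3afc(delta))
```

At δ’ = 0 the function returns the guessing rate of one in three. At δ’ = 1.75 it returns a clearly higher proportion than the roughly 10/18 expected from triangulation at the same δ’, which is why fewer tasters are needed.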
Applications:
- Research into processing variables (does longer fermentation increase acidity intensity?)
- Roast profile comparison (does a 1°C temperature change affect body intensity?)
Setting up difference tests well
- Ensure “same” samples come from the same batch and brew
- Serve all cups at the same temperature (differences are slightly easier to find in cooler brews; hotter brews release the most volatile aromatics)
- Use black or very dark cups to eliminate visual cues if color of brew is a variable
- Prohibit communication between tasters until all forms are submitted
- Use paper forms rather than digital during the test (reduces social signaling)
Affective Testing
Purpose
To measure how much tasters like a product, or which of two products they prefer. Affective tests measure subjective experience — the whole point is to capture individual variation, not suppress it.
9-point hedonic scale
The gold standard for affective food testing. Developed by the US Armed Forces in the 1940s to measure soldiers’ food preferences. Terms:
| Score | Label |
|---|---|
| 9 | Like extremely |
| 8 | Like very much |
| 7 | Like moderately |
| 6 | Like slightly |
| 5 | Neither like nor dislike |
| 4 | Dislike slightly |
| 3 | Dislike moderately |
| 2 | Dislike very much |
| 1 | Dislike extremely |
The scale is designed so that the psychological distance between adjacent levels is approximately equal. It has been validated across thousands of studies and numerous product categories. It can be used with cartoon faces instead of words for children or to reduce language barriers.
Limitation: does not tell you why people like or dislike — only how much. Must be combined with descriptive data to be actionable.
JAR (Just About Right) scales
Measures the appropriateness of a specific attribute. Example for “acidity”:
- Much too low
- A little too low
- Just about right
- A little too high
- Much too high
Useful for diagnosing specific consumer complaints and optimizing recipes (e.g., finding the ideal brew strength for a café’s house blend).
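JAR responses are easier to act on when they are collapsed into three groups and read alongside a liking score from the same consumer. The sketch below assumes each consumer gives one JAR rating and one 9-point hedonic score; comparing mean liking between groups is a common convention (often called penalty analysis) and is not described in these notes. The function name and the data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Collapse the five JAR labels into three groups.
JAR_GROUPS = {
    "Much too low": "too low",
    "A little too low": "too low",
    "Just about right": "just about right",
    "A little too high": "too high",
    "Much too high": "too high",
}


def summarize_jar(responses):
    """Summarize (jar_label, hedonic_score) pairs, one pair per consumer.

    Returns, for each group that occurs, its share of consumers and the
    mean liking score within that group.
    """
    liking = defaultdict(list)
    for label, score in responses:
        liking[JAR_GROUPS[label]].append(score)
    n = len(responses)
    return {
        group: {"share": len(scores) / n, "mean_liking": mean(scores)}
        for group, scores in liking.items()
    }


if __name__ == "__main__":
    # Made-up responses, for illustration only.
    example = [
        ("Just about right", 8),
        ("Just about right", 6),
        ("Much too high", 4),
        ("A little too low", 5),
    ]
    for group, stats in summarize_jar(example).items():
        print(group, stats)
```

A large share of consumers in “too high” or “too low”, combined with lower mean liking than the “just about right” group, points to an attribute worth adjusting.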
Preference mapping
Combines hedonic data with descriptive data to build a “map” of consumer preferences. Uses hierarchical cluster analysis and principal component analysis (PCA) to:
- Segment consumers by shared preference patterns
- Identify which sensory attributes drive liking in each segment
- Map coffee products to consumer segments
Example finding relevant to Kaiserblick: Research on brewing preferences across a student population found two consumer clusters: “strong coffee likers” (TDS ~1.5%, driven by nutty, roasted, dark chocolate attributes) and “weak coffee likers” (TDS ~1.0%, driven by tea/floral, sweet, cereal attributes). Kaiserblick’s specialty light roast targets the second cluster, who are underserved by mainstream commodity coffee.
Descriptive Analysis
Purpose
To objectively quantify the sensory attributes of a coffee. Unlike affective testing, descriptive analysis deliberately suppresses preference and focuses on neutral, accurate characterization. Output can be correlated with processing variables, chemical composition, cupping scores, or consumer preference data.
Full descriptive panel
- 8–12 trained panelists with calibrated sensory references
- Panelists assess attribute intensities on unstructured 15 cm line scales (no numbers; tick marks only)
- Data analyzed by ANOVA, spiderweb plots, principal component analysis
- Gold standard for research; expensive and time-consuming for commercial operations
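The ANOVA step can be sketched for a single attribute. This is a deliberately simplified one-way ANOVA that treats the coffee as the only factor; real panel analyses usually also model the panelist, which is left out here. The function name and the intensity values (centimetres measured on the line scale) are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of intensity ratings.

    Each group holds the ratings that the panel gave one coffee for one attribute.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the coffee means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own coffee mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up acidity intensities (cm) for three coffees, three panelists each.
    acidity = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    print(one_way_anova_f(acidity))
```

The F value is then compared with the critical value for the two degrees of freedom; a large F suggests the coffees differ in that attribute by more than panelist noise alone would explain.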
CATA (Check-All-That-Apply)
A rapid, lower-cost profiling method. Panelists check all descriptors from a predefined list that apply to the sample. More practical than full descriptive analysis for commercial use.
Setting up a CATA test:
- Use the nine primary categories of the Coffee Taster’s Flavor Wheel as the starting attribute list, or a subset appropriate to the coffees being evaluated
- Each panelist checks all applicable terms independently
- Statistical analysis: chi-square test on frequency of each attribute; correspondence analysis builds “flavor maps”
- CATA data can be overlaid with affective data (hedonic scores) to connect descriptors to liking
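The chi-square step in the list above can be sketched for one descriptor. The example assumes every coffee was assessed by the same number of panelists and treats the samples as independent, which is a simplification; the function names are hypothetical, and the 5% critical values are the standard ones for 1 to 3 degrees of freedom.

```python
# 5% critical values of the chi-square distribution, by degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815}


def cata_chi_square(checked_counts, n_panelists):
    """Chi-square statistic for how often one descriptor was checked per coffee.

    checked_counts: number of panelists who checked the descriptor, one entry
    per coffee. Returns (statistic, degrees_of_freedom).
    """
    k = len(checked_counts)
    total_checked = sum(checked_counts)
    total_unchecked = k * n_panelists - total_checked
    if total_checked == 0 or total_unchecked == 0:
        raise ValueError("descriptor was checked by nobody or by everybody")
    expected_checked = total_checked / k
    expected_unchecked = total_unchecked / k
    stat = 0.0
    for checked in checked_counts:
        unchecked = n_panelists - checked
        stat += (checked - expected_checked) ** 2 / expected_checked
        stat += (unchecked - expected_unchecked) ** 2 / expected_unchecked
    return stat, k - 1


def differs_at_5_percent(checked_counts, n_panelists):
    """True if the descriptor frequency differs between coffees at the 5% level."""
    stat, df = cata_chi_square(checked_counts, n_panelists)
    return stat > CRITICAL_5_PERCENT[df]


if __name__ == "__main__":
    # Made-up counts: "floral" checked by 9 of 12 panelists for one coffee, 3 of 12 for another.
    print(cata_chi_square([9, 3], 12))
    print(differs_at_5_percent([9, 3], 12))
```

Running the same check for every descriptor gives a quick list of the terms that actually separate the coffees, which is a useful filter before building a flavor map.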
Descriptive cupping (Fernández-Alduenda rapid method): Combines trained cuppers with the SCA Cupping (Cata) protocol and CATA descriptors. Cuppers use structured SCA scoring plus CATA attribute notes. Dramatically cheaper than a full sensory panel while still providing usable descriptive flavor data. The minimum panel size is 6 cuppers.
Applying These Methods at Kaiserblick
| Use case | Recommended method |
|---|---|
| Q Grader certification training | Triangulation, controlled difficulty level |
| Pre-shipment vs. arrival comparison | 3-AFC (specific attributes) or triangulation |
| Processing experiment (fermentation variable) | Triangulation + 3-AFC for acidity/body |
| Lot description for export customers | Descriptive cupping (CATA) using Flavor Wheel |
| Café customer preference research | 9-point hedonic scale + JAR for brew strength |
| Roast profile development | Descriptive cupping + hedonic scale |