Sensory Testing Methods

Sources: Coffee Sensory and Cupping Handbook by Fernández-Alduenda & Giuliano (SCA, 2021)


Sensory science uses three distinct categories of tests, each answering a different question. They should not be mixed in a single session, as they activate different cognitive modes in the taster.

Category | Question | Who answers it | Example tools
Difference testing | “Are these two coffees detectably different?” | Trained tasters or cuppers | Triangulation, 3-AFC
Affective testing | “How much do people like this?” | Consumers or cuppers | 9-point hedonic scale, JAR scales
Descriptive analysis | “What are this coffee’s sensory characteristics?” | Trained descriptive panel | CATA, full descriptive panel

Difference Testing

Purpose

To determine whether a sensory difference exists between two (or more) coffee samples. A difference test does not identify what is different or whether the difference is desirable, only that it is detectable.

Triangulation test

The most widely used difference test in coffee. A taster is presented with three cups — two of the same coffee (A, A) and one different (B) — and asked to identify the “odd” cup.

Setting up a triangulation:

  • Pre-grind each sample and prepare the “same” cups from the same batch and brew, so that they are truly identical
  • Serve at the same temperature; use identical cups; control visual cues (cup crust, color)
  • Randomize the position of the odd cup (AAB, ABA, BAA, BBA, BAB, ABB all used equally)
  • Minimum 6 tasters per session for reliable results
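The balanced randomization described in the list above can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the handbook: the function name `assign_orders` and the `seed` parameter are hypothetical, and the only input it relies on is the six serving orders listed above.

```python
import random

# The six serving orders listed above; the position of the odd cup varies.
ORDERS = ["AAB", "ABA", "BAA", "BBA", "BAB", "ABB"]

def assign_orders(n_tasters, seed=None):
    """Return one serving order per taster, using each order as equally as possible."""
    rng = random.Random(seed)
    full_blocks, remainder = divmod(n_tasters, len(ORDERS))
    # Every order appears `full_blocks` times; leftover tasters get distinct orders.
    orders = ORDERS * full_blocks + rng.sample(ORDERS, remainder)
    rng.shuffle(orders)  # so that the order does not depend on seating position
    return orders

for taster, order in enumerate(assign_orders(6, seed=1), start=1):
    print(f"Taster {taster}: {order}")
```

With 6, 12, or 18 tasters every order is used the same number of times; with other panel sizes the counts differ by at most one.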

Interpreting triangulation results: The δ’ (delta-prime) statistic measures the sensory magnitude of the difference between two coffees; the larger δ’ is, the easier the odd cup is to find. At δ’ = 1.75, about 10 of 18 tasters pass; at δ’ = 2.98, about 14 of 18 pass.
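To see how δ’ relates to the share of tasters who find the odd cup, the sketch below evaluates the usual Thurstonian expression for the triangle test by numerical integration. The formula is not given in the source and is an assumption here, as are the function names; the results come out close to the 10 of 18 and 14 of 18 figures quoted above.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def triangle_pc(delta, upper=8.0, steps=2000):
    """Expected proportion of correct answers in a triangle test.

    Assumed model (not from the source):
    Pc = 2 * integral_0^inf [Phi(-z*sqrt(3) + d*sqrt(2/3))
                             + Phi(-z*sqrt(3) - d*sqrt(2/3))] * phi(z) dz
    evaluated with Simpson's rule on [0, upper]; `steps` must be even.
    """
    a = delta * math.sqrt(2.0 / 3.0)
    root3 = math.sqrt(3.0)

    def f(z):
        return (Phi(-z * root3 + a) + Phi(-z * root3 - a)) * phi(z)

    h = upper / steps
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 2.0 * total * h / 3.0

for d in (0.0, 1.75, 2.98):
    pc = triangle_pc(d)
    print(f"delta'={d:.2f}  Pc={pc:.3f}  expected correct out of 18: {18 * pc:.1f}")
```

At δ’ = 0 the two coffees are indistinguishable and the function returns the guessing rate of one in three, which is a useful sanity check on the integral.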

Controlling difficulty (δ’): The level of difficulty of a triangulation can be deliberately adjusted by blending two very different coffees (C. arabica and C. canephora) at different ratios. A 20:80 arabica:canephora blend vs. pure arabica creates a very high difficulty level (low δ’); a 5:95 blend vs. pure arabica creates a lower difficulty level (higher δ’).

Applications:

  • Q Grader exams: candidates must pass 5/6 triangulations at a given difficulty level
  • Taster training: assess and develop individual sensory discrimination ability
  • Processing experiments: test whether a change in fermentation time, drying method, etc. produces a detectable difference
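For applications like these, a result is usually judged against what pure guessing would produce, since a taster who cannot tell the coffees apart still picks the odd cup one time in three. The sketch below uses an exact binomial tail for that comparison. The 5% significance level and the function names are assumptions, not values from the source.

```python
from math import comb

def binomial_tail(correct, trials, p_guess=1 / 3):
    """P(at least `correct` right answers in `trials`) if every answer is a guess."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def min_correct(trials, alpha=0.05, p_guess=1 / 3):
    """Smallest number of correct answers that is significant at level alpha."""
    for correct in range(trials + 1):
        if binomial_tail(correct, trials, p_guess) <= alpha:
            return correct
    return None  # too few trials to ever reach significance

# Chance of passing 5 of 6 triangulations by guessing alone.
print(f"P(>=5 of 6 by guessing) = {binomial_tail(5, 6):.4f}")
# Correct answers needed from 18 independent triangle judgments.
print(f"Minimum correct out of 18 = {min_correct(18)}")
```

Under these assumptions, guessing through 5 of 6 triangulations happens less than 2% of the time, and a processing experiment with 18 independent judgments needs at least 10 correct answers before the difference counts as detected.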

3-AFC (3-Alternative Forced Choice)

A taster is given three cups and asked to identify which has the highest intensity of a named attribute (e.g., “Which cup has the highest acidity?”).

For the same attribute and samples, 3-AFC is approximately twice as powerful as triangulation, so fewer tasters are needed to detect the same level of difference. However, the attribute must be clearly defined and tasters must understand it.
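One way to see where the extra power comes from is to compute the expected proportion of correct answers in a 3-AFC test for a given δ’. The sketch below uses the standard Thurstonian expression for 3-AFC, which the source does not state and is therefore an assumption, as are the function names. It illustrates the direction of the effect only and does not reproduce the "twice as powerful" figure.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def three_afc_pc(delta, half_width=8.0, steps=2000):
    """Expected proportion correct in 3-AFC.

    Assumed model (not from the source):
    Pc = integral phi(z - delta) * Phi(z)**2 dz
    evaluated with Simpson's rule on [delta - half_width, delta + half_width];
    `steps` must be even.
    """
    lo = delta - half_width
    h = 2.0 * half_width / steps

    def f(z):
        return phi(z - delta) * Phi(z) ** 2

    total = f(lo) + f(lo + 2.0 * half_width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

pc = three_afc_pc(1.75)
print(f"3-AFC at delta'=1.75: Pc={pc:.3f}, about {18 * pc:.1f} of 18 correct")
```

At δ’ = 1.75 the expected share of correct answers is well above the roughly 10 of 18 expected from a triangulation at the same δ’, while the guessing rate is one in three for both tests. A larger gap between signal and guessing is what lets a smaller panel reach significance.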

Applications:

  • Research into processing variables (does longer fermentation increase acidity intensity?)
  • Roast profile comparison (does a 1°C temperature change affect body intensity?)

Setting up difference tests well

  • Ensure “same” samples come from the same batch and brew
  • Serve all cups at the same temperature (differences are slightly easier to find in cooler cups; the hottest cups release the most volatile aromatics)
  • Use black or very dark cups to eliminate visual cues if color of brew is a variable
  • Prohibit communication between tasters until all forms are submitted
  • Use paper forms rather than digital during the test (reduces social signaling)

Affective Testing

Purpose

To measure how much tasters like a product, or which of two products they prefer. Affective tests measure subjective experience — the whole point is to capture individual variation, not suppress it.

9-point hedonic scale

The gold standard for affective food testing. Developed by the US Armed Forces in the 1940s to measure soldiers’ food preferences. Terms:

Score | Label
9 | Like extremely
8 | Like very much
7 | Like moderately
6 | Like slightly
5 | Neither like nor dislike
4 | Dislike slightly
3 | Dislike moderately
2 | Dislike very much
1 | Dislike extremely

The scale is treated as having equal psychological distance between adjacent levels. It has been validated across thousands of studies and numerous product categories. It can be used with cartoon faces instead of words for children or to reduce language barriers.

Limitation: does not tell you why people like or dislike — only how much. Must be combined with descriptive data to be actionable.
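Summarizing hedonic scores is straightforward. The sketch below validates responses against the nine labels and reports the mean, the standard deviation, and the share of tasters on the "like" side of the scale (6 or above). The function name, the choice of summary statistics, and the example scores are hypothetical and not taken from the source.

```python
from statistics import mean, stdev

HEDONIC_LABELS = {
    9: "Like extremely", 8: "Like very much", 7: "Like moderately",
    6: "Like slightly", 5: "Neither like nor dislike", 4: "Dislike slightly",
    3: "Dislike moderately", 2: "Dislike very much", 1: "Dislike extremely",
}

def summarize_hedonic(scores):
    """Summarize a list of 9-point hedonic scores for one product."""
    if not scores or any(s not in HEDONIC_LABELS for s in scores):
        raise ValueError("scores must be non-empty integers from 1 to 9")
    return {
        "n": len(scores),
        "mean": mean(scores),
        "sd": stdev(scores) if len(scores) > 1 else 0.0,
        # Share of tasters who chose "Like slightly" or higher.
        "share_liking": sum(s >= 6 for s in scores) / len(scores),
    }

# Made-up scores from eight tasters, for illustration only.
print(summarize_hedonic([7, 8, 6, 5, 9, 4, 7, 8]))
```

Keeping the standard deviation alongside the mean preserves the individual variation that affective testing is meant to capture.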

JAR (Just About Right) scales

JAR scales measure whether the intensity of a specific attribute is appropriate. Example for “acidity”:

  • Much too low
  • A little too low
  • Just about right
  • A little too high
  • Much too high

Useful for diagnosing specific consumer complaints and optimizing recipes (e.g., finding the ideal brew strength for a café’s house blend).
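JAR responses are usually read as a distribution: what share of tasters found the attribute too low, just about right, or too high. A minimal tabulation is sketched below; the function names and the example responses are hypothetical.

```python
from collections import Counter

JAR_LEVELS = [
    "Much too low", "A little too low", "Just about right",
    "A little too high", "Much too high",
]

def jar_distribution(responses):
    """Return the share of responses at each JAR level."""
    counts = Counter(responses)
    unknown = set(counts) - set(JAR_LEVELS)
    if not responses or unknown:
        raise ValueError(f"unexpected or missing responses: {sorted(unknown)}")
    return {level: counts[level] / len(responses) for level in JAR_LEVELS}

def jar_summary(responses):
    """Collapse the five levels into too low / just about right / too high."""
    dist = jar_distribution(responses)
    return {
        "too_low": dist["Much too low"] + dist["A little too low"],
        "just_about_right": dist["Just about right"],
        "too_high": dist["A little too high"] + dist["Much too high"],
    }

# Made-up brew-strength responses from ten customers.
responses = (["Much too low"] + ["A little too low"] * 2
             + ["Just about right"] * 5 + ["A little too high"] * 2)
print(jar_summary(responses))
```

In this invented example more customers find the brew too weak than too strong, which would point toward a slightly higher brew strength.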

Preference mapping

Combines hedonic data with descriptive data to build a “map” of consumer preferences. Uses hierarchical cluster analysis and principal component analysis (PCA) to:

  1. Segment consumers by shared preference patterns
  2. Identify which sensory attributes drive liking in each segment
  3. Map coffee products to consumer segments
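The segmentation step can be illustrated with a small pure-Python sketch. It groups consumers by their liking scores using average-linkage hierarchical clustering and then reports mean liking per product in each segment. This is a simplification under stated assumptions: PCA and the link to descriptive attributes are omitted, real studies would use a statistics package, and all names and scores below are hypothetical.

```python
from itertools import combinations
from statistics import mean

def distance(a, b):
    """Euclidean distance between two consumers' liking vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_consumers(liking, n_clusters=2):
    """Average-linkage agglomerative clustering.

    liking: dict mapping consumer id -> list of hedonic scores, one per product.
    Returns a list of clusters, each a sorted list of consumer ids.
    """
    clusters = [[name] for name in liking]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest mean pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: mean(
                distance(liking[a], liking[b])
                for a in clusters[pair[0]]
                for b in clusters[pair[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]

def mean_liking_by_cluster(liking, clusters, products):
    """Mean hedonic score for each product within each cluster."""
    return [
        {p: mean(liking[c][k] for c in cluster) for k, p in enumerate(products)}
        for cluster in clusters
    ]

# Made-up 9-point scores for two brews from six consumers.
products = ["strong brew", "weak brew"]
liking = {"c1": [8, 3], "c2": [9, 4], "c3": [7, 3],
          "c4": [3, 8], "c5": [2, 7], "c6": [4, 9]}
segments = cluster_consumers(liking)
for segment, means in zip(segments, mean_liking_by_cluster(liking, segments, products)):
    print(segment, means)
```

Averaging liking over the whole group would hide both segments, which is why the segmentation comes before any attempt to explain liking.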

Example finding relevant to Kaiserblick: Research on brewing preferences across a student population found two consumer clusters: “strong coffee likers” (TDS ~1.5%, driven by nutty, roasted, dark chocolate attributes) and “weak coffee likers” (TDS ~1.0%, driven by tea/floral, sweet, cereal attributes). Kaiserblick’s specialty light roast targets the second cluster, who are underserved by mainstream commodity coffee.


Descriptive Analysis

Purpose

To objectively quantify the sensory attributes of a coffee. Unlike affective testing, descriptive analysis deliberately suppresses preference and focuses on neutral, accurate characterization. Output can be correlated with processing variables, chemical composition, cupping scores, or consumer preference data.

Full descriptive panel

  • 8–12 trained panelists with calibrated sensory references
  • Panelists assess attribute intensities on unstructured 15 cm line scales (no numbers; tick marks only)
  • Data analyzed by ANOVA, spiderweb plots, principal component analysis
  • Gold standard for research; expensive and time-consuming for commercial operations

CATA (Check-All-That-Apply)

A rapid, lower-cost profiling method. Panelists check all descriptors from a predefined list that apply to the sample. More practical than full descriptive analysis for commercial use.

Setting up a CATA test:

  1. Use the nine primary categories of the Coffee Taster’s Flavor Wheel as the starting attribute list, or a subset appropriate to the coffees being evaluated
  2. Each panelist checks all applicable terms independently
  3. Statistical analysis: chi-square test on frequency of each attribute; correspondence analysis builds “flavor maps”
  4. CATA data can be overlaid with affective data (hedonic scores) to connect descriptors to liking
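The frequency count and chi-square comparison in the steps above can be sketched as follows for the simplest case, one attribute compared across two samples. The function names and the example counts are hypothetical. The test shown is a plain Pearson chi-square with one degree of freedom, which assumes independent observations and reasonably large expected counts; treat it as an approximation when the same panelists assess both samples.

```python
import math

def cata_frequencies(checks, attributes):
    """Count how many panelists checked each attribute.

    checks: list of sets, one set of checked attributes per panelist.
    """
    return {a: sum(a in c for c in checks) for a in attributes}

def chi_square_2x2(count_a, n_a, count_b, n_b):
    """Pearson chi-square (df = 1) for one attribute across two samples.

    Returns (statistic, p_value).
    """
    table = [[count_a, n_a - count_a], [count_b, n_b - count_b]]
    row_totals = [n_a, n_b]
    col_totals = [count_a + count_b, (n_a - count_a) + (n_b - count_b)]
    total = n_a + n_b
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            if expected == 0:
                return 0.0, 1.0  # attribute checked by everyone or by no one
            stat += (table[r][c] - expected) ** 2 / expected
    # For df = 1 the upper tail of the chi-square equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Made-up example: "floral" checked by 9 of 10 panelists for one lot, 2 of 10 for another.
stat, p = chi_square_2x2(9, 10, 2, 10)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Running the comparison once per attribute gives the list of descriptors that actually separate the samples, which is the input a correspondence analysis would then plot.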

Descriptive cupping (Fernández-Alduenda rapid method): Combines trained cuppers with the SCA Cupping (Cata) protocol and CATA descriptors. Cuppers use structured SCA scoring plus CATA attribute notes. Dramatically cheaper than a full sensory panel while still providing usable descriptive flavor data. The minimum panel size is 6 cuppers.


Applying These Methods at Kaiserblick

Use case | Recommended method
Q Grader certification training | Triangulation, controlled difficulty level
Pre-shipment vs. arrival comparison | 3-AFC (specific attributes) or triangulation
Processing experiment (fermentation variable) | Triangulation + 3-AFC for acidity/body
Lot description for export customers | Descriptive cupping (CATA) using Flavor Wheel
Café customer preference research | 9-point hedonic scale + JAR for brew strength
Roast profile development | Descriptive cupping + hedonic scale