Gage R&R
Author: Dr. Hannah Volk-Jesussek
Updated:
This tutorial covers Measurement System Analysis (MSA), and more precisely Gage R&R, a specific type of MSA.
What is a Gage?
Let’s say you work for a company that produces shafts. Shafts are round metal rods with a certain diameter. A gage can be as simple as a digital caliper for measuring a shaft’s diameter. More generally, a gage is any device or setup used to measure a characteristic of a part or process.
Now what about the R&R?
What is R&R?
R&R stands for Repeatability and Reproducibility.
When we measure something in practice, we will get values that fluctuate to some extent. Even if the same person measures the same part 10 times, there will likely be some variation.
Repeatability and reproducibility tell us how much of that variation comes from the tool itself and how much comes from different people using the tool.
What is Repeatability?
Let's take a deeper look at repeatability first. Repeatability is the variation we see when the same person measures the same part with the same gage multiple times. This is also called equipment variation.
But be careful. Suppose the same person measures the same part 10 times with the same gage. In theory, all the variation should come from the gage.
However, if, for example, the shaft isn’t perfectly round and we measure at slightly different locations each time, we get different measurements even though it is the same part. Some of the observed variation then comes from the part itself.
In short, repeatability tells us how much variation we see when the same person measures the same part with the same gage multiple times. We want the measurement system to give consistent results under exactly these conditions.
What is Reproducibility?
Reproducibility is the variation we see when different people (or setups) measure the same part with the same gage. We want different operators measuring the same part with the same gage to obtain similar results.
One thing to note: in everyday language, gage usually means just the instrument, here the digital caliper.
But in the context of Measurement System Analysis, gage means the entire measurement setup: the digital caliper, the method, the environment, and the people using it.
For example, the method includes measurement steps like zeroing the caliper, cleaning the part, measuring in the middle, and recording the result.
Why do we need a Measurement System Analysis?
Of course, your company needs to run profitably. That means your measurement system must reliably call good parts ‘good’ and bad parts ‘bad.’
So there are two costly mistakes to avoid:
- (1) Calling good parts bad. You scrap or rework perfectly fine parts, which is pure waste.
- (2) Calling bad parts good. You ship defects to customers. This risks returns and complaints, and, in the worst case, failures in the field.
A solid measurement system minimizes both mistakes, protects your margin, and protects your customer. Now we want to know how to calculate a measurement system analysis and how to interpret the results.
How do we calculate a Measurement System Analysis?
In order to do this, we first need data. Let’s say we have two operators, operator 1 and operator 2, and three parts, part 1, part 2 and part 3.
Each part is measured twice by each operator, which gives 12 measurements in total. For example, part 1 is measured twice by operator 1 and twice by operator 2. In practice you will certainly use more than three parts, and later we will also use a larger dataset.
And one important point: In a Gage R&R you should choose parts that span the full range of measurements you expect in production. This is important in order to test whether the measurement system can clearly distinguish different parts across the whole process.
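The study layout described above (two operators, three parts, two measurements each) can be sketched as a long-format table with one row per measurement. The diameter values below are made up purely for illustration, and the column names are assumptions, not the names numiqo requires.

```python
from itertools import product

operators = ["operator 1", "operator 2"]
parts = ["part 1", "part 2", "part 3"]
trials = [1, 2]

# Made-up shaft diameters in mm, one per (part, operator, trial) combination,
# listed in the same order that product(parts, operators, trials) yields.
readings = [
    10.02, 10.04, 10.05, 10.07,  # part 1: operator 1 (2 trials), operator 2 (2 trials)
    10.51, 10.49, 10.55, 10.53,  # part 2
    10.98, 11.00, 11.03, 11.01,  # part 3
]

# Long format: one row per single measurement.
rows = [
    {"part": p, "operator": o, "trial": t, "measurement": m}
    for (p, o, t), m in zip(product(parts, operators, trials), readings)
]

for row in rows:
    print(row)
```

Note that the made-up parts are deliberately spread out (roughly 10.0, 10.5 and 11.0 mm), in line with the advice to choose parts that span the expected production range.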
Basically, there are two common ways to calculate a Gage R&R: (1) the Range Method and (2) ANOVA. ANOVA is the preferred approach in modern software and guidance. So first, we will quickly discuss the ANOVA. After that, we’ll walk through the Gage R&R interpretation.
What is an ANOVA?
An ANOVA (analysis of variance) tests whether there is a statistically significant difference between the mean values of several groups. It is typically used for three or more groups, but it also works with just two.
For example, if one group contains all measurements from operator 1 and another group contains measurements from operator 2, we can test whether there is a significant difference between the measurements. Likewise, if groups are defined by different parts, we can test whether the measurements differ significantly across parts.
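To make the idea concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA, with the measurements grouped by operator. The numbers are hypothetical, and `one_way_f` is just an illustrative helper name. Turning the F value into a p-value additionally requires the F distribution, which statistics software handles for you.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of measurements

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of the values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical measurements, grouped by operator.
operator_1 = [1.0, 2.0, 3.0]
operator_2 = [2.0, 3.0, 4.0]
f_value = one_way_f([operator_1, operator_2])
print(f_value)
```

A large F value means the group means differ by more than the scatter within the groups would explain.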
How can we calculate a Gage R&R with numiqo?
First we go to the Gage R&R calculator on numiqo. If you like, you can load the example dataset, or you can copy your own data into the table.
We click on Measurement System Analysis, where we can see the variables from the table. Then we select Measurement, Part and Operator.
numiqo directly gives a Gage R&R. If you like, you can define a tolerance; let’s say it is 1.5. The tolerance comes from engineering or functional requirements, not from the Gage R&R data. We will now take a detailed look at the ANOVA table and at the most important table, the variance components table.
How can we interpret the ANOVA table?
Let’s start with the ANOVA table. With the help of this table we can answer three main questions: Is the part-to-part variation significant? Is the operator variation significant? And what about the repeatability? So, how do we answer these questions?
If you want a detailed explanation of ANOVA, please take a look at our tutorials or training videos.
But to keep it short for now: we’re mainly interested in the p-value in the last column. If the p-value is less than 0.05 (5%), that factor is considered significant.
So let's have a look at the factor parts. A p-value of less than 0.001 means the parts are significantly different from each other. This is what we want: the measurement system can detect real part differences.
And what about the factor operator? With a p-value of 0.002, there is a significant difference between the measurements of the operators. This means operator 1 and operator 2 do not measure the same part the same way. So there is a reproducibility issue, because the measurement depends on who measures the part. That should not occur in a good system.
But be aware, in ANOVA, even small differences can become statistically significant when the data are very consistent (low noise). That’s why we’ll later on use the variance components to judge how big the effect is in practice.
What is the conclusion of the MSA results?
Of course, this is just an example, but an overall conclusion could be that the measurement system can detect differences between parts (which is GOOD). Operators do not measure consistently with each other (which is BAD → needs correction).
So recommended actions could be: Create a clear work instruction and train operators on the same measurement method.
But be aware that more detailed conclusions can be drawn from the table of the variance components.
One thing to note: There may be an interaction between operator and part, in which case we would have another row in the ANOVA table.
In this example, however, we get the following: There is no significant interaction between operator and part, and two-way analysis of variance without interaction is used. So because the interaction is not significant, it is not used in the model.
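For readers who want to see where the rows of such an ANOVA table come from, here is a minimal sketch of a two-way ANOVA without interaction for a balanced study (every operator measures every part the same number of times). The data are hypothetical and the function name is just illustrative; p-values are left out because they require the F distribution.

```python
def two_way_anova_no_interaction(data):
    """data maps (part, operator) -> list of repeated measurements (balanced design)."""
    parts = sorted({p for p, _ in data})
    operators = sorted({o for _, o in data})
    n_rep = len(next(iter(data.values())))
    values = [x for cell in data.values() for x in cell]
    n = len(values)
    grand = sum(values) / n

    def mean(xs):
        return sum(xs) / len(xs)

    part_means = {p: mean([x for (pp, _), cell in data.items() if pp == p for x in cell])
                  for p in parts}
    oper_means = {o: mean([x for (_, oo), cell in data.items() if oo == o for x in cell])
                  for o in operators}

    # Sums of squares for the two factors and for everything that is left over.
    ss_part = len(operators) * n_rep * sum((m - grand) ** 2 for m in part_means.values())
    ss_oper = len(parts) * n_rep * sum((m - grand) ** 2 for m in oper_means.values())
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_part - ss_oper  # repeatability, pooled with the interaction

    df_part = len(parts) - 1
    df_oper = len(operators) - 1
    df_error = n - len(parts) - len(operators) + 1

    ms_part, ms_oper, ms_error = ss_part / df_part, ss_oper / df_oper, ss_error / df_error
    return {
        "part": {"df": df_part, "ms": ms_part, "F": ms_part / ms_error},
        "operator": {"df": df_oper, "ms": ms_oper, "F": ms_oper / ms_error},
        "error": {"df": df_error, "ms": ms_error},
    }


# Hypothetical data: 3 parts x 2 operators x 2 repeated measurements.
data = {
    ("A", "op1"): [10.0, 10.2], ("A", "op2"): [10.2, 10.4],
    ("B", "op1"): [11.0, 11.2], ("B", "op2"): [11.2, 11.4],
    ("C", "op1"): [12.0, 12.2], ("C", "op2"): [12.2, 12.4],
}
table = two_way_anova_no_interaction(data)
for source, row in table.items():
    print(source, row)
```

In this made-up example, the F value for parts is far larger than the one for operators, which mirrors the situation in the tutorial: large real part differences and a smaller but noticeable operator effect.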
How can we interpret the variance components table?
Let’s now take a look at the next table with the variance components.
We read this table from left to right. First, the variance column. This shows how much variation each source adds. In our case, Total Variation is about 0.06, and Total Gage R&R is about 0.01, so the measurement system contributes a small slice of the overall variation.
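The variance column is derived from the mean squares of the ANOVA table. The following sketch uses the usual estimates for a balanced study without interaction; that numiqo computes the values in exactly this way is an assumption, and the mean squares in the example are hypothetical.

```python
def variance_components(ms_part, ms_operator, ms_error,
                        n_parts, n_operators, n_replicates):
    """Estimate variance components from the mean squares of a two-way ANOVA
    without interaction (balanced study). Negative estimates are set to zero."""
    repeatability = ms_error
    reproducibility = max(0.0, (ms_operator - ms_error) / (n_parts * n_replicates))
    part_to_part = max(0.0, (ms_part - ms_error) / (n_operators * n_replicates))
    gage_rr = repeatability + reproducibility
    return {
        "Total Gage R&R": gage_rr,
        "Repeatability": repeatability,
        "Reproducibility": reproducibility,
        "Part-to-Part": part_to_part,
        "Total Variation": gage_rr + part_to_part,
    }


# Hypothetical mean squares for 3 parts, 2 operators and 2 repeated measurements.
components = variance_components(ms_part=4.0, ms_operator=0.12, ms_error=0.015,
                                 n_parts=3, n_operators=2, n_replicates=2)
for source, variance in components.items():
    print(f"{source:16s} {variance:.5f}")
```

The key point is that Total Gage R&R is the sum of repeatability and reproducibility, and Total Variation is Total Gage R&R plus the part-to-part variance.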
Next, % Contribution. This tells us what share of the total variance comes from each source. About 85% comes from the parts themselves, which means the parts are truly different — that’s a good sign. However, our Total Gage R&R makes up the remaining 15%, which is quite a lot.
Repeatability is the variation when the same operator measures the same part again and again. It’s about 9%. Reproducibility is the variation between different operators. It’s about 6%.
The standard deviation (StdDev) is just the square root of variance.
Then % Study Var compares each source’s standard deviation to the total standard deviation. This is what many companies use for accept/reject decisions.
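The relationship between the columns can be sketched in a few lines. Since the full table is not reproduced here, the example below uses the rounded variance shares from the text (9% repeatability, 6% reproducibility, 85% part-to-part) as stand-ins for the actual variances, so the results are only approximate.

```python
import math


def summarize(repeatability, reproducibility, part_to_part):
    """Return % Contribution, StdDev and % Study Var for each source of variation."""
    variances = {
        "Total Gage R&R": repeatability + reproducibility,
        "Repeatability": repeatability,
        "Reproducibility": reproducibility,
        "Part-to-Part": part_to_part,
    }
    total = repeatability + reproducibility + part_to_part
    variances["Total Variation"] = total

    summary = {}
    for source, variance in variances.items():
        std_dev = math.sqrt(variance)
        summary[source] = {
            "pct_contribution": 100 * variance / total,         # ratio of variances
            "std_dev": std_dev,                                  # square root of variance
            "pct_study_var": 100 * std_dev / math.sqrt(total),   # ratio of standard deviations
        }
    return summary


# Rounded variance shares from the text, used here in place of the real variances.
summary = summarize(repeatability=0.09, reproducibility=0.06, part_to_part=0.85)
for source, row in summary.items():
    print(f"{source:16s} {row['pct_contribution']:6.1f} %   {row['pct_study_var']:6.1f} %")
```

Because % Study Var is based on standard deviations rather than variances, its values do not add up to 100%, and a 15% contribution turns into a % Study Var of roughly 39%.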
For example, here are the Automotive Industry Action Group (AIAG) acceptance criteria for % Study Var.
- Less than 10% means the measurement system is acceptable
- Between 10% and 30% means it is conditionally acceptable. Whether it is accepted depends on the application, the gage cost, and the cost of rework or repair; customer approval is often required.
- Above 30% means unacceptable; the measurement system should be improved.
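In our example, the Total Gage R&R accounts for about 15% of the variance, so its % Study Var is roughly √0.15 ≈ 39%. According to the AIAG criteria, our measurement system would therefore be "unacceptable", because this value is above 30%.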
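The three AIAG bands listed above can be written as a small helper. The function name is just illustrative, and the example value of 38.7% is the approximate square root of the 15% variance share from the text.

```python
def classify_study_var(pct_study_var):
    """Classify a Total Gage R&R % Study Var value using the AIAG bands from the text."""
    if pct_study_var < 10:
        return "acceptable"
    if pct_study_var <= 30:
        return "conditionally acceptable"
    return "unacceptable"


print(classify_study_var(38.7))  # the example from this tutorial
print(classify_study_var(20.0))
print(classify_study_var(5.0))
```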
Finally, we have %Tolerance. At the beginning, we entered a tolerance of 1.5.
The tolerance is the upper specification limit minus the lower specification limit.
When you enter a tolerance, all the other columns stay exactly the same; only the %Tolerance column uses that value.
If you leave the tolerance blank, this last column simply won’t appear. But including a tolerance is helpful, because it tells you how much of the spec window is being consumed by measurement variation.
For example, you might see a high percentage in %Study Var, but if your tolerance is large, the %Tolerance can still be small. This means that the gage error is small relative to your specification limits.
If we set the tolerance to 4.0, the %Tolerance drops, because the gage variation occupies a smaller fraction of the tolerance window.
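A quick sketch shows why the value drops. It assumes the common convention that the study variation is six standard deviations; whether numiqo uses exactly this multiplier is an assumption. The Total Gage R&R variance of 0.01 is the rounded value from the text, so the percentages are approximate.

```python
import math


def pct_tolerance(variance, tolerance, sigma_multiplier=6.0):
    """Share of the tolerance window (USL - LSL) taken up by the study variation."""
    study_variation = sigma_multiplier * math.sqrt(variance)
    return 100 * study_variation / tolerance


gage_rr_variance = 0.01  # rounded Total Gage R&R variance from the text

print(pct_tolerance(gage_rr_variance, tolerance=1.5))  # roughly 40 %
print(pct_tolerance(gage_rr_variance, tolerance=4.0))  # roughly 15 %
```

The gage variation itself is unchanged in both cases; only the window it is compared against gets wider.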
What is the number of distinct categories (ndc)?
The number of distinct categories tells you how many clearly different part levels your measurement system can distinguish. A simple way to think about it:
- ndc 1 to 2: the gage can barely distinguish parts
- ndc 3 to 4: okay for rough screening
- ndc 5 or more: generally good; you can see real differences
- ndc 10 or more: very good resolution
In short we can say: ndc is just a quick score for how much detail your measurement system can see.
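A common way to compute ndc is 1.41 times the ratio of the part-to-part standard deviation to the Gage R&R standard deviation, truncated to a whole number. The sketch below assumes this formula, and it is an assumption that numiqo uses it too. The inputs are the rounded variance shares from the text (85% parts, 15% Gage R&R), so the result is only indicative.

```python
import math


def number_of_distinct_categories(var_part, var_gage_rr):
    """ndc = 1.41 * (part StdDev / Gage R&R StdDev), truncated, and at least 1."""
    ratio = math.sqrt(var_part / var_gage_rr)
    return max(1, math.floor(1.41 * ratio))


# Rounded variance shares from the text, used in place of the real variances.
ndc = number_of_distinct_categories(var_part=0.85, var_gage_rr=0.15)
print(ndc)
```

With these rounded shares the sketch yields an ndc of 3, which would fall into the "okay for rough screening" band of the list above.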
numiqo is therefore a viable alternative to Minitab — feel free to give it a try!