Pearson Correlation
What is Pearson correlation?
Pearson correlation analysis examines the relationship between two variables. For example, is there a correlation between a person's age and salary?
More specifically, we can use the Pearson correlation coefficient to measure the linear relationship between two variables.
Strength and direction of correlation
With a correlation analysis we can determine:
- How strong the correlation is
- and in which direction the correlation goes.
We can read the strength and direction of the correlation in the Pearson correlation coefficient r, whose value varies between -1 and 1.
Strength of the correlation
The strength of the correlation can be read from the following table. An absolute value of r between 0 and 0.1 indicates no correlation. An absolute value of r between 0.7 and 1 indicates a very high correlation.
| Absolute value of r | Strength of correlation |
|---|---|
| 0.0 to under 0.1 | no correlation |
| 0.1 to under 0.3 | low correlation |
| 0.3 to under 0.5 | medium correlation |
| 0.5 to under 0.7 | high correlation |
| 0.7 to 1.0 | very high correlation |
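As a minimal sketch, the table can be turned into a small lookup function. The function name `correlation_strength` is hypothetical, and treating each lower bound as inclusive is an assumption, since the table does not say which category a boundary value belongs to.

```python
def correlation_strength(r: float) -> str:
    """Map a Pearson r to the verbal label from the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the table is based on the absolute value of r
    if size < 0.1:
        return "no correlation"
    if size < 0.3:
        return "low correlation"
    if size < 0.5:
        return "medium correlation"
    if size < 0.7:
        return "high correlation"
    return "very high correlation"


print(correlation_strength(-0.62))  # high correlation
print(correlation_strength(0.05))   # no correlation
```

The sign of r is ignored here on purpose: it describes the direction of the correlation, not its strength.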
Direction of the correlation
A positive relationship or correlation exists when large values of one variable are associated with large values of the other variable, or when small values of one variable are associated with small values of the other variable.
A positive correlation exists, for example, for height and shoe size. This yields a positive correlation coefficient.
A negative correlation occurs when large values of one variable are associated with small values of the other variable and vice versa.
A negative correlation is usually found between product price and sales volume. This produces a negative correlation coefficient.
Calculate the Pearson correlation coefficient
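The Pearson correlation coefficient is calculated using the following equation:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Here r is the Pearson correlation coefficient, x_i are the individual values of one variable, e.g., age; y_i are the individual values of the other variable, e.g., salary; x̄ and ȳ are the mean values of the two variables, respectively; and n is the number of value pairs.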
In the equation, we can see that the respective mean value is first subtracted from both variables.
So in our example, we calculate the mean values of age and salary. For each person, we then subtract the mean age from the person's age and the mean salary from the person's salary, and multiply the two differences.
Then we sum up the individual results of the multiplication. The expression in the denominator ensures that the correlation coefficient is scaled between -1 and 1.
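The steps described so far can be sketched in a few lines of Python. The age and salary figures are made up for illustration, and the helper name `pearson_r` is hypothetical; this is a plain transcription of the steps, not a replacement for a statistics package.

```python
import math

# Hypothetical sample: age in years and monthly salary in euros.
age = [25, 32, 41, 47, 55]
salary = [2400, 3100, 3600, 4300, 4500]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Step 1: subtract the respective mean from every value.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Step 2: multiply the paired differences and sum the products.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Step 3: the denominator scales the result into the range -1 to 1.
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


r = pearson_r(age, salary)
print(round(r, 3))
```

For this invented sample, older people earn more throughout, so r comes out close to 1.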
If we now multiply two positive differences, we get a positive value. If we multiply two negative differences, we also get a positive value (minus times minus is plus). So all points whose x and y values both lie above their means, or both lie below them, have a positive influence on the correlation coefficient.
If we multiply a positive difference and a negative difference, we get a negative value (minus times plus is minus). So all points where one value lies above its mean and the other below have a negative influence on the correlation coefficient.
Therefore, if our values are predominantly in the two green areas (quadrants) in the previous two figures, we get a positive correlation coefficient and therefore a positive correlation.
If our values are predominantly in the two red areas (quadrants) in the figures, we get a negative correlation coefficient and thus a negative correlation.
If the points are distributed over all four areas (quadrants), the positive terms and the negative terms cancel each other out and we might end up with a very small or no correlation.
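To make the quadrant argument concrete, the following sketch splits the numerator of r into its positive and negative terms. The six points are invented for illustration; four of them fall in the green quadrants and two in the red ones.

```python
# Hypothetical (x, y) points, chosen for illustration only.
points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]

mean_x = sum(x for x, _ in points) / len(points)
mean_y = sum(y for _, y in points) / len(points)

positive_terms = 0.0  # both differences share a sign (green quadrants)
negative_terms = 0.0  # differences have opposite signs (red quadrants)
for x, y in points:
    product = (x - mean_x) * (y - mean_y)
    if product > 0:
        positive_terms += product
    else:
        negative_terms += product

# The numerator of r is the sum of all products.
numerator = positive_terms + negative_terms
print(positive_terms, negative_terms, numerator)  # 15.0 -0.5 14.5
```

The positive terms clearly outweigh the negative ones here, so the numerator, and with it r, is positive.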
Testing correlation coefficients for significance
In general, the correlation coefficient is calculated using data from a sample. In most cases, however, we want to test a hypothesis about the population.
In the case of correlation analysis, we then want to know if there is a correlation in the population.
For this, we test whether the correlation coefficient in the sample is statistically significantly different from zero.
Hypotheses for the Pearson correlation coefficient
The null hypothesis and the alternative hypothesis for the Pearson correlation coefficient are thus:
- Null hypothesis: The correlation coefficient in the population is zero (there is no linear relationship).
- Alternative hypothesis: The correlation coefficient in the population is different from zero (there is a linear relationship).
Note: A hypothesis test only ever decides whether the null hypothesis is rejected or not rejected.
In our example with a person's salary and age, the question could be: Is there a correlation between age and salary in the German population?
To find out, we draw a sample and test whether the correlation coefficient is significantly different from zero in this sample.
- The null hypothesis is then: There is no correlation between salary and age in the German population.
- and the alternative hypothesis: There is a correlation between salary and age in the German population.
Significance and the t-test
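Whether the Pearson correlation coefficient is significantly different from zero, based on the sample surveyed, can be checked using a t-test:

$$
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
$$

Here, r is the correlation coefficient and n is the sample size. Under the null hypothesis, t follows a t-distribution with n - 2 degrees of freedom.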
A p-value can then be calculated from the test statistic t. If the p-value is smaller than the specified significance level, which is usually 5%, the null hypothesis is rejected; otherwise it is not.
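The test can be sketched with the standard library alone, assuming the usual test statistic with n - 2 degrees of freedom and a two-sided test. The function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is a stand-in for what a statistics library would do (SciPy's `scipy.stats.pearsonr`, for example, reports the coefficient together with a p-value).

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical result: r = 0.5 observed in a sample of n = 30 people.
r, n = 0.5, 30
t = t_statistic(r, n)
p = two_sided_p_value(t, df=n - 2)
print(round(t, 3), p < 0.05)
```

With these invented numbers the p-value falls below 5%, so the null hypothesis would be rejected. The same r in a much smaller sample would give a smaller t and might not be significant.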
Assumptions of the Pearson correlation coefficient
But what about the assumptions for the Pearson correlation? Here we have to distinguish whether we just want to calculate the Pearson correlation coefficient, or whether we want to test a hypothesis.
To calculate the Pearson correlation coefficient, we only need two metric variables. Metric variables are, for example, a person's weight, a person's salary, or electricity consumption.
The Pearson correlation coefficient then tells us how strong the linear relationship is. If there is a non-linear relationship, we cannot read it from the Pearson correlation coefficient.
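A small constructed example shows how a non-linear relationship can go unnoticed. Here y is completely determined by x, yet the relationship is a symmetric parabola rather than a line. The data and the helper name `pearson_r` are assumptions made for this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]  # y depends entirely on x, but not linearly

r = pearson_r(x, y)
print(r)
```

The positive and negative products cancel exactly, so r is zero even though the two variables are perfectly related. A scatter plot is therefore a useful check before interpreting r.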
However, if we want to test whether the Pearson correlation coefficient is significantly different from zero in the sample, i.e. we want to test a hypothesis, the two variables must also be normally distributed!
If this assumption is violated, the calculated test statistic t or the p-value cannot be interpreted reliably. If the assumptions are not met, Spearman's rank correlation can be used.
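As a rough sketch of that alternative: Spearman's rank correlation can be computed by replacing each value with its rank and then applying the Pearson formula to the ranks. The helper names and the exponential example data below are assumptions for illustration, and tied values receive the mean of their ranks.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # strictly increasing, but far from linear

print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's r stays noticeably below 1 for the same data.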
Calculate the Pearson correlation online with numiqo
You can also calculate a correlation analysis online with numiqo. To do this, simply copy your data into this table in the statistics calculator and click on either the Hypothesis tests or Correlation tab.
If you now look at two metric variables, the Pearson correlation coefficient will be calculated automatically. If you don't know exactly how to interpret the results, you can also just click on Summary in words.