Point-Biserial Correlation
Author: Dr. Hannah Volk-Jesussek
What is a point-biserial correlation?
Point-biserial correlation is a special case of Pearson correlation and examines the relationship between a dichotomous variable and a metric variable.
What is a dichotomous variable and what is a metric variable? A dichotomous variable has two categories, for example gender (male/female) or smoking status (smoker/non-smoker). A metric variable could be a person's weight or salary.
If we have a dichotomous variable and a metric variable and want to know if there is a correlation, we can use a point-biserial correlation. We need to check the assumptions first, but more about that later.
Calculate the point-biserial correlation
As noted above, the point-biserial correlation is a special case of Pearson correlation. But how can we calculate Pearson correlation when one variable is nominal? Let's look at an example.
Let's say we want to study the correlation between the number of hours spent studying for an exam and the exam result (pass/fail).
We collected data from 20 students, 12 of whom passed the test and 8 of whom failed. We recorded the number of hours each student studied for the exam.
To calculate the point-biserial correlation, we first convert the test result to numbers. We can assign a value of 1 to students who passed and 0 to students who failed.
Now we can either calculate the Pearson correlation between study time and test result, or use the equation for the point-biserial correlation.
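The equation itself is not reproduced in this text. For reference, a common textbook form is sketched below. The notation is an assumption made here for illustration: x̄₁ and x̄₀ are the mean study times of the groups coded 1 and 0, n₁ and n₀ are the group sizes, n = n₁ + n₀, and s_n is the standard deviation of all study times computed with n in the denominator.

```latex
r_{pb} = \frac{\bar{x}_1 - \bar{x}_0}{s_n}\,\sqrt{\frac{n_1\, n_0}{n^{2}}},
\qquad
s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```

If the sample standard deviation (with n - 1 in the denominator) is used instead, the factor under the square root becomes n₁n₀ / (n(n - 1)), and the resulting coefficient is the same.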
Point-biserial correlation and Pearson correlation
Whether we calculate the Pearson correlation or use the equation for the point-biserial correlation, we get the same result.
Calculation with numiqo
Let's look at this in numiqo. The dataset contains three variables: the hours studied, the exam result as "passed"/"failed", and the exam result coded as 0 and 1. We treat the 0/1-coded exam result as a metric variable.
If we go to the "Correlation" tab and calculate the Pearson correlation for hours studied and the 0/1-coded exam result, we get a correlation coefficient of 0.31. If we calculate the point-biserial correlation for hours studied and the exam result with the categories "passed" and "failed", we also get a correlation of 0.31.
Point-biserial correlation coefficient
Like the Pearson correlation coefficient r, the point-biserial correlation coefficient r_pb ranges from -1 to 1.
A coefficient below 0 indicates a negative correlation, that is, a negative relationship between the variables.
A coefficient above 0 indicates a positive correlation, that is, a positive relationship between the two variables. A coefficient of 0 means there is no correlation. With the coding used here (1 = passed, 0 = failed), a positive coefficient means that students who passed studied more hours on average than students who failed.
Hypotheses
Usually, however, we only have a sample and want to test a hypothesis about the population it was drawn from. In the case of correlation analysis, we can test whether the correlation coefficient is significantly different from 0.
The hypotheses for the point-biserial correlation are:
- Null hypothesis: The correlation coefficient r = 0 (there is no correlation).
- Alternative hypothesis: The correlation coefficient r ≠ 0 (there is a correlation).
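The test statistic is not spelled out in this text. Assuming the usual t-test for a correlation coefficient, it can be sketched as follows, using the rounded coefficient 0.31 and the sample size of 20 from the example. Because r is rounded, the resulting t value is only approximate.

```latex
t = \frac{r_{pb}\,\sqrt{n-2}}{\sqrt{1-r_{pb}^{2}}}, \qquad df = n - 2

% With the rounded example values r_{pb} = 0.31 and n = 20:
t \approx \frac{0.31\,\sqrt{18}}{\sqrt{1-0.31^{2}}} \approx 1.38, \qquad df = 18
```

For 18 degrees of freedom, the two-sided critical value at the 5% level is about 2.10, so a t value of roughly 1.38 does not lead to rejecting the null hypothesis.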
Point-biserial correlation and the t-test for independent samples
When we calculate a point-biserial correlation, we get the same p-value as with a t-test for independent samples (assuming equal variances) on the same data.
In other words, it makes no difference whether we test a correlation hypothesis with the point-biserial correlation or a difference hypothesis with the t-test.
If we calculate a t-test for independent samples in numiqo under the tab "Hypothesis Tests" with the null hypothesis "There is no difference between the fail and pass groups with respect to the variable Hours Studied", we get a p-value of 0.179.
Likewise, if we calculate a point-biserial correlation under the tab "Correlation" and we have the null hypothesis: "There is no correlation between Hours Studied and Test Result", we also get a p-value of 0.179!
In our example, the p-value is greater than 0.05, which is commonly used as a significance level, and thus the null hypothesis is not rejected.
Assumptions for a point-biserial correlation
For point-biserial correlation, we need to distinguish between calculating the correlation coefficient and testing a hypothesis. To calculate the coefficient, we need a metric variable and a dichotomous variable.
However, if we want to test whether the correlation coefficient is significantly different from zero, the metric variable must also be approximately normally distributed, strictly speaking within each of the two groups. If this assumption is violated, the test statistic t and the resulting p-value cannot be interpreted reliably.