numiqo
Statistics made easy


Logistic Regression


What is a Logistic Regression Analysis?

Logistic regression is a special case of regression analysis and is used when the dependent variable is nominally scaled. This is the case, for example, with a purchase decision that has two values: buys a product and does not buy a product.

Logistic regression analysis is the counterpart of linear regression, in which the dependent variable of the regression model must be at least interval-scaled.

With logistic regression, it is possible to explain the dependent variable or estimate the probability of the categories of the variable.

Business example:

For an online retailer, you need to predict which product a particular customer is most likely to buy. For this, you receive a data set with past visitors and their purchases.

Medical example:

You want to investigate whether a person is susceptible to a certain disease or not. For this purpose, you receive a data set with diseased and non-diseased persons as well as other medical parameters.

Political example:

Would a person vote for party A if there were elections next weekend?

If you need to calculate a logistic regression, you can easily use the Regression Analysis calculator here on numiqo.com.

Goals of a Logistic Regression Analysis

In the basic form of logistic regression, dichotomous variables (0 or 1) are predicted. The model estimates the probability of value 1 (characteristic present).

[Figure: logistic regression with a dichotomous dependent variable]

In medicine, for example, a frequent application is to find out which variables have an influence on a disease. In this case, 0 could stand for not diseased and 1 for diseased. Subsequently, the influence of age, gender and smoking status (smoker or not) on this particular disease could be examined.

[Figure: logistic regression example]

Logistic Regression and Probabilities

In linear regression, the independent variables (e.g., age and gender) are used to estimate the specific value of the dependent variable (e.g., body weight).

In logistic regression, on the other hand, the dependent variable is dichotomous (0 or 1) and the probability that outcome 1 occurs is estimated. Returning to the example above, this means: How likely is disease given a person's age, sex, and smoking status?

Calculate Logistic Regression

To build a logistic regression model, the linear regression equation is used as the starting point.

ŷ = b1·x1 + b2·x2 + ... + bn·xn + a

If you simply apply linear regression to a logistic problem, the result looks like this:

[Figure: linear regression applied to a dichotomous outcome, with predictions outside 0 and 1]

As can be seen in the graph, predictions can take values between plus and minus infinity. Logistic regression, however, estimates probabilities rather than raw values. Therefore, the equation must be transformed.

To do this, it is necessary to restrict predicted values to the range between 0 and 1. To ensure that only values between 0 and 1 are possible, the logistic function f is used.

Logistic function

The logistic model is based on the logistic function. The special thing about the logistic function is that for values between minus and plus infinity, it always returns values between 0 and 1.

f(z) = 1 / (1 + e^(-z))

The logistic function is well suited to describe the probability P(y=1). Applying the logistic function to the linear predictor yields:

P(y=1) = 1 / (1 + e^(-(b1·x1 + b2·x2 + ... + bn·xn + a)))

This ensures that, no matter where the x values lie, the predictions stay between 0 and 1. The new graph now looks like this:

[Figure: logistic regression curve with predictions between 0 and 1]

The probability that the dichotomous dependent variable y equals 1 for given predictor values is given by:

P(y=1) = 1 / (1 + e^(-(b1·x1 + b2·x2 + ... + bn·xn + a)))

To calculate disease probability in the example above, the model parameters b1, b2, b3 and a must first be determined. Once these have been determined, the equation for the example above is:

P(diseased) = 1 / (1 + e^(-(b1·age + b2·gender + b3·smoking status + a)))
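Once the parameters are available, computing a probability is just arithmetic. The coefficient values below are invented for illustration; real values come from estimating the model on actual data:

```python
import math

# Hypothetical parameters (illustration only; real values come from
# maximum likelihood estimation on actual data):
b1, b2, b3, a = 0.08, 0.7, 1.2, -6.0  # age, gender (1 = male), smoker (1 = yes)

def disease_probability(age, gender, smoker):
    """P(y = 1) = 1 / (1 + e^-(b1*age + b2*gender + b3*smoker + a))"""
    z = b1 * age + b2 * gender + b3 * smoker + a
    return 1.0 / (1.0 + math.exp(-z))

# Probability of disease for a 55-year-old male smoker
print(round(disease_probability(55, 1, 1), 3))
```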

Maximum Likelihood Method

To determine the model parameters for the logistic regression equation, the Maximum Likelihood Method is applied. The maximum likelihood method is one of several methods used in statistics to estimate the parameters of a mathematical model. Another well-known estimator is the least squares method, which is used in linear regression.

The Likelihood Function

To understand the maximum likelihood method, we introduce the likelihood function L. L is a function of the unknown parameters of the model; in the case of logistic regression, these are b1, ..., bn and a. We can therefore also write L(b1, ..., bn, a), or L(θ) if the parameters are collected in θ.

L(θ) indicates how likely it is that the observed data occur. As θ changes, so does the probability of observing the data.
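For a tiny invented data set with one predictor, L(θ) can be written out directly; parameter values that describe the data well yield a larger likelihood:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(b, a, xs, ys):
    """L(theta): probability of the observed data for parameters b and a.
    Each observation contributes P(y=1) if y = 1, else 1 - P(y=1)."""
    L = 1.0
    for x, y in zip(xs, ys):
        p = logistic(b * x + a)
        L *= p if y == 1 else (1.0 - p)
    return L

# Invented observations: one independent variable, dichotomous outcome
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]

# Parameters that separate the data well give a higher likelihood
# than parameters that ignore the data entirely
print(likelihood(2.0, -5.0, xs, ys) > likelihood(0.0, 0.0, xs, ys))
```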

[Figure: likelihood function L(θ) and its maximum]

Maximum Likelihood Estimator

The maximum likelihood estimator can be applied to both linear and complex nonlinear models. In logistic regression, the goal is to estimate the parameters b1, ..., bn and a that maximize the so-called log-likelihood function LL(θ), which is simply the logarithm of L(θ).

For this nonlinear optimization, various algorithms have been established over the years, such as stochastic gradient descent.
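As a minimal sketch of such an iterative optimization, the following applies plain gradient ascent to the log-likelihood for an invented one-predictor data set (statistics software uses more sophisticated and faster optimizers):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented data: one independent variable x, dichotomous outcome y
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

# Gradient ascent on LL(b, a); for the logistic model the gradient is
# sum((y - p) * x) with respect to b and sum(y - p) with respect to a
b, a, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    ps = [logistic(b * x + a) for x in xs]
    b += lr * sum((y - p) * x for p, x, y in zip(ps, xs, ys))
    a += lr * sum(y - p for p, y in zip(ps, ys))

# The fitted model assigns a low probability at x = 1 and a high one at x = 6
print(round(logistic(b * 1 + a), 2), round(logistic(b * 6 + a), 2))
```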

Multinomial logistic regression

As long as the dependent variable is dichotomous, i.e. has exactly two categories (e.g. male, female), binary logistic regression is used. If the dependent variable has more than two categories, e.g. which means of transport a person uses for the journey to work (car, public transport, bicycle), multinomial logistic regression must be used.

Each category of the mobility variable (car, public transport, bicycle) is transformed into a new variable. The single variable mobility concept becomes the three new variables:

  • car is used
  • public transport is used
  • bicycle is used

Each of these new variables has only the two categories yes and no (1 and 0); for example, the variable car is used is either yes or no. Thus, the single variable "mobility concept" with three categories becomes three dichotomous variables, and a separate logistic regression model is created for each of them.
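This recoding step can be sketched in a few lines of Python (the variable and category names are taken from the example above):

```python
# Each category of the single variable "mobility concept" becomes its
# own dichotomous yes/no (1/0) variable
observations = ["car", "bicycle", "public transport", "car", "bicycle"]
categories = ["car", "public transport", "bicycle"]

recoded = [
    {f"{cat} is used": int(obs == cat) for cat in categories}
    for obs in observations
]

print(recoded[0])  # the first observation used the car
```

A separate binary logistic regression model is then fitted to each of the three 0/1 columns.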

Interpretation of the results

Because the relationship between the dependent and the independent variables in logistic regression is not linear, the regression coefficients cannot be interpreted in the same way as in linear regression. For this reason, odds ratios are interpreted in logistic regression instead.

Linear regression:

An independent variable is considered "good" if it correlates strongly with the dependent variable.

Logistic regression:

An independent variable is said to be "good" if it allows the groups of the dependent variable to be distinguished significantly from each other.

Odds Ratios

An odds ratio (OR) is a statistical measure used to determine the strength of association or effect size between two events or groups, often in case-control studies. It compares the odds of an event occurring in one group to the odds of it occurring in another group.

Odds represent the ratio of the probability that an event happens to the probability that it does not happen. The odds ratio is then the ratio of these odds between the two groups.
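A small numeric illustration with invented probabilities:

```python
import math

# Invented event probabilities for two groups
p_a, p_b = 0.6, 0.3

odds_a = p_a / (1 - p_a)   # 0.6 / 0.4 = 1.5
odds_b = p_b / (1 - p_b)   # 0.3 / 0.7 ≈ 0.43

odds_ratio = odds_a / odds_b
print(round(odds_ratio, 2))  # the odds in group A are 3.5 times those in group B

# In logistic regression, e to the power of a coefficient b gives the odds
# ratio for a one-unit increase in the corresponding independent variable
print(round(math.exp(0.5), 2))
```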

Pseudo-R squared

In linear regression, the coefficient of determination (R2) indicates the proportion of explained variance.
  • In logistic regression, the dependent variable is nominal or ordinal, so variance in the usual sense is not defined. Therefore, the classical R2 cannot be calculated.
  • To assess the quality of a logistic regression model, so-called pseudo coefficients of determination are used (also called pseudo R2).
  • These pseudo R2 measures are constructed so that they lie between 0 and 1, similar to the original R2.
  • The best-known pseudo R2 measures are:
    • Cox and Snell R2
    • Nagelkerke R2

Null Model

For the calculation of the Cox and Snell R-square and the Nagelkerke R-square, the likelihood L0 of the so-called null model and the likelihood L1 of the calculated model (full model) are needed.

The null model is a model in which no independent variables are included; L1 is the likelihood of the model with the independent variables.

Cox and Snell R-square

The Cox and Snell R-square compares the likelihood L0 of the null model with the likelihood L1 of the full model. The better the full model fits compared to the null model, the smaller the ratio L0/L1. The Cox and Snell R-square is obtained with:

R2_CS = 1 - (L0 / L1)^(2/n)

where n is the sample size.

Nagelkerke's R-square

The Cox and Snell pseudo coefficient of determination cannot reach 1 even for a model with perfect prediction; this is corrected by Nagelkerke's R-square. Nagelkerke's pseudo coefficient of determination becomes 1 if the fitted model predicts every observation correctly with a probability of 1.

R2_N = R2_CS / (1 - L0^(2/n))

McFadden's R-square

McFadden's R-square also uses the null model and the fitted model to calculate the R2, but compares their log-likelihoods:

R2_McF = 1 - ln(L1) / ln(L0)
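All three pseudo R-square measures can be computed directly from the two likelihoods; the numbers below are invented for illustration:

```python
import math

# Invented likelihood values and sample size
L0 = 0.002   # likelihood of the null model (no independent variables)
L1 = 0.15    # likelihood of the full model
n = 100      # sample size

r2_cox_snell = 1 - (L0 / L1) ** (2 / n)
r2_nagelkerke = r2_cox_snell / (1 - L0 ** (2 / n))
r2_mcfadden = 1 - math.log(L1) / math.log(L0)

print(round(r2_cox_snell, 3), round(r2_nagelkerke, 3), round(r2_mcfadden, 3))
```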

Chi-square Test and Logistic Regression

In logistic regression, the Chi-square test indicates whether the model as a whole is significant or not.


Here, two models are compared: one in which all independent variables are used and one in which none of the independent variables are used.


Now the Chi-square test compares how good the prediction is when the independent variables are used with how good it is when they are not used.

The Chi-square test then tells us whether there is a significant difference between these two models. The null hypothesis is that both models predict equally well. If the p-value is less than 0.05, this null hypothesis is rejected.
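This likelihood-ratio Chi-square test can be sketched as follows, assuming SciPy is available; the log-likelihood values are invented for illustration:

```python
from scipy.stats import chi2

# Invented log-likelihoods of the two models
LL0 = -120.0  # model without independent variables (null model)
LL1 = -105.0  # model with the independent variables
df = 3        # number of independent variables

# Likelihood-ratio (Chi-square) statistic
chi2_stat = -2 * (LL0 - LL1)
p_value = chi2.sf(chi2_stat, df)

print(chi2_stat, p_value < 0.05)  # significant -> reject the null hypothesis
```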

Example logistic regression

As an example of logistic regression, purchasing behaviour in an online shop is examined. The aim is to determine the factors that influence whether a person, after visiting the website, buys from the online shop immediately, at a later time, or not at all. The online shop provides the data collected for this purpose. The dependent variable therefore has the following three categories:

  • Buy now
  • Buy later
  • Don't buy

Gender, age and time spent in the online shop are available as independent variables.

Load this data set and try it out
Purchasing behaviour | Gender | Age | Time spent in online shop
Buy now | female | 22 | 40
Buy now | female | 25 | 78
Buy now | male | 18 | 65
... | ... | ... | ...
Buy later | female | 27 | 28
Buy later | female | 27 | 15
Buy later | male | 48 | 110
... | ... | ... | ...
Don't buy | female | 33 | 65
Don't buy | female | 43 | 34

Logistic regression results

Logistic regressions, similar to linear regression models, can be easily and quickly calculated with numiqo.

To recalculate the example above:

  • Copy and paste the table on purchasing behavior in the online store into numiqo’s statistics calculator.
  • Select the Regression tab.
  • Click on the desired variables.

The results are displayed directly below in table form.

[Figure: logistic regression results as displayed in numiqo]

Statistics made easy

  • many illustrative examples
  • ideal for exams and theses
  • statistics made easy on 464 pages
  • 8th revised edition (March 2026)

Only €8.99


Cite numiqo: numiqo Team (2026). numiqo: Online Statistics Calculator. numiqo e.U. Graz, Austria. URL https://numiqo.com
