numiqo
Statistics made easy

Regression Analysis

Author: Dr. Hannah Volk-Jesussek
Updated:

What is a Regression?

Regression is a statistical method for analyzing relationships between variables. It allows you to explain or predict one variable based on others.

It helps you answer questions like:

  • Does X affect Y?
  • How strong is that effect?
  • Can I predict Y if I know X?

Basic idea of a Regression Analysis

  • Dependent variable (Y): what you’re trying to explain or predict
  • Independent variable(s) (X): what you think influences Y

In this example, we want to know what influences a person's salary (dependent variable). To explore this, we consider the variables education level, weekly working hours and age (independent variables).

We can now investigate whether these three variables influence a person's salary. If they do, a person's highest education level, weekly working hours and age can be used to predict their salary.

Note: The variable to be inferred (dependent variable) is also called criterion. The variables used for prediction (independent variables) are also called predictors.

Goals of Regression Analysis

A regression analysis typically serves two main purposes:

  1. Measuring the influence of one or more variables on another variable
  2. Predicting the value of a variable based on one or more other variables

1) Measurement of the influence of one or more variables on another variable

  • What influences children's ability to concentrate?
  • Do parents’ educational levels and place of residence affect children’s future educational attainment?

2) Prediction of a variable by one or more other variables

  • How long does a patient stay in the hospital?
  • What product is a person most likely to buy from an online store?

In both cases, regression analysis shows how the dependent variable changes when one or more independent variables change.

Types of Regression Analysis

The most common types of regression analysis are simple linear regression, multiple linear regression and logistic regression. Which type is appropriate depends on the number of independent variables and the scale of measurement of the dependent variable.

Type of regression | Number of independent variables | Scale of dependent variable | Scale of independent variables
Simple linear regression | One | Metric | Metric, ordinal, nominal
Multiple linear regression | Two or more | Metric | Metric, ordinal, nominal
Logistic regression | One or more | Ordinal, nominal | Metric, ordinal, nominal

If only one independent variable is used for prediction, simple linear regression is applied. If two or more independent variables are used, multiple linear regression is required.

If the dependent variable is metric, linear regression is appropriate. If the dependent variable is nominal or ordinal, logistic regression is used instead.
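The choice described in the two paragraphs above can be written as a small decision function. This is a minimal sketch; the function name `choose_regression_type` and the scale labels "metric", "ordinal" and "nominal" are illustrative assumptions, not part of any library or of numiqo.

```python
def choose_regression_type(n_independent, dependent_scale):
    """Suggest a regression type from the number of independent variables
    and the measurement scale of the dependent variable (hypothetical helper)."""
    if n_independent < 1:
        raise ValueError("at least one independent variable is required")
    scale = dependent_scale.lower()
    if scale in ("nominal", "ordinal"):
        # Categorical dependent variable: logistic regression
        return "logistic regression"
    if scale == "metric":
        # Metric dependent variable: linear regression, simple or multiple
        if n_independent == 1:
            return "simple linear regression"
        return "multiple linear regression"
    raise ValueError(f"unknown scale: {dependent_scale!r}")


print(choose_regression_type(1, "metric"))   # simple linear regression
print(choose_regression_type(3, "metric"))   # multiple linear regression
print(choose_regression_type(2, "nominal"))  # logistic regression
```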

Whether a linear or non-linear regression model is used depends on the nature of the relationship between the variables. To perform a linear regression, a linear relationship between the independent variables and the dependent variable is necessary.

Regardless of the type of regression, independent variables may be metric, ordinal, or nominal. However, if an ordinal or nominal independent variable has more than two categories, dummy variables must be created.

Examples of Regression

Simple linear regression

Does the weekly working time have an influence on the hourly wage of employees?
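A minimal Python sketch of how a simple linear regression answers this question, fitted by ordinary least squares. The six data points are invented for illustration (constructed so that the fitted line is exactly wage = 10 + 0.3 × hours), and the function name `simple_linear_regression` is hypothetical.

```python
# Hypothetical data: weekly working hours and hourly wage (EUR) of six employees
hours = [20, 25, 30, 35, 40, 45]
wage = [16.5, 17.0, 19.0, 20.5, 21.5, 24.0]


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = simple_linear_regression(hours, wage)
print(f"wage = {intercept:.2f} + {slope:.2f} * hours")  # wage = 10.00 + 0.30 * hours

# Predicted hourly wage for an employee working 38 hours per week
print(f"{intercept + slope * 38:.2f}")  # 21.40
```

With these made-up numbers, each additional weekly working hour is associated with a 0.30 EUR higher hourly wage.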

Multiple linear regression

Do the weekly working time and the age of employees have an influence on their hourly wage?
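With two independent variables, the least-squares coefficients can be obtained by solving the normal equations on mean-centered data. The sketch below assumes invented data constructed as exactly wage = 5 + 0.2 × hours + 0.1 × age (no noise), so the fit should recover these values; `fit_two_predictors` is a hypothetical name.

```python
# Hypothetical data: weekly working hours, age and hourly wage of six employees.
# Wages were constructed as exactly 5 + 0.2 * hours + 0.1 * age.
hours = [20, 30, 40, 25, 35, 45]
age = [25, 30, 35, 50, 45, 40]
wage = [11.5, 14.0, 16.5, 15.0, 16.5, 18.0]


def mean(values):
    return sum(values) / len(values)


def fit_two_predictors(x1, x2, y):
    """OLS fit of y = b0 + b1 * x1 + b2 * x2 via centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(hours, age, wage)
print(f"wage = {b0:.2f} + {b1:.2f} * hours + {b2:.2f} * age")
# wage = 5.00 + 0.20 * hours + 0.10 * age

# Predicted hourly wage for a 40-year-old working 38 hours per week
print(f"{b0 + b1 * 38 + b2 * 40:.2f}")  # 16.60
```

In a multiple regression, b1 is the expected change in hourly wage per additional weekly working hour when age is held constant.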

Logistic regression

Do the weekly working time and the age of employees have an influence on the probability that they are at risk of burnout?
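In logistic regression, the weighted sum of the independent variables (the log-odds) is passed through the logistic function, so the prediction is always a probability between 0 and 1. The coefficients below are made up for illustration; in practice they are estimated from data, typically by maximum likelihood. `burnout_probability` is a hypothetical name.

```python
import math

# Hypothetical coefficients for illustration only (not estimated from data)
B0 = -6.0       # intercept (log-odds)
B_HOURS = 0.12  # change in log-odds per weekly working hour
B_AGE = 0.03    # change in log-odds per year of age


def burnout_probability(hours, age):
    """P(at risk) = 1 / (1 + exp(-(B0 + B_HOURS * hours + B_AGE * age)))."""
    log_odds = B0 + B_HOURS * hours + B_AGE * age
    return 1.0 / (1.0 + math.exp(-log_odds))


for h in (30, 40, 50):
    print(h, round(burnout_probability(h, age=40), 3))
```

With these invented coefficients, a 40-year-old working 40 hours per week has log-odds of 0 and therefore a probability of 0.5; more working hours push the probability toward 1.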

Dummy Variables and Reference Category

When an independent variable is categorical, it must be converted into a set of binary dummy variables before it can be included in the regression model.

This involves the following steps:

  1. A categorical variable with multiple categories is transformed into several binary variables.
  2. One category is selected as the reference category.
  3. A new dummy variable is created for each of the remaining categories.
  4. Each dummy variable takes the value 1 if the observation belongs to that category and 0 otherwise.

In our example, we want to study the effect of education level (high school, college, graduate) on salary:

  • Dependent variable: salary
  • Independent variable: education level, which has three categories
  • Reference category: high school
  • Dummy variables created:
    • is_college
    • is_graduate
  • Example: is_college = 1 if the individual's highest education level is college, and 0 otherwise.
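The steps above can be sketched in a few lines of Python. The five observations are invented, and `dummy_code` is a hypothetical helper; libraries such as pandas or scikit-learn offer their own encoders.

```python
# Hypothetical observations of the categorical variable "education level"
education = ["high school", "college", "graduate", "college", "high school"]

categories = ["high school", "college", "graduate"]
REFERENCE = "high school"  # reference category: gets no dummy of its own


def dummy_code(values, categories, reference):
    """Return one 0/1 column per non-reference category."""
    dummies = {}
    for category in categories:
        if category == reference:
            continue  # reference category = all dummies equal to 0
        name = "is_" + category.replace(" ", "_")
        dummies[name] = [1 if v == category else 0 for v in values]
    return dummies


coded = dummy_code(education, categories, REFERENCE)
print(coded)
# {'is_college': [0, 1, 0, 1, 0], 'is_graduate': [0, 0, 1, 0, 0]}
```

The coefficient of is_college is then interpreted as the expected salary difference between college and high school, with the other variables held constant. No dummy is created for the reference category because, together with the intercept, a full set of dummies would be perfectly collinear.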

Control Variables (Covariates)

In regression analysis, a control variable (covariate) is an additional independent variable that is included in the model to account for potential confounding factors.

The main purpose of including control variables is to isolate the relationship of interest between the main independent variable(s) and the dependent variable, so that it is not distorted by other factors that influence both. Control variables can only adjust for factors that are measured and included in the model; unobserved confounders can still bias the estimates.

Inclusion of control variables can help in several ways:

  1. Reducing omitted variable bias: If there's a variable that affects both the dependent variable and one of the independent variables and it's not included in the model, the coefficient on the independent variable could be biased. Including the control variable helps to reduce or eliminate this bias.
  2. Increasing precision: Controlling for additional sources of variability can reduce the residual variance, leading to more precise estimates.
  3. Accounting for confounding: In many cases, the relationship between two variables might be spurious because of a third variable that influences both. Including this third variable as a control can help reveal the true relationship.

Example

Suppose you are studying the effect of exercise on weight loss.

  • Independent variable: exercise
  • Dependent variable: weight loss
  • Potential control variable: age
    • Age may influence weight loss due to age-related changes in metabolism.
    • Age may be related to exercise behavior, as younger individuals often exercise more.

Age not included in the model:

  • The effect of exercise on weight loss may be overstated.
  • Part of the observed effect may actually be due to age rather than exercise.

Age included in the model:

  • The specific effect of exercise on weight loss can be more accurately isolated.
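The effect of omitting age can be reproduced with a small numeric sketch. The six observations are invented and noise-free, constructed so that weight loss = 2 + 0.5 × exercise − 0.05 × age and older people exercise less. Fitting exercise alone overstates its effect; adding age recovers the constructed value of 0.5.

```python
# Hypothetical, noise-free data for six people. By construction:
#   weight_loss = 2 + 0.5 * exercise - 0.05 * age
# and older people exercise less, so age confounds the relationship.
age = [20, 30, 40, 50, 60, 70]
exercise = [6, 7, 4, 5, 2, 3]                   # hours per week
weight_loss = [4.0, 4.0, 2.0, 2.0, 0.0, 0.0]    # kg


def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


dx, da, dy = centered(exercise), centered(age), centered(weight_loss)

# Model 1: exercise only (age omitted)
slope_without_age = dot(dx, dy) / dot(dx, dx)

# Model 2: exercise and age (two-predictor OLS via centered normal equations)
det = dot(dx, dx) * dot(da, da) - dot(dx, da) ** 2
slope_with_age = (dot(dx, dy) * dot(da, da) - dot(da, dy) * dot(dx, da)) / det
age_coef = (dot(da, dy) * dot(dx, dx) - dot(dx, dy) * dot(dx, da)) / det

print(f"exercise effect, age omitted:  {slope_without_age:.2f}")  # 0.91
print(f"exercise effect, age included: {slope_with_age:.2f}")     # 0.50
```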

Considerations

When selecting control variables, the following points should be considered:

  • Including irrelevant control variables can unnecessarily complicate the model.
  • Too many controls may reduce the statistical power of the analysis.
  • Omitting important control variables can lead to biased estimates.
  • The choice of control variables should be guided by theoretical reasoning and empirical diagnostic tests.

Causality in Regression

In linear regression, the independent variable can be used to predict the dependent variable if the two variables are correlated. However, a correlation between two variables does not necessarily imply causality.

This means the following:

  • A correlation indicates that two variables tend to change together.
  • If high values of one variable are associated with high values of another variable, this reflects a statistical relationship, not necessarily a causal one.
  • An increase in one variable does not automatically mean it causes an increase in the other variable.

The observed relationship may be influenced by:

  • A third variable (covariate)
  • Reverse causality
  • Pure coincidence

Therefore:

  • Linear regression can be used to describe and predict relationships between variables.
  • Causal conclusions require theoretical justification (e.g. an established theory or model) or experimental evidence (e.g. experiments that have already tested the relationship).
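The point about reverse causality can be illustrated numerically: the correlation coefficient is symmetric, and a regression can be fitted in either direction. The values of x and y below are made up, and the helper names are hypothetical.

```python
import math

# Made-up values for two variables x and y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def ols_slope(predictor, outcome):
    """Least-squares slope of outcome regressed on predictor."""
    n = len(predictor)
    mp, mo = sum(predictor) / n, sum(outcome) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
    spp = sum((p - mp) ** 2 for p in predictor)
    return spo / spp


print(round(pearson(x, y), 3), round(pearson(y, x), 3))  # 0.775 0.775
print(ols_slope(x, y), ols_slope(y, x))                  # 0.6 1.0
```

Both directions give a non-zero slope, and the product of the two slopes equals r² (here 0.6). Nothing in these numbers indicates whether x influences y or the reverse.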

Calculate a Regression with numiqo

Only three simple steps are needed, and the regression calculator will provide all key figures:

  1. Copy your data into the table of the statistics calculator
  2. Click on Regression
  3. Select a dependent variable and one or more independent variables

If one of the independent variables is categorical (ordinal or nominal), dummy variables are generated automatically and a reference category is defined. A column that contains only numbers is automatically treated as a metric variable.
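The rule that a column containing only numbers counts as metric can be sketched as follows. This is a hypothetical illustration of such a rule, not numiqo's actual implementation; skipping empty cells and accepting only dot decimals are assumptions.

```python
def is_number(cell):
    """True if the cell text parses as a number (assumed parsing rule)."""
    try:
        float(cell)
    except ValueError:
        return False
    return True


def infer_scale(column):
    """Classify a column as 'metric' if every non-empty cell is numeric,
    otherwise as 'categorical' (to be dummy coded)."""
    cells = [c.strip() for c in column if c.strip()]  # empty cells skipped (assumption)
    if cells and all(is_number(c) for c in cells):
        return "metric"
    return "categorical"


print(infer_scale(["38", "40.5", "", "20"]))         # metric
print(infer_scale(["high school", "college", "2"]))  # categorical
print(infer_scale(["1", "2", "3"]))                  # metric
```

Under a rule like this, categories stored as numeric codes (e.g. 1, 2, 3 for education levels) would be classified as metric, so such codes may need to be entered as text to be treated as categorical.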


Cite numiqo: numiqo Team (2026). numiqo: Online Statistics Calculator. numiqo e.U. Graz, Austria. URL https://numiqo.com
