Knowledge Brief – Introduction to Correlations and Regressions
When we talk about statistical analysis, correlation and regression are two key terms that we have to understand in order to comprehend what the data are telling us. At first glance these terms might seem complicated, so below I try to break down what they mean and entail at a very broad level, but in enough detail for you to understand what they are and how they differ.
Correlations – the degree and type of relationship between two or more variables that vary together. A positive correlation exists where high values of one variable are associated with high values of the other variable(s). A negative correlation means that high values of one variable are associated with low values of the other(s).
Correlation coefficients range from -1 to +1. Values close to +1 indicate a high degree of positive correlation, values close to -1 indicate a high degree of negative correlation, and values close to 0 indicate little or no linear relationship.
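To make the -1 to +1 scale concrete, here is a minimal sketch that computes a correlation coefficient by hand. It assumes the Pearson (linear) correlation coefficient, which the brief does not name explicitly; the function name `pearson_r` and the sample data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # How the deviations move together, scaled by each variable's spread
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return num / den

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v for v in x]))         # 1.0: high values go with high values
print(pearson_r(x, [-v for v in x]))            # -1.0: high values go with low values
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 3))  # 0.775: positive but not perfect
```

Python 3.10 and later also provide `statistics.correlation(x, y)`, which should give the same value for this kind of data.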
To help visualize what different correlation values look like in a scatterplot, I have included an image below showing examples of different correlation coefficients (values).
Regressions – a statistical process for estimating the relationships among variables. More specifically, regression analysis helps us understand how the value of a dependent variable changes when one of the independent variables is varied while the others are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, its average value when the independent variables take particular values.
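As an illustration of "estimating the expectation of the dependent variable given the independent variable", the sketch below fits a straight line with one independent variable by ordinary least squares. It assumes a simple linear model y ≈ intercept + slope · x; the helper names `fit_line` and `predict` and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (illustrative helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the independent variable must vary")
    slope = sxy / sxx                      # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x    # the fitted line passes through the means
    return slope, intercept

def predict(slope, intercept, x_new):
    """Estimated average value of y at x_new under the fitted line."""
    return intercept + slope * x_new

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)
print(round(slope, 3), round(intercept, 3))     # 0.6 2.2
print(round(predict(slope, intercept, 6), 3))   # 5.8
```

With more than one independent variable the same least-squares idea applies, but it is usually solved with a linear-algebra library rather than by hand.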
What is R-squared?
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. The definition of R-squared is fairly straightforward: it is the percentage of the variation in the response variable that is explained by a linear model.
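Written as a formula, the usual definition compares the model's squared prediction errors with the total squared variation of the response around its mean, where y_i are the observed values, ŷ_i the values predicted by the fitted model, and ȳ the mean of the observed values:

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i} \left(y_i - \hat{y}_i\right)^2}{\sum_{i} \left(y_i - \bar{y}\right)^2}
```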
In other words, for a linear model fitted by least squares with an intercept, R-squared lies between 0 and 100% (0 and 1):
- 0% indicates that the model explains none of the variability of the response data around its mean
- 100% indicates that the model explains all of the variability of the response data around its mean
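The two ends of that range can be checked directly. This minimal sketch computes R-squared as 1 minus the ratio of unexplained to total variation; the helper name `r_squared` and the data are illustrative assumptions, not part of any particular library.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    if len(y) != len(y_hat) or len(y) < 2:
        raise ValueError("y and y_hat must have the same length (at least 2)")
    mean_y = sum(y) / len(y)
    # Variation the model leaves unexplained (squared prediction errors)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation of the response data around its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when every y value is the same")
    return 1 - ss_res / ss_tot

y = [2, 4, 5, 4, 5]                   # made-up response data
print(r_squared(y, y))                # 1.0 -> 100%: every point is predicted exactly
print(r_squared(y, [4.0] * len(y)))   # 0.0 -> 0%: predicting only the mean explains nothing
```

Note that this formula can return a negative value if the predictions are worse than simply using the mean, which is why the 0 to 100% range is tied to a least-squares fit with an intercept.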
What is the difference between Correlations and Regressions?
The difference between correlation and regression is that correlation only gives us an index describing the strength and direction of the linear relationship between two variables. Regression goes further: it models the relationship between two or more variables, so we can use it to predict the outcome variable (y) from the predictor variables (x) and to identify which of those predictors help predict the outcome.