Pearson Correlation


Pearson Correlation, often simply called correlation and sometimes loosely described as the degree of dependence, is a statistical measure of the strength and direction of the linear relationship between two random variables or two sets of experimental data values. That is, the correlation between two variables reflects the degree to which the variables are linearly related to each other.

More precisely, the Pearson Correlation is a number between -1 and +1 that measures the degree of association between two variables (e.g. x and y). Once the data values have been entered into the formula shown below, a positive result means a positive association between x and y, while a negative result implies a negative, or inverse, association. In a positive correlation, large values of x tend to be associated with large values of y, and small values of x with small values of y. In a negative correlation, large values of x tend to be associated with small values of y, and vice versa.

Mathematical Representation of Pearson Correlation

The mathematical formula for Pearson Correlation is given as shown below:
$$r_{xy}=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{(n-1) s_x s_y}$$

Where x_i and y_i are the paired observations of the two variables X and Y, x̄ and ȳ are the sample means of X and Y, n is the number of pairs, and
s_x and s_y are the sample standard deviations of X and Y.
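
To make the formula concrete, here is a minimal Python sketch that evaluates it term by term. The function name `pearson_r` and the sample numbers are illustrative assumptions, not part of the original article; the sketch relies only on the standard library, whose `statistics.stdev` uses the same (n-1) divisor as the sample standard deviation in the formula.

```python
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences.

    Hypothetical helper written to mirror the formula in the text:
    sum of deviation products divided by (n - 1) * s_x * s_y.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are required")

    x_bar = statistics.mean(xs)   # sample mean of X
    y_bar = statistics.mean(ys)   # sample mean of Y
    s_x = statistics.stdev(xs)    # sample standard deviation (n - 1 divisor)
    s_y = statistics.stdev(ys)
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    # Numerator: sum of the products of the paired deviations from the means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((n - 1) * s_x * s_y)


# Made-up example data, for illustration only.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly decreasing line
```

For data that lie exactly on an increasing straight line the result is +1 (up to floating-point rounding), and for a decreasing straight line it is -1, which matches the stated range of the coefficient.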

Negative Pearson Correlation

So, how can you tell whether a set of data will give a negative Pearson correlation? Consider the numerator (the top half) of the Pearson Correlation formula, repeated below. Suppose that at a certain data point the value of the x variable is below its average and that of the y variable is above its average. The first deviation is negative and the second is positive, so their product is negative and that point contributes a negative term to the top half of the formula. Because the bottom half is always positive, the correlation is negative whenever these negative products outweigh the positive ones.

$$r_{xy}=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{(n-1) s_x s_y}$$

Furthermore, the same applies if the value of the x variable at a certain data point is above average and that of the y variable is below average. This supports the statement made earlier: a negative correlation is evidence of a general tendency for large values of x to be associated with small values of y, and small values of x with large values of y.
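
The sign argument above can be checked on a small example. The following Python sketch uses made-up data in which y falls as x rises and lists the product of deviations for each point; the variable names are illustrative assumptions.

```python
import statistics

# Hypothetical data: y decreases as x increases.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 8.0, 7.0, 4.0, 1.0]

x_bar = statistics.mean(xs)  # 3.0
y_bar = statistics.mean(ys)  # 6.0

# One product of deviations per data point: (x_i - x_bar) * (y_i - y_bar).
products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

for x, y, p in zip(xs, ys, products):
    print(f"x={x}, y={y}, product of deviations={p}")

# The top half of the formula is the sum of these products.
numerator = sum(products)
print("numerator:", numerator)
```

Here no product is positive, because every below-average x is paired with an above-average y and vice versa, so the numerator sums to -22 and the correlation is negative.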

Positive Pearson Correlation

On the other hand, for a correlation value to be positive, consider the same formula as that used earlier. If the value of the x variable at a certain data point is below average and that of the y variable is also below average, then both deviations are negative, and upon multiplication you have the product of two negative values, which is positive. That point contributes a positive term to the top half of the formula, and when such terms dominate the sum the result is a positive Pearson correlation.

$$r_{xy}=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{(n-1) s_x s_y}$$

In addition, the same applies if the values of both the x variable and the y variable at a certain data point are above average. You then have two positive deviations multiplying each other, and the product is again positive.
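
As a worked illustration, the arithmetic below applies the formula to four made-up pairs (the numbers are hypothetical, chosen only so that both means equal 3). One pair has x above average and y below average, so it contributes a negative product, but the positive products outweigh it and the correlation comes out positive.

```latex
% Hypothetical data: x = (1, 2, 4, 5), y = (1, 3, 2, 6), so \bar{x} = 3 and \bar{y} = 3
\begin{align*}
\sum_{i=1}^{4} (x_i-\bar{x})(y_i-\bar{y})
  &= (-2)(-2) + (-1)(0) + (1)(-1) + (2)(3) = 4 + 0 - 1 + 6 = 9 \\
s_x &= \sqrt{10/3} \approx 1.826, \qquad s_y = \sqrt{14/3} \approx 2.160 \\
r_{xy} &= \frac{9}{(4-1)\, s_x s_y} = \frac{9}{\sqrt{140}} \approx 0.761
\end{align*}
```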