Linear Correlation between Two Variables
- Python Automation and Machine Learning for ICs -
- An Online Book -

http://www.globalsino.com/ICs/

=================================================================================

There are several statistical models and methods that can be used to study the linear correlation between two variables. The most common approach is to calculate the Pearson correlation coefficient, but there are other methods as well.

Table 3705. Linear correlation between two variables.

Pearson Correlation Coefficient (r)
    Summary: Measures the linear relationship between two variables. Values range from -1 to 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear correlation.
    Type of relationship: Linear relationships between variables.
    Scale: Values range from -1 to 1.
    Assumptions: Assumes a linear relationship and normal distribution of the variables.
    Strengths: Well-suited for linear relationships; easy to interpret.
    Limitations: Sensitive to outliers (see page3920); may not capture non-linear relationships.

Spearman Rank Correlation Coefficient (ρ or rs)
    Summary: Measures the strength and direction of the monotonic relationship between two variables. Based on the ranks of the data rather than the actual values. Suitable for ordinal or non-normally distributed data.
    Type of relationship: Monotonic relationships (linear or non-linear).
    Scale: Values range from -1 to 1.
    Assumptions: Does not assume normal distribution; suitable for ordinal data.
    Strengths: Robust to outliers; suitable for non-linear relationships.
    Limitations: Ignores the magnitude of differences; may lose information in tied ranks.

Kendall's Tau (τ)
    Summary: Measures the strength and direction of the ordinal association between two measured quantities. Like the Spearman correlation, it is based on the ranks of the data.
    Type of relationship: Ordinal relationships.
    Scale: Values range from -1 to 1, based on counts of concordant and discordant pairs.
    Assumptions: Does not assume normal distribution; suitable for ordinal data.
    Strengths: Robust to outliers; suitable for non-linear relationships.
    Limitations: Computationally intensive for large datasets.

Linear Regression
    Summary: Represents the linear relationship between two variables with a regression equation; helps in predicting the value of one variable based on the value of another. The slope of the regression line provides information about the strength and direction of the linear relationship.
    Type of relationship: A linear relationship modeled by a regression equation.
    Scale: Predicts values on a continuous scale.
    Assumptions: Assumes a linear relationship, normally distributed errors, and homoscedasticity.
    Strengths: Provides a predictive model; assesses the impact of independent variables.
    Limitations: Sensitive to outliers; assumes linearity.

Coefficient of Determination (R-squared)
    Summary: In linear regression, R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s). Values range from 0 to 1, where 1 indicates a perfect fit.
    Type of relationship: The proportion of variance explained by the linear regression model.
    Scale: Values range from 0 to 1.
    Assumptions: Same as linear regression.
    Strengths: Same as linear regression.
    Limitations: Does not reveal the quality of individual predictors; may be misused.
Correlation Ratio (Eta or η)
    Summary: Measures the strength of the association between a categorical variable and a numerical variable. Particularly useful when the grouping variable is nominal or ordinal.
    Type of relationship: Association between a categorical and a numerical variable.
    Scale: Values range from 0 to 1.
    Assumptions: Suitable for nominal or ordinal grouping variables; does not assume linearity.
    Strengths: Handles categorical groupings; easy to interpret.
    Limitations: Requires a categorical grouping variable; does not indicate direction; may not capture complex relationships.
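The three correlation coefficients in the table can all be computed with SciPy. A minimal sketch on synthetic data (the variable names, seed, and noise level below are illustrative choices, not part of the text):

```python
import numpy as np
from scipy import stats

# Synthetic data with a roughly linear relationship
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

r, p_r = stats.pearsonr(x, y)        # linear correlation, assumes normality
rho, p_rho = stats.spearmanr(x, y)   # rank-based, captures monotonic trends
tau, p_tau = stats.kendalltau(x, y)  # based on concordant/discordant pairs

print(f"Pearson r   = {r:.3f} (p = {p_r:.3g})")
print(f"Spearman ρ  = {rho:.3f} (p = {p_rho:.3g})")
print(f"Kendall τ   = {tau:.3f} (p = {p_tau:.3g})")
```

Because the data here are nearly linear, all three statistics come out strongly positive; on monotonic but non-linear data, Spearman and Kendall would stay high while Pearson dropped.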
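Linear regression and R-squared go together: for simple (one-predictor) regression, R-squared is just the square of the Pearson r returned by the fit. A minimal sketch with scipy.stats.linregress (the data below are made up for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic data: y ≈ 3x + 1 plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=50)

res = stats.linregress(x, y)         # returns slope, intercept, rvalue, pvalue, stderr
r_squared = res.rvalue ** 2          # coefficient of determination (simple regression)
y_at_5 = res.slope * 5.0 + res.intercept  # use the fitted line to predict y at x = 5

print(f"slope = {res.slope:.3f}, intercept = {res.intercept:.3f}, R² = {r_squared:.3f}")
```

The slope's sign gives the direction of the relationship, and r_squared reports how much of the variance in y the line explains.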
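SciPy has no built-in correlation-ratio function, so the helper below is a sketch of the standard eta formula (between-group sum of squares over total sum of squares); the function name correlation_ratio is my own, not from the text:

```python
import numpy as np

def correlation_ratio(categories, values):
    """Correlation ratio η: sqrt(between-group SS / total SS), in [0, 1]."""
    values = np.asarray(values, dtype=float)
    categories = np.asarray(categories)
    grand_mean = values.mean()
    ss_between = 0.0
    for cat in np.unique(categories):
        group = values[categories == cat]
        ss_between += len(group) * (group.mean() - grand_mean) ** 2
    ss_total = ((values - grand_mean) ** 2).sum()
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0

# Group means far apart → η near 1; identical group means → η = 0
eta_strong = correlation_ratio(["a", "a", "a", "b", "b", "b"], [1, 1, 1, 5, 5, 5])
eta_none = correlation_ratio(["a", "a", "a", "b", "b", "b"], [1, 2, 3, 1, 2, 3])
```

Note that η is always non-negative, which is why the table lists no direction for this measure.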

 =================================================================================