
Method 
Purposes 
Variables 
Outputs 
Correlation Analysis 
Is used to measure the strength and direction of the linear relationship between two variables. 
Involves at least two continuous variables. No distinction is made between dependent and independent variables. 
Correlation coefficients (e.g., Pearson's r) that range from -1 to +1, where the sign indicates the direction and the magnitude indicates the strength of the linear relationship. 
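As a minimal sketch of how such a coefficient is computed, the snippet below implements Pearson's r from its definition using only the standard library. The function name `pearson_r` and the small data set are illustrative assumptions, not part of the source.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Illustrative (made-up) data: hours studied and test score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_r(hours, score)                      # positive: both rise together
r_neg = pearson_r(hours, [-s for s in score])    # same strength, opposite sign
print(round(r, 3), round(r_neg, 3))
```

Swapping the two arguments gives the same value, which reflects the fact that correlation makes no distinction between dependent and independent variables.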
Regression Analysis 
Is used to model the relationship between a dependent variable and one or more independent variables, and to predict the value of the dependent variable from the value(s) of the independent variable(s). 
Clearly distinguishes between dependent (response) and independent (predictor) variables. Can handle continuous and categorical variables. 
Regression coefficients that describe the expected change in the dependent variable for a one-unit change in an independent variable, holding any other independent variables constant. It can also provide predictions for new data points. 
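The simplest case, one continuous predictor fitted by ordinary least squares, can be sketched as follows. The helper `fit_line` and the data are hypothetical and chosen only to show how the slope is read as the change per one-unit step in the predictor.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(x, y)

def predict(value):
    """Prediction for a new data point using the fitted coefficients."""
    return intercept + slope * value

print(round(intercept, 2), round(slope, 2), round(predict(6), 2))
```

Here `predict(6)` extrapolates one step beyond the observed range, which is the "predictions for new data points" output in its most basic form.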
Factor Analysis 
Is used for data reduction or structure detection: it aims to identify latent variables (factors) that explain the pattern of correlations within a set of observed variables. 
Typically involves multiple observed continuous variables. The method looks for variables that are highly correlated and groups them into factors. 
Factor loadings that indicate the strength of the association between the observed variables and the identified latent factors. 
Cluster Analysis 
Is used to group objects or cases into clusters so that objects in the same cluster are more similar to each other than to those in other clusters. It is not typically used to identify relationships between variables. 
Does not focus on the relationship between variables but on the similarities between observations across all variables. 
A set of clusters, where each cluster is a group of observations that are similar to each other and different from observations in other clusters. 
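The source does not name a clustering algorithm; as one common choice, a bare-bones k-means can be sketched as below. The function `kmeans`, the fixed starting centroids, and the six points are illustrative assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then update."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Squared Euclidean distance across all variables
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids

# Illustrative (made-up) observations measured on two variables
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

# Starting centroids are picked by hand here to keep the run deterministic
clusters, centroids = kmeans(points, [points[0], points[3]])
print(clusters)
```

Note that the grouping is driven by distances between observations, not by any relationship between the two variables.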
Principal Component Analysis (PCA) 
To reduce the dimensionality of a data set consisting of many variables correlated with each other, either to simplify the description of the data or to make it more understandable. 
Involves a set of possibly correlated variables and transforms them into a substantially smaller set of uncorrelated variables that represent most of the information in the original dataset. 
Principal components, which are linear combinations of the original variables. The first principal component accounts for the most variance, and each subsequent component accounts for as much of the remaining variance as possible while being uncorrelated with the earlier components. 
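For two variables the principal components can be worked out in closed form from the 2x2 covariance matrix, which makes the idea easy to check by hand. The sketch below assumes exactly two variables with nonzero covariance; `pca_2d` and the data are hypothetical.

```python
from math import sqrt

def pca_2d(x, y):
    """PCA for two variables via the eigen-decomposition of their covariance matrix."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] are the component variances
    half_trace = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + spread, half_trace - spread)
    # Unit eigenvector for the largest eigenvalue (assumes sxy != 0)
    vx, vy = sxy, eigenvalues[0] - sxx
    norm = sqrt(vx ** 2 + vy ** 2)
    direction = (vx / norm, vy / norm)
    # Scores: each centered observation projected on the first component
    scores = [(a - mean_x) * direction[0] + (b - mean_y) * direction[1]
              for a, b in zip(x, y)]
    return eigenvalues, direction, scores

# Illustrative (made-up) correlated variables
x = [1, 2, 3, 4, 5]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

eigenvalues, direction, scores = pca_2d(x, y)
explained = eigenvalues[0] / sum(eigenvalues)
print(round(explained, 3))
```

The `direction` tuple holds the weights of the linear combination, and `explained` is the share of total variance carried by the first component.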
Canonical Correlation Analysis 
To understand the relationship between two multivariate sets of variables (sets of dependent variables and sets of independent variables). 
Involves two sets of variables and seeks to find linear combinations of variables in each set that are maximally correlated with each other. 
Canonical correlations for each pair of linear combinations, along with the weights for each variable in the linear combinations. 
Discriminant Analysis 
To classify observations into predefined classes based on independent variables and to understand which variables discriminate between the classes. 
Involves a categorical dependent variable (the class label) and continuous or ordinal independent variables. 
Discriminant functions based on linear combinations of the predictor variables that provide the best discrimination between the classes. 
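One way to obtain such a function for two classes is Fisher's linear discriminant, sketched below for two predictors. This is an assumption about the specific technique (the source only says "discriminant functions"), and the helper names, the midpoint cutoff (which presumes equal priors), and the data are all illustrative.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]

def scatter(rows, mean):
    """Entries (s11, s12, s22) of the 2x2 within-class scatter matrix."""
    s11 = sum((r[0] - mean[0]) ** 2 for r in rows)
    s12 = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows)
    s22 = sum((r[1] - mean[1]) ** 2 for r in rows)
    return s11, s12, s22

def fit_lda(class0, class1):
    """Fisher's discriminant: weights w = Sw^-1 (m1 - m0) and a midpoint cutoff."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    a, b, c = (p + q for p, q in zip(scatter(class0, m0), scatter(class1, m1)))
    det = a * c - b * b
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    # Multiply the mean difference by the inverse of [[a, b], [b, c]]
    weights = [(c * d0 - b * d1) / det, (-b * d0 + a * d1) / det]
    cutoff = sum(w * (p + q) / 2 for w, p, q in zip(weights, m0, m1))
    return weights, cutoff

def classify(weights, cutoff, row):
    """Return 1 if the discriminant score exceeds the cutoff, else 0."""
    return 1 if weights[0] * row[0] + weights[1] * row[1] > cutoff else 0

# Illustrative (made-up) observations for two predefined classes
class0 = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
class1 = [(4.0, 4.5), (4.5, 4.0), (5.0, 5.2), (4.2, 4.8)]

weights, cutoff = fit_lda(class0, class1)
print(classify(weights, cutoff, (1.1, 2.0)), classify(weights, cutoff, (4.6, 4.4)))
```

The relative size of the entries in `weights` gives a rough indication of which predictor contributes more to separating the classes, provided the predictors are on comparable scales.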
Path Analysis 
To provide estimates of the magnitude and significance of hypothesized causal connections between sets of variables. 
Involves observed variables that are hypothesized to have a directional relationship with each other. 
Standardized coefficients that represent the direct and indirect relationships between variables within the path model. 
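As a hedged illustration, consider a hypothetical three-variable model in which X affects Y both directly and through a mediator M. With standardized variables the path coefficients follow from the pairwise correlations, as sketched below; the model, the function names, and the data are assumptions, and no significance tests are computed.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def path_coefficients(x, m, y):
    """Standardized paths for the model X -> M -> Y plus a direct X -> Y path."""
    r_xm, r_xy, r_my = pearson_r(x, m), pearson_r(x, y), pearson_r(m, y)
    a = r_xm                                              # path X -> M
    b = (r_my - r_xy * r_xm) / (1 - r_xm ** 2)            # path M -> Y, given X
    direct = (r_xy - r_my * r_xm) / (1 - r_xm ** 2)       # path X -> Y, given M
    indirect = a * b                                      # effect of X via M
    return {"a": a, "b": b, "direct": direct,
            "indirect": indirect, "total": direct + indirect}

# Illustrative (made-up) observed variables
x = [1, 2, 3, 4, 5, 6]
m = [2, 1, 4, 3, 6, 5]
y = [1, 3, 2, 5, 4, 6]

effects = path_coefficients(x, m, y)
print({name: round(value, 3) for name, value in effects.items()})
```

In this simple model the direct and indirect effects add up to the plain correlation between X and Y, which is a useful sanity check on the decomposition.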
Structural Equation Modeling (SEM) 
To model complex relationships between observed and latent variables, and between latent variables themselves. 
Combines observed variables (similar to those used in regression analysis) and latent variables (unobserved, inferred from the observed variables). 
Path coefficients that indicate the strength of the relationship between variables, as well as goodness-of-fit measures for the model. 
Multivariate Analysis of Variance (MANOVA) 
To determine if there are any differences between independent groups on more than one continuous dependent variable. 
Involves one or more categorical independent variables and two or more dependent variables. 
Wilks' Lambda, Pillai's Trace, Hotelling's Trace, and Roy's Largest Root, which are statistics used to determine whether the mean vectors of the groups are different. 
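Of these statistics, Wilks' Lambda is the easiest to compute by hand: it is the ratio of the determinant of the within-group sums-of-squares-and-cross-products (SSCP) matrix to that of the total SSCP matrix. The sketch below assumes exactly two dependent variables and uses made-up data; it returns only the statistic, not a p-value.

```python
def mean_vector(rows):
    """Column means of a list of (y1, y2) observations."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]

def sscp(rows, center):
    """Entries (s11, s12, s22) of the 2x2 SSCP matrix around `center`."""
    s11 = sum((r[0] - center[0]) ** 2 for r in rows)
    s12 = sum((r[0] - center[0]) * (r[1] - center[1]) for r in rows)
    s22 = sum((r[1] - center[1]) ** 2 for r in rows)
    return s11, s12, s22

def determinant(matrix):
    """Determinant of a symmetric 2x2 matrix given as (s11, s12, s22)."""
    return matrix[0] * matrix[2] - matrix[1] ** 2

def wilks_lambda(groups):
    """Wilks' Lambda = det(within SSCP) / det(total SSCP)."""
    all_rows = [row for group in groups for row in group]
    total = sscp(all_rows, mean_vector(all_rows))
    within = [0.0, 0.0, 0.0]
    for group in groups:
        group_sscp = sscp(group, mean_vector(group))
        within = [w + g for w, g in zip(within, group_sscp)]
    return determinant(within) / determinant(total)

# Illustrative (made-up) scores on two dependent variables for two groups
group_a = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
group_b = [(4.0, 4.5), (4.5, 4.0), (5.0, 5.2), (4.2, 4.8)]

lam = wilks_lambda([group_a, group_b])
print(round(lam, 3))
```

Values near 0 indicate that the group mean vectors are far apart relative to the within-group spread, and values near 1 indicate little separation.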
Analysis of Covariance (ANCOVA) 
To assess mean differences between groups on a dependent variable while controlling for the effects of one or more covariate(s) or continuous control variable(s). 
Involves one or more categorical independent variables, a continuous dependent variable, and one or more continuous covariate(s). 
Adjusted means for each group on the dependent variable, with statistical tests to determine whether these means are significantly different after accounting for the covariates. 
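The adjusted means can be sketched for the one-covariate case: each group's raw mean is shifted along the pooled within-group regression slope to the value it would take at the grand mean of the covariate. The function `adjusted_means` and the data (constructed so that the true group difference is 3 and the covariate slope is 2) are illustrative assumptions, and the significance test is left out.

```python
def adjusted_means(groups):
    """Covariate-adjusted group means using the pooled within-group slope.

    `groups` maps a group name to a list of (covariate, outcome) pairs.
    """
    all_x = [x for rows in groups.values() for x, _ in rows]
    grand_x = sum(all_x) / len(all_x)
    sxy = sxx = 0.0
    group_means = {}
    for name, rows in groups.items():
        mean_x = sum(x for x, _ in rows) / len(rows)
        mean_y = sum(y for _, y in rows) / len(rows)
        # Accumulate within-group cross-products and squares
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in rows)
        sxx += sum((x - mean_x) ** 2 for x, _ in rows)
        group_means[name] = (mean_x, mean_y)
    slope = sxy / sxx
    adjusted = {name: mean_y - slope * (mean_x - grand_x)
                for name, (mean_x, mean_y) in group_means.items()}
    return adjusted, slope

# Illustrative (made-up) data: outcome = 2 * covariate + group effect
groups = {
    "A": [(1, 12), (2, 14), (3, 16)],   # group effect 10
    "B": [(3, 19), (4, 21), (5, 23)],   # group effect 13
}

adjusted, slope = adjusted_means(groups)
print(adjusted, slope)
```

The raw group means differ by 7, but part of that gap comes from group B having higher covariate values; after adjustment the difference shrinks to the group effect built into the data.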