Factor Analysis Model - Python Automation and Machine Learning for ICs - An Online Book
Python Automation and Machine Learning for ICs http://www.globalsino.com/ICs/
=================================================================================

Factor analysis is a statistical method used in machine learning to identify and analyze underlying factors that may influence observed variables. It is particularly useful when dealing with a large number of variables and aiming to understand the latent variables that contribute to the observed patterns. Factor analysis assumes that the observed variables are linear combinations of underlying latent factors plus error terms; the model is written out in Equation 3693ae below.
Factor analysis is used in various fields, including psychology, economics, finance, and market research. In machine learning, it can be employed for dimensionality reduction and feature extraction, and to identify underlying latent variables that explain patterns of correlations within observed variables. Compared to some other models, factor analysis can be more suitable for high-dimensional data (e.g., 100 variables) with a relatively small number of examples (e.g., 30 observations). By identifying latent factors that explain the patterns in the data, factor analysis helps in simplifying the dataset and extracting meaningful information. In high-dimensional settings, estimating the covariance matrix accurately becomes challenging, and the sample covariance matrix may become singular or nearly singular. Factor analysis, by constraining the model parameters, can help mitigate these issues. Problems from uninvertible covariance matrices can arise when dealing with high-dimensional data in certain models, such as the naive Gaussian model. A fundamental equation in probabilistic modeling is given by,

p(x, z) = p(x|z) p(z) ---------------------------------- [3693a]

where,
p(x, z) is the joint distribution of the observed variables x and the latent variables z,
p(x|z) is the conditional distribution of the observed variables given the latent variables, and
p(z) is the prior distribution of the latent variables.
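As a quick illustration of the singularity issue just mentioned, here is a minimal sketch (synthetic data, with the dimensions d = 100 and n = 30 taken from the example above) showing that the sample covariance matrix is rank-deficient when n < d, while the covariance implied by a fitted factor analysis model remains full rank; the factor count of 5 is an arbitrary choice for illustration.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, d = 30, 100                       # 30 observations, 100 variables
X = rng.normal(size=(n, d))          # toy high-dimensional dataset

# The d x d sample covariance matrix has rank at most n - 1 = 29 < 100,
# so it is singular and cannot be inverted (e.g., in a Gaussian density).
S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S))      # 29

# Factor analysis constrains the covariance to Lambda Lambda^T + Psi, which
# is full rank (hence invertible) whenever the diagonal entries of Psi are positive.
fa = FactorAnalysis(n_components=5).fit(X)
print(np.linalg.matrix_rank(fa.get_covariance()))   # 100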
In factor analysis, the model assumes that the observed variables (x) are generated from a lower-dimensional set of latent factors (z) through a linear transformation, plus some Gaussian noise. The joint distribution is factorized using Equation 3693a. In factor analysis, for Equation 3693a, we have,

z ~ N(0, I)
x|z ~ N(μ + Λz, Ψ)

that is, p(z) is a standard multivariate Gaussian prior over the latent factors, and p(x|z) is a Gaussian whose mean is a linear function of z.
In factor analysis, the latent factors are typically assumed to follow a multivariate Gaussian distribution. Mathematically, if z ∈ ℝ^m represents a vector of latent factors, where m is the dimensionality of the latent space, then each component of the vector can take any real value. In factor analysis, it is often assumed that the observed variables, denoted as x, are continuous and follow a multivariate Gaussian distribution. This assumption is a key aspect of the factor analysis model. The factor analysis model is typically represented as follows,

xi = μ + Λzi + ϵi ---------------------------------- [3693ae]

where:
xi ∈ ℝ^d is the vector of observed variables for the i-th observation,
μ ∈ ℝ^d is the vector of intercepts (means) of the observed variables,
Λ ∈ ℝ^(d x m) is the factor loadings matrix,
zi ∈ ℝ^m is the vector of latent factors, and
ϵi ∈ ℝ^d is the error term.
The error term represents the difference between the observed values (xi) and the values predicted by the model (μ + Λzi). Mathematically, for an individual observation,

ϵi = xi - (μ + Λzi) ---------------------------------- [3693aeb]

The latent factors can be given by,

zi ~ N(0, I) ---------------------------------- [3693af]

where, zi is a vector of latent factors for the i-th observation, assumed to follow a multivariate Gaussian distribution with a mean vector of zeros (0) and an identity covariance matrix (I). The covariance matrix of the observed variables can be written as,

Σ = ΛΛ^T + Ψ ---------------------------------- [3693ag]

where, Ψ is the covariance matrix of the error terms ϵi. The Ψ is (d x d)-dimensional (ℝ^(d x d)). Ψ is a diagonal matrix, which can be given by,

Ψ = diag(ψ1, ψ2, ..., ψd) ---------------------------------- [3693ah]

Equation 3693ae implies that the observed variables are a linear combination of the latent factors (zi), plus intercepts (μ), and error terms (ϵi). The factor loadings matrix (Λ) represents the weights of the latent factors in the linear combination. The error terms (ϵi) capture the unobserved variability in the observed variables. Estimating the parameters (Λ, μ, Ψ) involves fitting the model to the observed data using statistical methods, such as maximum likelihood estimation. The assumption of a multivariate Gaussian distribution for the latent factors simplifies the modeling process, as it allows for tractable mathematical analysis. It also facilitates the estimation of parameters in the factor analysis model. While the assumption of Gaussian distribution is common in factor analysis, there are variants and extensions of factor analysis that relax this assumption. For example, robust factor analysis models can be used to handle deviations from normality in the data. Based on the evidence lower bound (ELBO) in variational inference, the Expectation-Maximization (EM) algorithm gives (see page 3696),

J(Q, θ) = Σi Ezi~Qi[ log ( p(xi, zi; θ) / Qi(zi) ) ] ----------------- [3693b]

This is the expression for the evidence lower bound, where Qi is a distribution used in variational inference to approximate the posterior distribution p(zi|xi; θ). The goal is to maximize this lower bound during the optimization process. For any θ and Q, its relationship to the likelihood is given by,

ℓ(θ) = Σi log p(xi; θ) ≥ J(Q, θ) ---------------------------------- [3693c]

Note that in the E-step we maximize J with respect to Q, while in the M-step, we maximize J with respect to θ. This inequality holds due to the nature of the variational inference framework. The evidence lower bound J(Q, θ) is a lower bound on the log-likelihood ℓ(θ). The goal of variational inference is to maximize the lower bound with respect to both the model parameters θ and the variational distribution Q. The inequality ensures that, at each step of the optimization, the lower bound is improved, and as a result, the log-likelihood is also improved. Maximizing the lower bound is a surrogate for maximizing the actual log-likelihood, which may be computationally challenging. Factor Analysis is a statistical technique rather than a machine learning algorithm in the traditional sense. It falls under the broader category of multivariate statistical methods used for data analysis and dimensionality reduction. Factor Analysis is commonly employed in fields such as psychology, economics, finance, and other social sciences. While it is not a machine learning algorithm per se, Factor Analysis shares some similarities with certain machine learning methods, particularly unsupervised learning algorithms.
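To make the generative equations concrete, here is a small sketch (synthetic data; all parameter values are arbitrary) that samples data according to Equations 3693ae and 3693af and then fits the model by maximum likelihood with scikit-learn's FactorAnalysis (scikit-learn uses an SVD-based maximum likelihood procedure; EM, as discussed above, is the classical alternative).

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n, d, m = 5000, 6, 2                     # observations, observed dims, latent dims

Lambda = rng.normal(size=(d, m))         # true factor loadings
mu = rng.normal(size=d)                  # true intercepts
psi = rng.uniform(0.1, 0.5, size=d)      # true diagonal noise variances (Psi)

Z = rng.normal(size=(n, m))                    # zi ~ N(0, I)             (Eq. 3693af)
eps = rng.normal(size=(n, d)) * np.sqrt(psi)   # ei ~ N(0, Psi), Psi diagonal
X = mu + Z @ Lambda.T + eps                    # xi = mu + Lambda zi + ei (Eq. 3693ae)

fa = FactorAnalysis(n_components=m).fit(X)

# The fitted model reproduces the covariance structure Lambda Lambda^T + Psi
# (Eq. 3693ag); Lambda itself is only identified up to a rotation, so we
# compare covariances rather than loadings.
print(np.max(np.abs(fa.get_covariance() - np.cov(X, rowvar=False))))   # small

Because any rotation of the latent space leaves ΛΛ^T unchanged, only the covariance structure, not Λ itself, is uniquely recovered; this is why rotations such as varimax are often applied to aid interpretation.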
Types of Factor Analysis:
Exploratory Factor Analysis (EFA): used when the underlying factor structure is not known in advance; the analysis discovers how many factors there are and how the observed variables load on them.
Confirmatory Factor Analysis (CFA): used to test whether a factor structure hypothesized in advance is consistent with the observed data.
One specific application of Factor Analysis is educational assessment. In this application example, d is the number of dimensions (observed variables) in the dataset, which is equal to 3; n is the number of students or individuals in the study; and m is the number of latent factors. The scenario of the educational assessment is: suppose we are conducting an educational assessment where students' performance in three subjects (Math, English, and Science) is measured. We are interested in understanding the latent factors that contribute to students' overall academic performance. Observed Variables (xi):
xi = [xi,Math, xi,English, xi,Science]^T ----------------- [3693d]

Latent Factors (zi):
zi = [zi,1, zi,2]^T ----------------- [3693e]

where, Latent Factor 1 (zi,1) represents a student's inherent mathematical aptitude or quantitative reasoning skills; higher values of zi,1 indicate stronger mathematical abilities. Latent Factor 2 (zi,2) represents a student's language proficiency or verbal reasoning skills; higher values of zi,2 indicate stronger language and verbal abilities. In the example, the two latent factors are conceptualized as underlying dimensions that contribute to a student's performance in Math, English, and potentially other subjects. The specific interpretation of latent factors would depend on the study and the variables being measured. Factor Loadings (Λ):
Λ = | λ11  λ12 |
    | λ21  λ22 |
    | λ31  λ32 | ----------------- [3693f]

where,
Each column of Λ corresponds to a latent factor, and each row corresponds to an observed variable. Intercepts or Means (μ):
μ = [μMath, μEnglish, μScience]^T ----------------- [3693g]

These represent the expected scores in each subject if the latent factors and error terms were zero. Note that if we have,

μ = [0, 0, 0]^T ----------------- [3693gh]

then the term μ essentially has no impact on the expected scores; that is, the expected value of each observed variable is assumed to be zero. It means that the vector of intercepts or means for the observed variables in the educational assessment example is set to zero for all subjects (Math, English, and Science). In this case, Equation 3693ae becomes,

xi = Λzi + ϵi ----------------- [3693gi]

This means that, in the absence of error terms, each subject's score is determined only by the factor loadings matrix Λ and the latent factors zi. The intercepts (μ) do not contribute to the expected scores because they are set to zero. Note that setting intercepts to zero is a common simplifying assumption in certain contexts, and it can make the interpretation of the model more straightforward. However, the appropriateness of such assumptions depends on the characteristics of the data and the research question at hand. Error Terms (ϵi):
ϵi = [ϵi,1, ϵi,2, ϵi,3]^T, with ϵi ~ N(0, Ψ) ----------------- [3693h]

These capture unobserved influences or random variation in the observed scores that are not explained by the latent factors. The model can be expressed by Equation 3693ae. This equation states that the observed scores of a student (xi) are a linear combination of the latent factors (zi), plus intercepts (μ), and error terms (ϵi). The factor loadings matrix (Λ) represents how each latent factor contributes to each subject's score. The error terms (ϵi) account for unobserved factors or random variation in the scores. In this example, the concept of dimensionality reduction is reflected in the latent factors and the factor loadings matrix:
By using latent factors and factor loadings, factor analysis achieves dimensionality reduction by representing the observed variables in terms of a smaller number of latent factors. This allows for a more compact and interpretable representation of the data. The dimensionality reduction occurs because, instead of working with the original three subjects, we are now working with a reduced set of latent factors that summarize the common information across the subjects. The covariance matrix Ψ represents the covariance matrix of the error terms in the factor analysis model. Each element of Ψ corresponds to the covariance between the error terms associated with different observed variables:

Ψ = | Var(ϵ1)      Cov(ϵ1, ϵ2)  Cov(ϵ1, ϵ3) |
    | Cov(ϵ2, ϵ1)  Var(ϵ2)      Cov(ϵ2, ϵ3) |
    | Cov(ϵ3, ϵ1)  Cov(ϵ3, ϵ2)  Var(ϵ3)     | -- [3693i]

Each entry in Ψ represents the covariance between the corresponding error terms. However, under the common assumptions of factor analysis, Ψ is taken to be a diagonal matrix, meaning that the error terms are uncorrelated across the different subjects. In this case, the off-diagonal elements in Ψ are zero, and the matrix becomes,

Ψ = | ψMath  0         0        |
    | 0      ψEnglish  0        |
    | 0      0         ψScience | ------------------------ [3693ia]

where, ψMath, ψEnglish, and ψScience represent the variances of the error terms for Math, English, and Science, respectively. This diagonal structure simplifies the modeling process and is a common assumption in factor analysis; a short numeric sketch of this structure follows below. The variance of the error term (Var(ϵ)) is a measure of how much individual observations deviate from their predicted values on average. A high variance indicates that the individual observations are spread out from the predicted values, suggesting greater variability in the model's performance. A low variance suggests that the observations are closely clustered around the predicted values. Note that in statistical modeling, the variances of the error terms refer to the variability or dispersion of the errors or residuals in a model. The error term represents the difference between the observed values and the values predicted by the model. The variances of the error terms provide information about the spread of these differences. Dimensionality reduction, including methods like factor analysis, can potentially be "wrong" or inappropriate in certain situations, for example, when the relationships among variables are strongly nonlinear, when the data deviate substantially from Gaussianity, or when the number of factors is misspecified.
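As referenced above, here is a numeric sketch of the three-subject example; the loading and variance values are made up purely for illustration.

import numpy as np

# Hypothetical loadings: rows = (Math, English, Science), columns = (quantitative, verbal)
Lambda = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4]])

# Hypothetical diagonal error covariance Psi (Eq. 3693ia)
Psi = np.diag([0.15, 0.20, 0.25])

# Implied covariance of the observed scores (Eq. 3693ag)
Sigma = Lambda @ Lambda.T + Psi
print(Sigma)

# Communality of each subject: the variance explained by the two factors,
# i.e., the row sums of squared loadings (e.g., Math: 0.9**2 + 0.1**2 = 0.82)
print((Lambda ** 2).sum(axis=1))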
Before applying dimensionality reduction methods, it's essential to carefully assess the characteristics of the data and the appropriateness of the chosen model. Additionally, validation techniques and assessing the quality of the reduced-dimensional representation are important steps to ensure the reliability of the results. It's also good practice to consider alternative methods and compare their performance in the specific context of the data at hand. In the educational assessment example above, additional dimensions beyond the originally mentioned latent factors (zi,1 and zi,2) could be considered based on the specific goals of the assessment and the nature of the data, for example, a factor capturing scientific reasoning or one capturing study habits and motivation.
The choice of additional dimensions depends on the specific research questions, the available data, and the theoretical framework guiding the assessment. It's important to strike a balance between including enough dimensions to capture relevant aspects of student performance and avoiding an excessive number of dimensions that may lead to overfitting or reduced interpretability. Additionally, the inclusion of additional dimensions should be theoretically justified and aligned with the goals of the assessment. One application example of the Factor Analysis Model is to study the impact of fabrication conditions on semiconductor wafer fail rates. Assume we have a dataset containing the fail rates of semiconductor wafers under 40 different test bins. The dataset includes five columns, each representing one of five wafers, with the fail rates measured for each. The wafers were fabricated using different combinations of 10 possible conditions. Specifically, Wafer1 was fabricated under Conditions 1 and 2; Wafer2 under Conditions 1, 2, 3, 6, and 9; Wafer3 under Conditions 1, 8, 9, and 10; Wafer4 under Conditions 1, 2, 3, 5, and 7; and Wafer5 under Conditions 1, 4, 5, and 8. We want to perform a fail analysis to understand the relationships between these varying fabrication conditions and the observed fail rates across the different bins. This involves identifying any patterns or correlations that may exist between the conditions and the fail rates, which could help in pinpointing specific conditions that lead to higher fail rates, thereby facilitating improvements in fabrication processes. A Python script is used to analyze the data (a sketch of such a script follows this paragraph). Here, to analyze the impact of the different fabrication conditions on the fail rates of each wafer with a factor analysis model, we use a statistical approach where the factors are the different conditions used in the fabrication of the wafers. In this context, a factor analysis model helps us understand the underlying patterns or factors that influence the fail rates by reducing the number of observed variables (fabrication conditions) into fewer factors. This approach assumes that the fabrication conditions can be treated as quantitative input variables, which may not entirely be the case, but this simplified approach can still provide some insights. The factor analysis model is fitted to the data. The outputs from the modeling include eigenvalues to assess the number of meaningful factors, factor loadings, which tell us how each variable (wafer) relates to the underlying factors, and communalities, which indicate how much of the variance in each variable is explained by the factors. The eigenvalues measure the amount of variance in the original variables that is accounted for by each factor. The general rule of thumb is that only factors with eigenvalues greater than 1 should be considered, as they explain more variance than a single original variable. The results show three factors with eigenvalues above 1, suggesting these three factors capture significant variance and might be worth considering in the analysis. The various underlying factors could be interpreted as different effects due to the fabrication conditions, even though the relationship between the fabrication conditions (Conditions 1 through 10) and the factors (Factors 0, 1, 2, and 3) identified by the factor analysis is not directly observable from the factor analysis of fail rates alone.
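The original script is not reproduced on this page; the following is a minimal sketch of such an analysis using the factor_analyzer package, assuming the fail-rate table is stored in a CSV file with one column per wafer (the file name wafer_fail_rates.csv and the choice of four factors are assumptions for illustration).

import pandas as pd
from factor_analyzer import FactorAnalyzer

# 40 rows (test bins) x 5 columns (Wafer1..Wafer5); the file name is hypothetical
data = pd.read_csv("wafer_fail_rates.csv")

fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(data)

# Eigenvalues: by the usual rule of thumb, keep factors with eigenvalue > 1
eigenvalues, _ = fa.get_eigenvalues()
print("Eigenvalues:", eigenvalues)

# Loadings: how strongly each wafer relates to each factor
print(pd.DataFrame(fa.loadings_, index=data.columns))

# Communalities: variance in each wafer explained by all factors together
print("Communalities:", fa.get_communalities())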
Factor analysis can tell us how much of the variance in fail rates across different wafers can be explained by a smaller number of unobserved factors, but it doesn't directly reveal the nature of these factors or their specific relationship to the conditions unless we have additional data or perform further analysis, for example, by correlating the factor results with the known condition assignments of each wafer (see the sketch below).
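As an illustration of such further analysis, the sketch below builds the binary condition matrix from the fabrication description given earlier and correlates each factor's loadings with the condition indicators; it reuses the hypothetical fa object fitted in the previous sketch.

import pandas as pd

# Binary design matrix: rows = wafers, columns = Conditions 1..10
# (1 means the wafer was fabricated under that condition)
conditions = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],   # Wafer1: Conditions 1, 2
     [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],   # Wafer2: Conditions 1, 2, 3, 6, 9
     [1, 0, 0, 0, 0, 0, 0, 1, 1, 1],   # Wafer3: Conditions 1, 8, 9, 10
     [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],   # Wafer4: Conditions 1, 2, 3, 5, 7
     [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]],  # Wafer5: Conditions 1, 4, 5, 8
    index=["Wafer1", "Wafer2", "Wafer3", "Wafer4", "Wafer5"],
    columns=[f"Condition{i}" for i in range(1, 11)],
)

loadings = pd.DataFrame(fa.loadings_, index=conditions.index)
for factor in loadings.columns:
    # Condition 1 is shared by every wafer, so its correlation is undefined (NaN)
    print(f"Factor {factor}:")
    print(conditions.corrwith(loadings[factor]), "\n")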
Factor loadings are coefficients that measure the correlation between each wafer and the factors. A high absolute value indicates that the factor has a significant influence on that wafer. In the loadings matrix in the output, each row corresponds to one wafer and each column to one factor.
Communalities indicate how much of the variance in each original variable (wafer) is explained by all the extracted factors together. For example, the communality for Wafer1 is 0.460, meaning 46% of its variance is explained by the factors in the model. Generally, higher communalities are desirable as they indicate more variance is accounted for by the model. In the dataset above, each wafer has fail rates across 40 test bins (40 rows in the CSV file), which can be considered as 40 dimensions (one for each bin). In the factor analysis, each of these dimensions represents a different variable (in this case, the fail rate for each bin). When we perform factor analysis, we are effectively reducing these 40 dimensions into a smaller number of dimensions, which are called factors. The factors capture the underlying relationships or patterns among the original 40 dimensions while reducing the complexity of the data, as the sketch below illustrates.
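As a sketch of the reduction itself: the transposed table treats each wafer as one observation described by its 40 bin fail rates, and factor analysis compresses those 40 dimensions into a handful of factor scores per wafer. This sketch uses scikit-learn's FactorAnalysis; the factor count of 2 and the file name are again illustrative assumptions.

import pandas as pd
from sklearn.decomposition import FactorAnalysis

data = pd.read_csv("wafer_fail_rates.csv")   # 40 bins x 5 wafers; hypothetical file name
X = data.T.values                            # 5 wafers x 40 bin dimensions

# Reduce the 40 bin dimensions to 2 latent factors per wafer
fa40 = FactorAnalysis(n_components=2).fit(X)
scores = fa40.transform(X)                   # factor scores for each wafer
print(scores.shape)                          # (5, 2) instead of (5, 40)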
============================================