Factor Analysis Model - Python Automation and Machine Learning for ICs - An Online Book
Python Automation and Machine Learning for ICs http://www.globalsino.com/ICs/
=================================================================================

Factor analysis is a statistical method used in machine learning to identify and analyze underlying factors that may influence observed variables. It is particularly useful when dealing with a large number of variables and aiming to understand the latent variables that contribute to the observed patterns. Factor analysis assumes that the observed variables are linear combinations of underlying latent factors plus error terms; the model is written out in Equation 3693ae below.
Factor analysis is used in various fields, including psychology, economics, finance, and market research. In machine learning, it can be employed for dimensionality reduction and feature extraction, and to identify underlying latent variables that explain patterns of correlations within observed variables. Compared to some other models, factor analysis can be more suitable for high-dimensional data (e.g., 100 variables) with a relatively small number of examples (e.g., 30 observations). By identifying latent factors that explain the patterns in the data, factor analysis helps in simplifying the dataset and extracting meaningful information. In high-dimensional settings, estimating the covariance matrix accurately becomes challenging, and the sample covariance matrix may become singular or nearly singular. Factor analysis, by constraining the model parameters, can help mitigate these issues. Problems from uninvertible covariance matrices can arise when dealing with high-dimensional data in certain models, such as the naive Gaussian model. A fundamental equation in probabilistic modeling is given by,

p(x, z) = p(x|z) p(z) ---------------------------------- [3693a]

where,
p(x, z) is the joint distribution of the observed variables x and the latent variables z,
p(x|z) is the conditional distribution of the observed variables given the latent variables, and
p(z) is the prior distribution of the latent variables.
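As a quick illustration of the singularity issue just mentioned, here is a minimal sketch (synthetic data, with the dimensions d = 100 and n = 30 taken from the example above) showing that the sample covariance matrix is rank-deficient when n < d, while the covariance implied by a fitted factor analysis model remains full rank; the factor count of 5 is an arbitrary choice for illustration.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, d = 30, 100                       # 30 observations, 100 variables
X = rng.normal(size=(n, d))          # toy high-dimensional dataset

# The d x d sample covariance matrix has rank at most n - 1 = 29 < 100,
# so it is singular and cannot be inverted (e.g., in a Gaussian density).
S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S))      # 29

# Factor analysis constrains the covariance to Lambda Lambda^T + Psi, which
# is full rank (hence invertible) whenever the diagonal entries of Psi are positive.
fa = FactorAnalysis(n_components=5).fit(X)
print(np.linalg.matrix_rank(fa.get_covariance()))   # 100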
In factor analysis, the model assumes that the observed variables (x) are generated from a lower-dimensional set of latent factors (z) through a linear transformation, plus some Gaussian noise. The joint distribution is factorized using Equation 3693a. In factor analysis, for Equation 3693a, we have,

z ~ N(0, I)
x|z ~ N(μ + Λz, Ψ)

that is, p(z) is a standard multivariate Gaussian prior over the latent factors, and p(x|z) is a Gaussian whose mean is a linear function of z.
In factor analysis, the latent factors are typically assumed to follow a multivariate Gaussian distribution. Mathematically, if z ∈ ℝ^m represents a vector of latent factors, where m is the dimensionality of the latent space, then each component of the vector can take any real value. In factor analysis, it is often assumed that the observed variables, denoted as x, are continuous and follow a multivariate Gaussian distribution. This assumption is a key aspect of the factor analysis model. The factor analysis model is typically represented as follows,

xi = μ + Λzi + ϵi ---------------------------------- [3693ae]

where:
xi ∈ ℝ^d is the vector of observed variables for the i-th observation,
μ ∈ ℝ^d is the vector of intercepts (means) of the observed variables,
Λ ∈ ℝ^(d x m) is the factor loadings matrix,
zi ∈ ℝ^m is the vector of latent factors, and
ϵi ∈ ℝ^d is the error term.
The error term represents the difference between the observed values (xi) and the values predicted by the model (μ + Λzi). Mathematically, for an individual observation,

ϵi = xi - (μ + Λzi) ---------------------------------- [3693aeb]

The latent factors can be given by,

zi ~ N(0, I) ---------------------------------- [3693af]

where, zi is a vector of latent factors for the i-th observation, assumed to follow a multivariate Gaussian distribution with a mean vector of zeros (0) and an identity covariance matrix (I). The covariance matrix of the observed variables can be written as,

Σ = ΛΛ^T + Ψ ---------------------------------- [3693ag]

where, Ψ is the covariance matrix of the error terms ϵi. The Ψ is (d x d)-dimensional (ℝ^(d x d)). Ψ is a diagonal matrix, which can be given by,

Ψ = diag(ψ1, ψ2, ..., ψd) ---------------------------------- [3693ah]

Equation 3693ae implies that the observed variables are a linear combination of the latent factors (zi), plus intercepts (μ), and error terms (ϵi). The factor loadings matrix (Λ) represents the weights of the latent factors in the linear combination. The error terms (ϵi) capture the unobserved variability in the observed variables. Estimating the parameters (Λ, μ, Ψ) involves fitting the model to the observed data using statistical methods, such as maximum likelihood estimation. The assumption of a multivariate Gaussian distribution for the latent factors simplifies the modeling process, as it allows for tractable mathematical analysis. It also facilitates the estimation of parameters in the factor analysis model. While the assumption of Gaussian distribution is common in factor analysis, there are variants and extensions of factor analysis that relax this assumption. For example, robust factor analysis models can be used to handle deviations from normality in the data. Based on the evidence lower bound (ELBO) in variational inference, the Expectation-Maximization (EM) algorithm gives (see page 3696),

J(Q, θ) = Σi Ezi~Qi[ log ( p(xi, zi; θ) / Qi(zi) ) ] ----------------- [3693b]

This is the expression for the evidence lower bound, where Qi is a distribution used in variational inference to approximate the posterior distribution p(zi|xi; θ). The goal is to maximize this lower bound during the optimization process. For any θ and Q, its relationship to the likelihood is given by,

ℓ(θ) = Σi log p(xi; θ) ≥ J(Q, θ) ---------------------------------- [3693c]

Note that in the E-step we maximize J with respect to Q, while in the M-step, we maximize J with respect to θ. This inequality holds due to the nature of the variational inference framework. The evidence lower bound J(Q, θ) is a lower bound on the log-likelihood ℓ(θ). The goal of variational inference is to maximize the lower bound with respect to both the model parameters θ and the variational distribution Q. The inequality ensures that, at each step of the optimization, the lower bound is improved, and as a result, the log-likelihood is also improved. Maximizing the lower bound is a surrogate for maximizing the actual log-likelihood, which may be computationally challenging. Factor Analysis is a statistical technique rather than a machine learning algorithm in the traditional sense. It falls under the broader category of multivariate statistical methods used for data analysis and dimensionality reduction. Factor Analysis is commonly employed in fields such as psychology, economics, finance, and other social sciences. While it is not a machine learning algorithm per se, Factor Analysis shares some similarities with certain machine learning methods, particularly unsupervised learning algorithms.
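To make the generative equations concrete, here is a small sketch (synthetic data; all parameter values are arbitrary) that samples data according to Equations 3693ae and 3693af and then fits the model by maximum likelihood with scikit-learn's FactorAnalysis (scikit-learn uses an SVD-based maximum likelihood procedure; EM, as discussed above, is the classical alternative).

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n, d, m = 5000, 6, 2                     # observations, observed dims, latent dims

Lambda = rng.normal(size=(d, m))         # true factor loadings
mu = rng.normal(size=d)                  # true intercepts
psi = rng.uniform(0.1, 0.5, size=d)      # true diagonal noise variances (Psi)

Z = rng.normal(size=(n, m))                    # zi ~ N(0, I)             (Eq. 3693af)
eps = rng.normal(size=(n, d)) * np.sqrt(psi)   # ei ~ N(0, Psi), Psi diagonal
X = mu + Z @ Lambda.T + eps                    # xi = mu + Lambda zi + ei (Eq. 3693ae)

fa = FactorAnalysis(n_components=m).fit(X)

# The fitted model reproduces the covariance structure Lambda Lambda^T + Psi
# (Eq. 3693ag); Lambda itself is only identified up to a rotation, so we
# compare covariances rather than loadings.
print(np.max(np.abs(fa.get_covariance() - np.cov(X, rowvar=False))))   # small

Because any rotation of the latent space leaves ΛΛ^T unchanged, only the covariance structure, not Λ itself, is uniquely recovered; this is why rotations such as varimax are often applied to aid interpretation.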
Types of Factor Analysis:
Exploratory Factor Analysis (EFA): used when the underlying factor structure is not known in advance; the analysis discovers how many factors there are and how the observed variables load on them.
Confirmatory Factor Analysis (CFA): used to test whether a factor structure hypothesized in advance is consistent with the observed data.
One specific application of Factor Analysis is educational assessment. In this application example, d is the number of dimensions (observed variables) in the dataset, which is equal to 3; n is the number of students or individuals in the study; and m is the number of latent factors. The scenario of the educational assessment is: suppose we are conducting an educational assessment where students' performance in three subjects (Math, English, and Science) is measured. We are interested in understanding the latent factors that contribute to students' overall academic performance. Observed Variables (xi):
xi = [xi,Math, xi,English, xi,Science]^T ----------------- [3693d]

Latent Factors (zi):
zi = [zi,1, zi,2]^T ----------------- [3693e]

where, Latent Factor 1 (zi,1) represents a student's inherent mathematical aptitude or quantitative reasoning skills; higher values of zi,1 indicate stronger mathematical abilities. Latent Factor 2 (zi,2) represents a student's language proficiency or verbal reasoning skills; higher values of zi,2 indicate stronger language and verbal abilities. In the example, the two latent factors are conceptualized as underlying dimensions that contribute to a student's performance in Math, English, and potentially other subjects. The specific interpretation of latent factors would depend on the study and the variables being measured. Factor Loadings (Λ):
Λ = | λ11  λ12 |
    | λ21  λ22 |
    | λ31  λ32 | ----------------- [3693f]

where,
Each column of Λ corresponds to a latent factor, and each row corresponds to an observed variable. Intercepts or Means (μ):
μ = [μMath, μEnglish, μScience]^T ----------------- [3693g]

These represent the expected scores in each subject if the latent factors and error terms were zero. Note that if we have,

μ = [0, 0, 0]^T ----------------- [3693gh]

then the term μ essentially has no impact on the expected scores; that is, the expected value of each observed variable is assumed to be zero. It means that the vector of intercepts or means for the observed variables in the educational assessment example is set to zero for all subjects (Math, English, and Science). In this case, Equation 3693ae becomes,

xi = Λzi + ϵi ----------------- [3693gi]

This means that, in the absence of error terms, each subject's score is determined only by the factor loadings matrix Λ and the latent factors zi. The intercepts (μ) do not contribute to the expected scores because they are set to zero. Note that setting intercepts to zero is a common simplifying assumption in certain contexts, and it can make the interpretation of the model more straightforward. However, the appropriateness of such assumptions depends on the characteristics of the data and the research question at hand. Error Terms (ϵi):
ϵi = [ϵi,1, ϵi,2, ϵi,3]^T, with ϵi ~ N(0, Ψ) ----------------- [3693h]

These capture unobserved influences or random variation in the observed scores that are not explained by the latent factors. The model can be expressed by Equation 3693ae. This equation states that the observed scores of a student (xi) are a linear combination of the latent factors (zi), plus intercepts (μ), and error terms (ϵi). The factor loadings matrix (Λ) represents how each latent factor contributes to each subject's score. The error terms (ϵi) account for unobserved factors or random variation in the scores. In this example, the concept of dimensionality reduction is reflected in the latent factors and the factor loadings matrix:
By using latent factors and factor loadings, factor analysis achieves dimensionality reduction by representing the observed variables in terms of a smaller number of latent factors. This allows for a more compact and interpretable representation of the data. The dimensionality reduction occurs because, instead of working with the original three subjects, we are now working with a reduced set of latent factors that summarize the common information across the subjects. The covariance matrix Ψ represents the covariance matrix of the error terms in the factor analysis model. Each element of Ψ corresponds to the covariance between the error terms associated with different observed variables:

Ψ = | Var(ϵ1)      Cov(ϵ1, ϵ2)  Cov(ϵ1, ϵ3) |
    | Cov(ϵ2, ϵ1)  Var(ϵ2)      Cov(ϵ2, ϵ3) |
    | Cov(ϵ3, ϵ1)  Cov(ϵ3, ϵ2)  Var(ϵ3)     | -- [3693i]

Each entry in Ψ represents the covariance between the corresponding error terms. However, under the common assumptions of factor analysis, Ψ is taken to be a diagonal matrix, meaning that the error terms are uncorrelated across the different subjects. In this case, the off-diagonal elements in Ψ are zero, and the matrix becomes,

Ψ = | ψMath  0         0        |
    | 0      ψEnglish  0        |
    | 0      0         ψScience | ------------------------ [3693ia]

where, ψMath, ψEnglish, and ψScience represent the variances of the error terms for Math, English, and Science, respectively. This diagonal structure simplifies the modeling process and is a common assumption in factor analysis; a short numeric sketch of this structure follows below. The variance of the error term (Var(ϵ)) is a measure of how much individual observations deviate from their predicted values on average. A high variance indicates that the individual observations are spread out from the predicted values, suggesting greater variability in the model's performance. A low variance suggests that the observations are closely clustered around the predicted values. Note that in statistical modeling, the variances of the error terms refer to the variability or dispersion of the errors or residuals in a model. The error term represents the difference between the observed values and the values predicted by the model. The variances of the error terms provide information about the spread of these differences. Dimensionality reduction, including methods like factor analysis, can potentially be "wrong" or inappropriate in certain situations, for example, when the relationships among variables are strongly nonlinear, when the data deviate substantially from Gaussianity, or when the number of factors is misspecified.
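As referenced above, here is a numeric sketch of the three-subject example; the loading and variance values are made up purely for illustration.

import numpy as np

# Hypothetical loadings: rows = (Math, English, Science), columns = (quantitative, verbal)
Lambda = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4]])

# Hypothetical diagonal error covariance Psi (Eq. 3693ia)
Psi = np.diag([0.15, 0.20, 0.25])

# Implied covariance of the observed scores (Eq. 3693ag)
Sigma = Lambda @ Lambda.T + Psi
print(Sigma)

# Communality of each subject: the variance explained by the two factors,
# i.e., the row sums of squared loadings (e.g., Math: 0.9**2 + 0.1**2 = 0.82)
print((Lambda ** 2).sum(axis=1))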
Before applying dimensionality reduction methods, it's essential to carefully assess the characteristics of the data and the appropriateness of the chosen model. Additionally, validation techniques and assessing the quality of the reduced-dimensional representation are important steps to ensure the reliability of the results. It's also good practice to consider alternative methods and compare their performance in the specific context of the data at hand. In the educational assessment example above, additional dimensions beyond the originally mentioned latent factors (zi,1 and zi,2) could be considered based on the specific goals of the assessment and the nature of the data, for example, a factor capturing scientific reasoning or one capturing study habits and motivation.
The choice of additional dimensions depends on the specific research questions, the available data, and the theoretical framework guiding the assessment. It's important to strike a balance between including enough dimensions to capture relevant aspects of student performance and avoiding an excessive number of dimensions that may lead to overfitting or reduced interpretability. Additionally, the inclusion of additional dimensions should be theoretically justified and aligned with the goals of the assessment. One application example of the Factor Analysis Model is to study the impact of fabrication conditions on semiconductor wafer fail rates. Assume we have a dataset containing the fail rates of semiconductor wafers under 40 different test bins. The dataset includes five columns, each representing one of five wafers, with the fail rates measured for each. The wafers were fabricated using different combinations of 10 possible conditions. Specifically, Wafer1 was fabricated under Conditions 1 and 2; Wafer2 under Conditions 1, 2, 3, 6, and 9; Wafer3 under Conditions 1, 8, 9, and 10; Wafer4 under Conditions 1, 2, 3, 5, and 7; and Wafer5 under Conditions 1, 4, 5, and 8. We want to perform a fail analysis to understand the relationships between these varying fabrication conditions and the observed fail rates across the different bins. This involves identifying any patterns or correlations that may exist between the conditions and the fail rates, which could help in pinpointing specific conditions that lead to higher fail rates, thereby facilitating improvements in fabrication processes. A Python script is used to analyze the data (a sketch of such a script follows this paragraph). Here, to analyze the impact of the different fabrication conditions on the fail rates of each wafer with a factor analysis model, we use a statistical approach where the factors are the different conditions used in the fabrication of the wafers. In this context, a factor analysis model helps us understand the underlying patterns or factors that influence the fail rates by reducing the number of observed variables (fabrication conditions) into fewer factors. This approach assumes that the fabrication conditions can be treated as quantitative input variables, which may not entirely be the case, but this simplified approach can still provide some insights. The factor analysis model is fitted to the data. The outputs from the modeling include eigenvalues to assess the number of meaningful factors, factor loadings, which tell us how each variable (wafer) relates to the underlying factors, and communalities, which indicate how much of the variance in each variable is explained by the factors. The eigenvalues measure the amount of variance in the original variables that is accounted for by each factor. The general rule of thumb is that only factors with eigenvalues greater than 1 should be considered, as they explain more variance than a single original variable. The results show three factors with eigenvalues above 1, suggesting these three factors capture significant variance and might be worth considering in the analysis. The various underlying factors could be interpreted as different effects due to the fabrication conditions, even though the relationship between the fabrication conditions (Conditions 1 through 10) and the factors (Factors 0, 1, 2, and 3) identified by the factor analysis is not directly observable from the factor analysis of fail rates alone.
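The original script is not reproduced on this page; the following is a minimal sketch of such an analysis using the factor_analyzer package, assuming the fail-rate table is stored in a CSV file with one column per wafer (the file name wafer_fail_rates.csv and the choice of four factors are assumptions for illustration).

import pandas as pd
from factor_analyzer import FactorAnalyzer

# 40 rows (test bins) x 5 columns (Wafer1..Wafer5); the file name is hypothetical
data = pd.read_csv("wafer_fail_rates.csv")

fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(data)

# Eigenvalues: by the usual rule of thumb, keep factors with eigenvalue > 1
eigenvalues, _ = fa.get_eigenvalues()
print("Eigenvalues:", eigenvalues)

# Loadings: how strongly each wafer relates to each factor
print(pd.DataFrame(fa.loadings_, index=data.columns))

# Communalities: variance in each wafer explained by all factors together
print("Communalities:", fa.get_communalities())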
Factor analysis can tell us how much of the variance in fail rates across different wafers can be explained by a smaller number of unobserved factors, but it doesn't directly reveal the nature of these factors or their specific relationship to the conditions unless we have additional data or perform further analysis, for example, by correlating the factor results with the known condition assignments of each wafer (see the sketch below).
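As an illustration of such further analysis, the sketch below builds the binary condition matrix from the fabrication description given earlier and correlates each factor's loadings with the condition indicators; it reuses the hypothetical fa object fitted in the previous sketch.

import pandas as pd

# Binary design matrix: rows = wafers, columns = Conditions 1..10
# (1 means the wafer was fabricated under that condition)
conditions = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],   # Wafer1: Conditions 1, 2
     [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],   # Wafer2: Conditions 1, 2, 3, 6, 9
     [1, 0, 0, 0, 0, 0, 0, 1, 1, 1],   # Wafer3: Conditions 1, 8, 9, 10
     [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],   # Wafer4: Conditions 1, 2, 3, 5, 7
     [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]],  # Wafer5: Conditions 1, 4, 5, 8
    index=["Wafer1", "Wafer2", "Wafer3", "Wafer4", "Wafer5"],
    columns=[f"Condition{i}" for i in range(1, 11)],
)

loadings = pd.DataFrame(fa.loadings_, index=conditions.index)
for factor in loadings.columns:
    # Condition 1 is shared by every wafer, so its correlation is undefined (NaN)
    print(f"Factor {factor}:")
    print(conditions.corrwith(loadings[factor]), "\n")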
Factor loadings are coefficients that measure the correlation between each wafer and the factors. A high absolute value indicates that the factor has a significant influence on that wafer. In the loadings matrix in the output, each row corresponds to one wafer and each column to one factor.
Communalities indicate how much of the variance in each original variable (wafer) is explained by all the extracted factors together. For example, the communality for Wafer1 is 0.460, meaning 46% of its variance is explained by the factors in the model. Generally, higher communalities are desirable as they indicate more variance is accounted for by the model. In the dataset above, each wafer has fail rates across 40 test bins (40 rows in the CSV file), which can be considered as 40 dimensions (one for each bin). In the factor analysis, each of these dimensions represents a different variable (in this case, the fail rate for each bin). When we perform factor analysis, we are effectively reducing these 40 dimensions into a smaller number of dimensions, which are called factors. The factors capture the underlying relationships or patterns among the original 40 dimensions while reducing the complexity of the data, as the sketch below illustrates.
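As a sketch of the reduction itself: the transposed table treats each wafer as one observation described by its 40 bin fail rates, and factor analysis compresses those 40 dimensions into a handful of factor scores per wafer. This sketch uses scikit-learn's FactorAnalysis; the factor count of 2 and the file name are again illustrative assumptions.

import pandas as pd
from sklearn.decomposition import FactorAnalysis

data = pd.read_csv("wafer_fail_rates.csv")   # 40 bins x 5 wafers; hypothetical file name
X = data.T.values                            # 5 wafers x 40 bin dimensions

# Reduce the 40 bin dimensions to 2 latent factors per wafer
fa40 = FactorAnalysis(n_components=2).fit(X)
scores = fa40.transform(X)                   # factor scores for each wafer
print(scores.shape)                          # (5, 2) instead of (5, 40)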
============================================