Support Vector Machines (SVM) and Logistic Regression
- Python and Machine Learning for Integrated Circuits -
- An Online Book -
Python and Machine Learning for Integrated Circuits                                                           http://www.globalsino.com/ICs/

=================================================================================

Support vector machines (SVM) and logistic regression are both popular machine learning algorithms for classification tasks. Table 3814 compares the two. In some practical cases an SVM outperforms logistic regression, yet logistic regression is still the model we want to deploy in our application, for reasons such as computational efficiency (see page 3710).

Table 3814. Support vector machines (SVM) and logistic regression.

| Aspect | Support vector machines | Logistic regression |
|---|---|---|
| Nature of algorithm | Is a discriminative model that tries to find a hyperplane that best separates the data into different classes while maximizing the margin between them. | Is a probabilistic model that estimates the probability of a data point belonging to a particular class. It uses the logistic function to model this probability. |
| Linearity | Can handle non-linear data by using kernel functions to transform the data into a higher-dimensional space where it may become linearly separable. | Can handle non-linear data to some extent but might require feature engineering to capture complex non-linear relationships. |
| Loss function | Uses a hinge loss function, which is less sensitive to outliers and focuses on maximizing the margin. | Uses the logistic loss (cross-entropy) function, which is a smooth, differentiable function that measures the likelihood of the data given the model parameters. |
| Interpretability | Are less interpretable since the focus is on finding the best hyperplane, and the importance of individual features is not as clear. | Provides more interpretable results, as it directly models the effect of each feature on the probability of belonging to a class. |
| Overfitting | Are less prone to overfitting when the dataset is small, thanks to the margin maximization concept. | Can be more prone to overfitting when the number of features is large compared to the number of data points. |
| Scalability | Can be computationally expensive, especially when dealing with large datasets, as it requires solving a convex optimization problem. | Is typically faster and more scalable, making it a good choice for large datasets. |
| Multi-class classification | Can be used for multi-class classification using techniques like one-vs-one or one-vs-all. | Can be extended to handle multi-class problems using techniques like softmax regression. |
| Regularization | Has built-in L2 regularization, which helps prevent overfitting. | Allows for easy incorporation of different types of regularization (L1, L2, or both). |
| Hyperparameter sensitivity | Has parameters like the choice of kernel, C (trade-off between margin width and classification error), and kernel parameters that can be sensitive. | Is less sensitive to hyperparameter settings, making it easier to tune. |
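Formula

Support vector machines: in the standard linear form, the decision function is given by:

$$f(x) = w^T x + b, \quad w, x \in \mathbb{R}^n,\ b \in \mathbb{R}$$

Logistic regression: in the standard form, the hypothesis function is given by:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}, \quad \theta, x \in \mathbb{R}^{n+1},\ x_0 = 1$$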
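
The two functions named above can be sketched in plain Python. This is a minimal illustration that assumes the standard linear forms: w·x + b for the SVM decision function (with x in ℝ^n and a separate bias b) and the sigmoid of θ·x for the logistic regression hypothesis (with x in ℝ^(n+1) and x0 = 1). The function names and the numeric parameters are hypothetical and chosen only for illustration.

```python
import math

def svm_decision(w, b, x):
    """Linear SVM decision value w.x + b for x in R^n (assumed standard form)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(w, b, x):
    """Class label in {-1, +1}, taken from the sign of the decision value."""
    return 1 if svm_decision(w, b, x) >= 0 else -1

def logistic_hypothesis(theta, x):
    """P(y = 1 | x) = sigmoid(theta.x), with x in R^(n+1) and x[0] == 1."""
    z = sum(ti * xi for ti, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters for a 2-feature problem.
w, b = [2.0, -1.0], 0.5
theta = [0.5, 2.0, -1.0]      # theta[0] plays the role of the bias b

features = [1.0, 3.0]
x_aug = [1.0] + features      # prepend x0 = 1 for logistic regression

margin = svm_decision(w, b, features)      # signed score, not a probability
label = svm_predict(w, b, features)        # hard class label
prob = logistic_hypothesis(theta, x_aug)   # probability of the positive class

print(margin, label, round(prob, 4))
```

The SVM returns a signed score whose sign gives the class, while logistic regression returns a probability in (0, 1). The only notational difference in the inputs is that logistic regression folds the bias into θ by prepending x0 = 1.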
Case studies
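
As a small illustrative case (an added sketch, not taken from the original case studies), the two loss functions compared in Table 3814 can be evaluated side by side. The sketch assumes labels in {-1, +1} and a real-valued model score; the function names and the sample scores are hypothetical.

```python
import math

def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y*score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    """Logistic (cross-entropy) loss log(1 + exp(-y*score)) for y in {-1, +1}."""
    return math.log1p(math.exp(-y * score))

# Hypothetical scores for a positive example (y = +1), from badly wrong to
# confidently right.
scores = [-2.0, 0.0, 0.5, 1.0, 3.0]
for s in scores:
    print(f"score={s:+.1f}  hinge={hinge_loss(1, s):.4f}  "
          f"logistic={logistic_loss(1, s):.4f}")
```

The hinge loss is exactly zero once y·score reaches 1, so points classified correctly with enough margin do not contribute to the SVM objective. The logistic loss is smooth and never reaches zero, so every point keeps a small influence on the fit.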

=================================================================================