
Support vector machines (SVM) vs. logistic regression
Nature of Algorithm
SVM: A discriminative model that finds the hyperplane best separating the data into classes while maximizing the margin between them.
Logistic regression: A probabilistic model that estimates the probability that a data point belongs to a particular class, using the logistic (sigmoid) function to model that probability.
Linearity
SVM: Handles nonlinear data by using kernel functions to map the data into a higher-dimensional space where it may become linearly separable.
Logistic regression: Handles nonlinear data only to a limited extent; capturing complex nonlinear relationships usually requires feature engineering (e.g., polynomial or interaction terms).
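The kernel idea can be sketched with the widely used RBF (Gaussian) kernel, which measures similarity between points as if they had been mapped into a much higher-dimensional space. This is a minimal, library-free illustration, not tied to any particular SVM implementation:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2).
    Equals 1 when x == z and decays toward 0 as the points move apart."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# A point compared with itself gives the maximum similarity, 1.0:
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # 1.0
# Similarity falls off with distance:
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # exp(-25), essentially 0
```

An SVM trained with this kernel can separate data that no single hyperplane in the original space could.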
Loss Function
SVM: Uses the hinge loss, which is less sensitive to outliers and focuses on maximizing the margin.
Logistic regression: Uses the logistic loss (cross-entropy), a smooth, differentiable function that measures the likelihood of the data given the model parameters.
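The contrast between the two losses is easy to see in code. This sketch uses the common convention of labels y in {-1, +1} and a raw score f(x); note how the hinge loss is exactly zero once the margin reaches 1, while the logistic loss stays positive:

```python
import math

def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw score f(x).
    Zero once the example is on the correct side with margin >= 1."""
    return max(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    """Logistic (cross-entropy) loss for y in {-1, +1}: smooth and
    always positive, it keeps shrinking as the margin grows."""
    return math.log(1.0 + math.exp(-y * score))

# A confidently correct prediction: hinge loss is exactly zero,
# logistic loss is small but nonzero.
print(hinge_loss(1, 3.0))      # 0.0
print(logistic_loss(1, 3.0))   # ~0.0486
```

Because the hinge loss ignores well-classified points entirely, only the examples near the boundary (the support vectors) shape the SVM solution.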
Interpretability
SVM: Less interpretable, since the focus is on finding the best hyperplane and the contribution of individual features is not as clear.
Logistic regression: More interpretable, since each coefficient directly models the effect of a feature on the odds of belonging to a class.
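One concrete way to read a logistic-regression coefficient is as an odds ratio. A minimal sketch (the coefficient values below are made up purely for illustration):

```python
import math

def odds_ratio(coef):
    """exp(beta_j): the multiplicative change in the odds of the positive
    class for a one-unit increase in feature j, holding other features fixed."""
    return math.exp(coef)

# A hypothetical coefficient of ln(2) ~ 0.693 doubles the odds:
print(odds_ratio(math.log(2.0)))  # 2.0
# A zero coefficient leaves the odds unchanged:
print(odds_ratio(0.0))            # 1.0
```

No comparably direct per-feature reading exists for a kernelized SVM, where the decision boundary lives in the transformed feature space.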
Overfitting
SVM: Less prone to overfitting on small datasets, thanks to margin maximization.
Logistic regression: Can be more prone to overfitting when the number of features is large relative to the number of data points.
Scalability
SVM: Can be computationally expensive on large datasets, since training requires solving a convex optimization problem.
Logistic regression: Typically faster and more scalable, making it a good choice for large datasets.
Multiclass Classification
SVM: Handles multiclass classification via techniques such as one-vs-one or one-vs-all.
Logistic regression: Extends to multiclass problems via softmax (multinomial) regression.
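The softmax function at the heart of multinomial logistic regression turns K class scores into K probabilities that sum to 1. A minimal sketch:

```python
import math

def softmax(scores):
    """Softmax: maps K class scores to K probabilities summing to 1.
    Subtracting the max score first keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Equal scores give equal class probabilities:
print(softmax([1.0, 1.0]))        # [0.5, 0.5]
# The largest score gets the largest probability:
print(softmax([2.0, 1.0, 0.1]))
```

The predicted class is simply the index with the highest probability; the ordering of the probabilities always matches the ordering of the raw scores.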
Regularization
SVM: Has built-in L2 regularization (via margin maximization and the C parameter), which helps prevent overfitting.
Logistic regression: Allows easy incorporation of different regularization penalties (L1, L2, or a mix of both).
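How regularization enters the logistic-regression objective can be sketched directly: the average loss gets an added L2 penalty that grows with the weights, so larger penalty strengths pull the weights toward zero. The data and weights below are made up for illustration:

```python
import math

def regularized_logistic_objective(theta, data, lam):
    """Average logistic loss plus an L2 penalty lam * ||theta||^2.
    data is a list of (x, y) pairs with y in {-1, +1}."""
    def loss(x, y):
        score = sum(t * xi for t, xi in zip(theta, x))
        return math.log(1.0 + math.exp(-y * score))
    avg_loss = sum(loss(x, y) for x, y in data) / len(data)
    penalty = lam * sum(t * t for t in theta)
    return avg_loss + penalty

data = [([1.0, 2.0], 1), ([-1.0, -1.5], -1)]
# With lam = 0 only the data term remains; larger lam adds a weight penalty.
print(regularized_logistic_objective([0.5, 0.5], data, lam=0.0))
print(regularized_logistic_objective([0.5, 0.5], data, lam=0.1))
```

Swapping the squared term for an absolute value gives an L1 penalty, which additionally drives some weights exactly to zero (feature selection).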
Hyperparameter Sensitivity
SVM: Sensitive to hyperparameters such as the choice of kernel, C (the trade-off between margin width and classification error), and the kernel's own parameters.
Logistic regression: Less sensitive to hyperparameter settings, making it easier to tune.
Formula
SVM: The decision function is given by f(x) = sign(wᵀx + b), where w, x ∈ ℝ^{n}, b ∈ ℝ.
Logistic regression: The hypothesis function is given by h_θ(x) = 1 / (1 + e^{−θᵀx}), where θ, x ∈ ℝ^{n+1} with x_{0} = 1.
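Both formulas translate directly into code. This minimal sketch uses made-up weights purely to show the shapes: the SVM decision function returns a hard class label, while the logistic-regression hypothesis returns a probability (note the intercept slot x[0] = 1):

```python
import math

def svm_decision(w, b, x):
    """SVM decision function: sign(w . x + b), with w, x in R^n and b in R."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def lr_hypothesis(theta, x):
    """Logistic-regression hypothesis: sigmoid(theta . x),
    where x is in R^(n+1) with the intercept slot x[0] = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, purely for illustration:
print(svm_decision([2.0, -1.0], b=0.5, x=[1.0, 1.0]))    # 1
print(lr_hypothesis([0.0, 0.0, 0.0], [1.0, 3.0, -2.0]))  # 0.5
```

The difference in output types mirrors the models' natures: the SVM commits to a side of the hyperplane, while logistic regression reports how confident it is.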
Case studies 
