Isolation Forest Algorithm - Python Automation and Machine Learning for ICs - An Online Book
Python Automation and Machine Learning for ICs http://www.globalsino.com/ICs/
Isolation Forest is a machine-learning algorithm for anomaly detection. [1] Its primary goal is to isolate anomalies, or outliers, in a dataset. The key idea is to build tree structures from random splits so that anomalies end up isolated in individual leaves. The algorithm exploits the fact that anomalies are typically rare and different from the majority of the data points, so they can be separated from the rest with fewer splits.
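The isolation idea can be illustrated with a minimal sketch. It is not the full algorithm (there is no subsampling and no stored tree), and the function name `isolation_path_length`, the synthetic data, and the outlier position are assumptions made for illustration only. The sketch repeatedly applies random axis-aligned splits and counts how many are needed before a chosen point is alone.

```python
import numpy as np

def isolation_path_length(X, point, rng, max_steps=200):
    """Count the random axis-aligned splits needed to isolate `point` within X.

    `point` is assumed to be one of the rows of X (hypothetical helper).
    """
    data = X
    depth = 0
    for _ in range(max_steps):
        if len(data) <= 1:
            break  # the point is alone: it has been isolated
        feature = rng.integers(data.shape[1])          # random feature
        lo, hi = data[:, feature].min(), data[:, feature].max()
        if lo == hi:
            continue  # no split possible on this feature, try another
        split = rng.uniform(lo, hi)                    # random split value
        # Keep only the side of the split that contains `point`.
        if point[feature] < split:
            data = data[data[:, feature] < split]
        else:
            data = data[data[:, feature] >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # synthetic "normal" data
outlier = np.array([[8.0, 8.0]])                          # one far-away point
X = np.vstack([inliers, outlier])

# Use the inlier closest to the origin as a typical "normal" point.
center_idx = np.argmin(np.linalg.norm(inliers, axis=1))

n_trials = 200
mean_outlier = np.mean(
    [isolation_path_length(X, X[-1], rng) for _ in range(n_trials)]
)
mean_inlier = np.mean(
    [isolation_path_length(X, X[center_idx], rng) for _ in range(n_trials)]
)
print(f"mean path length, outlier: {mean_outlier:.2f}")
print(f"mean path length, inlier:  {mean_inlier:.2f}")
```

Under these assumptions the far-away point should need noticeably fewer splits on average than the central point, which is the property the algorithm relies on.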
In anomaly detection with the Isolation Forest algorithm, anomalies are identified from the isolation score of each data point. The isolation score measures how easily a data point can be isolated, or separated, from the rest of the data. The lower the isolation score, the more likely the point is to be an anomaly.

For Isolation Forest, the decision function is based on the concept of path length, that is, the number of splits needed to isolate a point in a tree. The intuition is that anomalies have shorter average path lengths across the trees built by the algorithm. The isolation score for a data point is given by Equation 3699a:

[Equation 3699a]
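For reference, the anomaly score as stated in the original Isolation Forest paper (Liu, Ting, and Zhou, 2008) is sketched below. That Equation 3699a follows the same form is an assumption, and the symbols s, h, c, n, and H are that paper's notation, which may differ from the notation used in this book.

```latex
% Anomaly score of point x, with each tree built from a subsample of size n
s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}

% Average path length of an unsuccessful search in a binary search tree
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad n > 2

% Harmonic number, estimated with Euler's constant
H(i) \approx \ln(i) + 0.5772156649
```

Here h(x) is the path length of x in a single tree and E(h(x)) is its average over all trees. In this form the score runs in the opposite direction to the convention in the text: s close to 1 indicates an anomaly, and s well below 0.5 indicates a normal point. scikit-learn's score_samples returns the opposite of s, so that lower values mean more abnormal.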
The decision function is then derived from the isolation score, as given by Equation 3699b:

[Equation 3699b]

The decision value reflects how likely a point is to be an anomaly. It is a score rather than a calibrated probability, and lower values in Equation 3699b indicate a higher likelihood of being an anomaly. Figure 3699 shows the Isolation Forest algorithm for anomaly detection.

In a Python script using scikit-learn's IsolationForest, the decision function is available as the decision_function method. The lower its value, the more abnormal the point, and negative values mark outliers. The related score_samples method returns the opposite of the anomaly score defined in the original paper, and the two are linked by decision_function = score_samples - offset_. The decision threshold for classifying a point as an anomaly depends on the application and can be adjusted, for example through the contamination parameter, to set the desired sensitivity to anomalies.

Figure 3699. Isolation Forest algorithm for anomaly detection: (a) and (b).
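The scikit-learn calls mentioned above can be sketched as follows. The synthetic data, the three planted outliers, and the 2% quantile threshold are illustrative assumptions, not values taken from the book. The methods and attributes used (`fit`, `score_samples`, `decision_function`, `predict`, `offset_`) are part of scikit-learn's `IsolationForest`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 300 normal points plus 3 planted outliers.
rng = np.random.default_rng(42)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_outliers = np.array([[6.0, 6.0], [-7.0, 5.0], [6.5, -6.0]])
X = np.vstack([X_inliers, X_outliers])

model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
model.fit(X)

# Opposite of the paper's anomaly score: the lower, the more abnormal.
raw_scores = model.score_samples(X)

# Shifted scores: decision_function = score_samples - offset_.
decision = model.decision_function(X)

# Default labels: -1 for outliers (decision < 0), +1 for inliers.
labels = model.predict(X)

# Application-specific threshold (assumption): flag the lowest 2% of scores.
threshold = np.quantile(decision, 0.02)
custom_outliers = decision <= threshold

print("default outliers flagged:", int((labels == -1).sum()))
print("custom outliers flagged: ", int(custom_outliers.sum()))
```

Choosing a quantile of the decision values is one simple way to tune sensitivity after fitting. Setting `contamination` to a float at construction time is the built-in alternative, since it shifts `offset_` so that the expected fraction of training points falls below zero.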