Isolation Forest

Anomaly / Outliers:

In data science, data patterns whose characteristics differ from those of normal data are called anomalies. Detecting them provides critical and actionable information in application domains such as eCommerce, retail, and banking.

Isolation forest algorithm:

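Isolation forest is one of the newer techniques for detecting anomalies. It is fundamentally different from other commonly used methods, which rely on density or basic distance measures. Instead, it is based on isolation: the data is split at random until individual data points are separated from the rest. Because anomalies are few and different, they are isolated after fewer splits than normal points, which makes this an effective and efficient way to detect them.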
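The isolation idea can be illustrated with a minimal sketch. This is not the full algorithm (a real isolation forest also subsamples the data and picks a random feature at each split). It only counts how many random splits are needed to separate one value from the rest in one dimension. The function names and the synthetic data are assumptions made for illustration.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` within 1-D `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the values that fall on the same side of the split as `point`.
    if point < split:
        side = [x for x in data if x < split]
    else:
        side = [x for x in data if x >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)

def average_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth over many independent random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Synthetic data (an assumption): 255 values clustered near 100, plus one far-away value.
data_rng = random.Random(42)
values = [data_rng.gauss(100.0, 5.0) for _ in range(255)] + [200.0]

normal_depth = average_depth(values[0], values)   # a point inside the cluster
outlier_depth = average_depth(200.0, values)      # the far-away point

print(f"average depth, normal point:  {normal_depth:.2f}")
print(f"average depth, outlying point: {outlier_depth:.2f}")
```

The outlying value is isolated after far fewer splits on average than a value inside the cluster, and that shorter average path is what the algorithm turns into an anomaly score.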

Advantages:

  • Low, linear time complexity
  • Low memory usage
  • Generally builds a model that performs well

Example:

A typical example is the monthly sales of various stores across the US, where the isolation forest algorithm is applied to the sales data, sales transaction amount, and sales quantity. The result is shown in Figure 1 below. The outliers appear as red dots, and the background contour plot clearly marks the border between inliers and outliers.

[Figure 1]

Figure 2 compares isolation forest with other outlier detection algorithms, such as one-class SVM and robust covariance.

[Figure 2]
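A sales example of this kind could be reproduced roughly as follows. This is a sketch under stated assumptions: the post does not say which library was used, so scikit-learn's `IsolationForest` is assumed here, and the sales figures are synthetic stand-ins rather than the real store data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for monthly store sales (an assumption, not real data).
# Columns: transaction amount, quantity sold.
typical = rng.normal(loc=[5000.0, 200.0], scale=[500.0, 20.0], size=(200, 2))
unusual = np.array([[12000.0, 30.0], [500.0, 600.0], [15000.0, 700.0]])
sales = np.vstack([typical, unusual])

# `contamination` is the expected share of outliers; 0.02 is a guess for this data.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(sales)         # 1 = inlier, -1 = outlier
scores = model.decision_function(sales)   # lower score = more anomalous

outlier_rows = sales[labels == -1]
print(f"flagged {len(outlier_rows)} of {len(sales)} rows as outliers")
```

The `contamination` value controls how many rows are flagged, so it is worth tuning against known anomalies when they are available.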