Anomaly detection in a complex environment rests on a set of assumptions about the normal behavior of the data; anomalies are events that deviate from that normal behavior.
How does anomaly detection improve business decision making?
Detecting anomalies in business data, and then analyzing them, can lead to proactive identification and faster resolution of critical issues, and to new business insights.
The volume of information collected by businesses has grown enormously, and without efficient systems, making sense of that data is overwhelming for even the most accomplished leadership teams. Even the best business intelligence dashboards fall short when explaining complex data structures, correlations, or variances. That is particularly true for business operations, where anomalies can disrupt daily operations or services. In such situations, businesses simply can’t wait days or longer for a resolution. Automated, continuous discovery of anomalies can reveal significant insights that lead to opportunities for optimizing business operations.
Today, businesses operate in real time, with many activities happening simultaneously. Modern work is also collaborative: colleagues with distinct job roles are responsible for monitoring business operations across departments. For instance, at the systems infrastructure level, a site reliability engineering team carefully monitors the activity and performance of the systems, the servers, and the communication networks. At the business application level, an application support team monitors website page load times, database response times, and the user experience. At the business function level, subject-matter experts (SMEs) watch changes in customer activity by geography and by customer profile, changes per catalyst or event, or whatever KPIs are critical to the business.
Abnormalities in a single function can cause a domino effect and end up affecting other departments. If the metrics are not examined at every level, the correlations between them will go unobserved. Examining metrics at every level, and surfacing those correlations, is what any scalable anomaly detection framework should provide.
Are business dashboards enough for detecting anomalies?
Most companies already use metrics to measure operational and financial performance, though metric types vary by industry. Apart from these metrics, every business maintains KPIs to measure performance. Monitoring and analyzing business data patterns in real time can help detect subtle, not-so-subtle, and unexpected situations that warrant investigation and corrective action.
A large number of metrics are in use to help a business gauge its current health against previous financial years or its economic outlook.
Most companies today detect anomalies manually, using one of the following two methods:
- Build and monitor dashboards: Create daily/weekly dashboards and reports, and have people monitor them for spikes or dips.
- System and rules-based classification: Build a system where pre-defined rules for each of the metrics exist. For example, if a metric has a threshold rule set, then the data points crossing the upper or lower bound of the threshold are classified as anomalies.
Both of these methods involve human understanding and observation of the data and do not scale well. Moreover, business data patterns keep changing over time, so static methods defined at the outset may not detect future anomalies.
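The rules-based method described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `ThresholdRule` class, the `page_load_ms` metric name, and the bounds are all hypothetical, chosen only to show how a static upper and lower bound classifies data points.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ThresholdRule:
    """A static rule: values outside [lower, upper] count as anomalies."""
    metric: str
    lower: float
    upper: float

    def is_anomaly(self, value: float) -> bool:
        return value < self.lower or value > self.upper


def classify(rule: ThresholdRule, values: Sequence[float]) -> List[int]:
    """Return the indices of the data points that break the rule."""
    return [i for i, v in enumerate(values) if rule.is_anomaly(v)]


# Hypothetical metric and bounds, for illustration only.
rule = ThresholdRule(metric="page_load_ms", lower=100.0, upper=800.0)
page_load_ms = [320.0, 410.0, 95.0, 390.0, 1250.0, 405.0]

flagged = classify(rule, page_load_ms)
print(flagged)  # indices of the values below 100 or above 800
```

Because the bounds are fixed when the rule is written, someone has to revisit them whenever the normal level of the metric moves, which is the scaling problem noted above.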
Businesses that implement the right anomaly detection models can find even the most subtle abnormalities. Companies that don’t apply the right models can suffer many false positives or, even worse, fail to recognize a significant number of anomalies, resulting in lost revenue, disappointed customers, or missed business opportunities.
The solution, therefore, is an anomaly detection system that is automated and adaptive, so that it scales easily and keeps detecting anomalies even as data patterns change over time. This is where artificial intelligence can help.
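To make "adaptive" concrete, here is one simple baseline that involves no machine learning, sketched under assumptions: instead of fixed bounds, each point is compared with the mean and standard deviation of a sliding window of recent points. The function name, the window size, and the z-score limit are illustrative choices, and the series is synthetic.

```python
from collections import deque
from statistics import mean, stdev
from typing import List, Sequence


def rolling_zscore_anomalies(values: Sequence[float],
                             window: int = 10,
                             z_limit: float = 3.0) -> List[int]:
    """Flag points lying more than z_limit standard deviations away
    from the mean of the previous `window` accepted points."""
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(values):
        if len(history) == window:
            recent = list(history)
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > z_limit:
                flagged.append(i)
                continue  # keep the anomaly out of the baseline
        history.append(value)
    return flagged


# Synthetic metric: a steady upward drift with a small alternating wobble,
# plus one injected spike at index 40.
values = [100 + 0.5 * i + (1 if i % 2 == 0 else -1) for i in range(60)]
values[40] += 30

adaptive_flags = rolling_zscore_anomalies(values)
static_flags = [i for i, v in enumerate(values) if v > 115]  # fixed upper bound
print(adaptive_flags, len(static_flags))
```

On this drifting series the sliding baseline follows the trend and isolates the injected spike, while the fixed bound of 115 ends up flagging most of the second half. One limitation of this sketch: because flagged points are kept out of the baseline, a sudden permanent level shift would keep being flagged, so a real system would need a rule for accepting a new normal.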
Anomaly detection systems built with artificial intelligence are automated, adaptive and scalable
AI is a field of science that enables machines to imitate human intelligence. Machine learning is a subset of artificial intelligence that enables a machine to learn and improve by analyzing datasets to learn the mapping functions between inputs and outputs.
Machine learning focuses on getting machines to learn and act in a way similar to humans while learning autonomously from real-world interactions and training datasets. It meets our requirements: it is automated, adaptive, and scales from small to large businesses.
Machine learning applies specific algorithms to the problem of anomaly detection. There are two main approaches:
- Supervised Learning: Supervised learning, as the name suggests, is a form of learning where a teacher is involved. We train the machine using well-labeled data: the dataset consists of a large number of examples, and each example has two parts. The first part is the independent variables, a set of features for that example. The second part is the dependent variable, the correct label representing the correct output for those features. The task of supervised learning is to learn a function that approximates the mapping from the independent variables to the dependent variable.
- Unsupervised Learning: Unsupervised learning is a machine learning technique where the machine is given data that has only independent variables and no correct labels. The task of the algorithm is to find patterns in the dataset that connect the data points and explain the dataset as well as possible, so that data points can be labeled.
There is a third, lesser-known technique: semi-supervised learning, a class of machine learning tasks and techniques that combine a small amount of labeled data with a large amount of unlabeled data.
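The difference between the supervised and unsupervised settings can be shown on the same toy data. This is only a sketch under assumptions: the supervised side is reduced to learning a single cut-off from labeled examples, and the unsupervised side uses the modified z-score based on the median absolute deviation (with the commonly used 0.6745 constant and 3.5 limit). The function names and the data are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def fit_cutoff(labeled: Sequence[Tuple[float, int]]) -> float:
    """Supervised: learn the cut-off that misclassifies the fewest
    labeled examples (label 1 = anomaly, 0 = normal).
    Assumes at least two distinct feature values."""
    xs = sorted({x for x, _ in labeled})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(cut: float) -> int:
        return sum((x > cut) != bool(y) for x, y in labeled)

    return min(candidates, key=errors)


def mad_outliers(values: Sequence[float], limit: float = 3.5) -> List[int]:
    """Unsupervised: no labels, flag points far from the median,
    measured in units of the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > limit]


values = [12.0, 15.0, 14.0, 13.0, 40.0, 16.0, 38.0]
labels = [0, 0, 0, 0, 1, 0, 1]  # only the supervised learner sees these

cutoff = fit_cutoff(list(zip(values, labels)))
supervised_flags = [i for i, v in enumerate(values) if v > cutoff]
unsupervised_flags = mad_outliers(values)
print(cutoff, supervised_flags, unsupervised_flags)
```

Here both routes agree, but they get there differently: the supervised learner needs someone to have labeled past anomalies, while the unsupervised one relies only on how far a point sits from the bulk of the data.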
What are the design considerations of an anomaly detection system?
A few design considerations must be taken into account before building an automated and adaptive anomaly detection system:
- Timeliness: Timeliness is a measure of how quickly the business needs to find anomalies, and whether detection runs on historical data or on real-time data.
- Scale: Scale covers the number of metrics, the data volume, and the complexity involved in anomaly detection.
- Rate of Change: Rate of change is how often the data patterns change.
- Conciseness: Conciseness is the ability to explain the anomalous behavior of the data. It measures how well the system can correlate anomalies with their causes. The system should provide this for multivariate anomaly detection as well.
- Definition of Incidents: This is one of the most critical points to consider when trying to detect anomalies. Incidents must be defined in terms of what counts as normal versus anomalous behavior.
Commonly used machine learning algorithms for anomaly detection
There is a separate branch of machine learning which deals with anomaly detection and prediction, and there are specialized machine learning algorithms for this task. Some of the algorithms are:
1. Density-based machine learning algorithms: This class of algorithms looks at the density of the data around a particular data point. A data point in a low-density region is flagged as an anomaly or, more formally, an outlier. The algorithms in this category include:
   1. K-means clustering
   2. K-nearest neighbors
2. Support vector-based algorithms: A support vector machine (SVM) is a useful technique for detecting anomalies. One popular variant is the one-class SVM (OneClassSVM). It maps the data points that belong to normal instances into a higher-dimensional representation in which they can be separated by a single hyperplane. The SVM thus separates normal instances from abnormal ones by learning a decision boundary between the two classes.
3. Classification and regression tree-based algorithms: Classification and regression trees (CART) are among the most robust and effective machine learning techniques. CART can be used in both a supervised and an unsupervised manner for anomaly detection. For supervised learning, the algorithm trains classification decision trees to classify anomalies and non-anomalies, which requires a labeled training dataset. For unsupervised learning, the algorithm trains regression decision trees that predict the next data point in the series along with a confidence interval for that prediction; a point that falls outside the interval can be treated as an anomaly. This approach is favorable if you have training data but the data is not balanced (unbalanced data refers to classification problems where the classes have unequal numbers of instances).
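For the density-based family in the list above, the idea can be sketched with a k-nearest-neighbor distance score: a point whose nearest neighbors are far away sits in a sparse region. This is a minimal pure-Python illustration, not a reference implementation; the function names, the choice of k, and the "three times the median score" cut-off are assumptions made for the example.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_scores(points: Sequence[Point], k: int = 3) -> List[float]:
    """Score each point by its mean distance to its k nearest neighbors.
    A larger score means a sparser neighborhood."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points: Sequence[Point], k: int = 3,
                  factor: float = 3.0) -> List[int]:
    """Flag points whose score exceeds `factor` times the median score."""
    scores = knn_scores(points, k)
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


# A tight 3x3 grid of "normal" points plus one distant point (index 9).
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))

flagged = flag_outliers(points)
print(flagged)
```

The brute-force distance computation is quadratic in the number of points, so this form only suits small datasets; at scale one would typically use a spatial index or a library implementation.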
Commonly used deep learning techniques:
1. Recurrent neural networks: Since most anomaly detection problems deal with time-series data, the most suitable type of neural network is the recurrent neural network (RNN). The RNN variants LSTM and GRU are capable of modeling sophisticated dependencies, including advanced seasonality, in sequential data. This approach is also preferable when monitoring more than one series.
2. Autoencoder networks: An autoencoder is a type of artificial neural network used to learn efficient data encodings in an unsupervised manner. An autoencoder aims to learn a representation (encoding) for a set of data, typically for dimensionality reduction. Along with the reduction side, a reconstructing side is learned, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input. Inputs that the trained autoencoder reconstructs poorly, that is, with a high reconstruction error, can be flagged as anomalies.
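The encode, reconstruct, and compare idea behind autoencoder-based detection can be sketched without a deep learning framework. The example below deliberately swaps the neural network for a linear encoder and decoder obtained from an SVD (equivalent to PCA), so it is a stand-in for a real autoencoder, not one. The function names, the synthetic data, and the threshold rule (the largest error seen on the training data) are assumptions made for illustration.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, n_components: int):
    """Fit a linear encoder/decoder via SVD of the centered data.
    Returns the data mean and the top principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X: np.ndarray, mean: np.ndarray,
                         components: np.ndarray) -> np.ndarray:
    """Encode to the reduced space, decode back, and measure the gap."""
    code = (X - mean) @ components.T      # encoding (reduction side)
    recon = code @ components + mean      # decoding (reconstruction side)
    return np.linalg.norm(X - recon, axis=1)


# Synthetic "normal" data: 3-D points close to a single line.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
normal = np.column_stack([t, 2 * t, -t]) + rng.normal(0.0, 0.02, size=(200, 3))

mean, components = fit_linear_autoencoder(normal, n_components=1)
threshold = reconstruction_error(normal, mean, components).max()

new_points = np.array([[0.5, 1.0, -0.5],    # follows the normal pattern
                       [0.5, -1.0, 0.5]])   # breaks the pattern
flags = reconstruction_error(new_points, mean, components) > threshold
print(flags)
```

A neural autoencoder replaces the two matrix products with learned nonlinear layers, which lets it capture patterns a straight line or plane cannot, but the detection rule stays the same: flag what reconstructs poorly.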
Anomaly detection is of utmost importance for fast-growing digital businesses today: it can lead to increased revenue, new business opportunities, and better customer retention. On the downside, it is quite complex to build an anomaly detection system that is scalable, automated, and adaptive. Here, we have discussed the factors that need consideration when designing a system for anomaly detection, and the machine learning techniques that can help detect anomalous behavior in data patterns to improve the business decision-making process.
I would like to acknowledge Suraj Tripathi for his contribution to the article. Suraj is a Data Science and Machine Learning enthusiast with experience in Natural Language Processing, Time Series Analysis, and Computer Vision.