According to The Financial Cost of Fraud Report [1], fraud costs businesses and individuals around the world more than US$5 trillion every year. And while fraudulent transactions represent only a small percentage of any organization’s revenue, they can still run into the millions of dollars and carry regulatory implications. Over the last decade, more and more organizations have moved from manually written, rule-based systems to AI fraud detection systems that use machine learning.

Common Approaches

Fraud detection can be extremely complex, since it requires thousands of computations to be performed in milliseconds. The model’s ability to distinguish between normal and abnormal behavior must be improved continuously, using both supervised and unsupervised ML techniques. The problem is typically approached with anomaly detection techniques (e.g. Isolation Forest combined with business logic), classification algorithms (e.g. XGBoost after artificially balancing the classes), or a combination of the two.
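To make "artificially balancing the classes" concrete, here is a minimal sketch of one common way to do it: random oversampling of the minority (fraud) class. It uses only the Python standard library, and the function name, the toy transactions, and the (amount, hour) feature layout are all assumptions made for illustration, not part of any particular library or of Deepchecks.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size.

    Hypothetical helper for illustration; labels are 1 (fraud) or 0 (legitimate).
    """
    rng = random.Random(seed)
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    legit = [r for r, y in zip(rows, labels) if y == 0]
    if not fraud or not legit:
        raise ValueError("both classes must be present to balance them")

    # Work out which class is smaller and how many copies are missing.
    if len(fraud) <= len(legit):
        minority, minority_label, gap = fraud, 1, len(legit) - len(fraud)
    else:
        minority, minority_label, gap = legit, 0, len(fraud) - len(legit)

    # Sample with replacement from the minority class to fill the gap.
    extra = [rng.choice(minority) for _ in range(gap)]

    balanced = list(zip(rows, labels)) + [(r, minority_label) for r in extra]
    rng.shuffle(balanced)
    balanced_rows = [r for r, _ in balanced]
    balanced_labels = [y for _, y in balanced]
    return balanced_rows, balanced_labels


# Toy data (assumed): 98 legitimate and 2 fraudulent transactions as (amount, hour).
rows = [(20.0 + i, i % 24) for i in range(98)] + [(9500.0, 3), (8700.0, 4)]
labels = [0] * 98 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(len(balanced_rows), sum(balanced_labels))
```

Balancing of this kind is normally applied to the training split only, so that evaluation still reflects the real fraud rate. The balanced rows would then be passed to a classifier such as XGBoost.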

The Challenge

In many cases, the ML pipeline depends on a range of data sources with varying formats, some of which are owned by third parties and may change without notice. Dirty data may go undetected, since preprocessing methods are used to fix many of the problems, such as inconsistencies, noise, or missing values. Moreover, new types of fraud periodically emerge that weren’t represented in the training data when the model was last trained. These issues are hard to detect in any ML system, but the challenge intensifies when the class imbalance is extreme, when false-negative predictions go undetected, or when the pipeline includes an unsupervised learning component.
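A small worked example shows why extreme class imbalance makes failures hard to spot. The sketch below assumes a toy dataset of 10,000 transactions with 10 frauds and a degenerate model that labels everything as legitimate. The numbers and the helper name are illustrative assumptions, not measurements from a real system.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


# Assumed toy data: 10 fraudulent transactions among 10,000.
y_true = [1] * 10 + [0] * 9990

# A degenerate "model" that never flags fraud.
y_pred = [0] * len(y_true)

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
# Recall: share of actual frauds that were caught (0.0 if there are none).
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.3f} recall={recall:.3f} missed_frauds={fn}")
```

Here accuracy is 99.9% even though every fraud is missed, so accuracy alone says very little about a fraud model. In production the picture is worse, because a false negative is only counted once someone discovers and labels it, which is why recall itself can look healthier than it really is.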

How Deepchecks Can Help

Whether your fraud detection system is based on supervised, unsupervised, or semi-supervised machine learning, Deepchecks can be used to ensure the end-to-end quality of your ML pipeline and to help you deal with new patterns of fraudulent behavior. Deepchecks connects to the different components of your training and production pipelines (raw features, processed features, predictions, labels) and learns their behavior over time. It then enables you to:


If you work with transactional data and face issues such as class imbalance, undetected false-negative samples, or unsupervised components in your system, Deepchecks is the best choice for ensuring your machine learning system’s quality.
