Fraud detection can be extremely complex, often requiring thousands of computations to be performed within milliseconds. The model’s ability to distinguish between normal and abnormal behavior has to be improved continuously, using both supervised and unsupervised ML techniques. These problems are typically approached with anomaly detection techniques (e.g. Isolation Forest combined with business logic), classification algorithms (e.g. XGBoost after artificially balancing the classes), or a combination of the two.
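To make "artificially balancing the classes" concrete, here is a minimal sketch of random oversampling, one common way to balance classes before training a classifier. The function name `oversample_minority`, the toy transactions, and the 98-to-2 class split are all illustrative assumptions; the sketch uses only the Python standard library and is not taken from any particular fraud detection system.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) the number of rows this class is short by.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 98 legitimate transactions (label 0), 2 fraudulent (label 1).
rows = [{"amount": 10.0 + i} for i in range(98)] + [{"amount": 5000.0}, {"amount": 7200.0}]
labels = [0] * 98 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

Oversampling of this kind should be applied to the training split only, so that duplicated fraud samples do not leak into the data used for evaluation.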
In many cases, the ML pipeline depends on a range of data sources with varying formats, some of which are owned by third parties and may change without notice. Dirty data may go undetected because preprocessing methods are used to fix many of the problems, such as inconsistencies, noise, or missing values. Moreover, new types of fraud periodically emerge that weren’t represented in the training data when the model was last trained. These issues are hard to detect in any ML system, but the challenge intensifies when the class imbalance is extreme, when false-negative predictions go undetected, or when the pipeline includes an unsupervised learning component.
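One way to keep dirty data from being hidden by preprocessing is to measure the raw records before imputation runs. The sketch below compares per-field missing-value rates between a reference batch and a new batch. The function names, the `merchant_id` field, and the 10-point tolerance are hypothetical choices made for illustration; this is not the Deepchecks API.

```python
def missing_rate(records, field):
    """Fraction of records where `field` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def flag_missing_rate_shift(reference, current, fields, tolerance=0.10):
    """Return the fields whose missing rate rose by more than `tolerance`
    (absolute difference) between the reference and current batches."""
    flagged = {}
    for field in fields:
        ref = missing_rate(reference, field)
        cur = missing_rate(current, field)
        if cur - ref > tolerance:
            flagged[field] = (ref, cur)
    return flagged


# Hypothetical raw batches, captured before any imputation step.
reference_batch = [{"amount": 20.0, "merchant_id": "m1"} for _ in range(100)]
current_batch = (
    [{"amount": 20.0, "merchant_id": "m1"} for _ in range(60)]
    + [{"amount": 20.0, "merchant_id": None} for _ in range(40)]
)

flagged = flag_missing_rate_shift(reference_batch, current_batch, ["amount", "merchant_id"])
print(flagged)
```

Here the third-party feed has started omitting `merchant_id` in 40% of records. An imputation step would fill those values and hide the change, while the raw-level comparison flags it.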
Whether your fraud detection system is based on supervised, unsupervised, or semi-supervised machine learning, Deepchecks can be used to ensure the end-to-end quality of your ML pipeline and to help you deal with new patterns of fraudulent behavior. Deepchecks connects to the different components of your training and production pipelines (raw features, processed features, predictions, labels) and learns their behavior over time. It then enables you to:
If you’re working with transactional data and facing issues such as class imbalance, undetected false-negative samples, or unsupervised components in the system, Deepchecks is the best choice for ensuring your machine learning system’s quality.