Frequently Asked Questions

What do I have to do to integrate Deepchecks and start seeing insights in your system?

Deepchecks integrates with common data stores and databases (e.g. S3, HDFS, BigQuery, Snowflake) for batch processing. We also offer a Python package that integrates with real-time use cases in a few lines of code, as sketched below.

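For illustration, here is a minimal sketch of what the "few lines of code" integration can look like, using the open-source deepchecks Python package (pip install deepchecks); the toy dataset and model are stand-ins for your own, and the hosted monitoring product has its own client:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from deepchecks.tabular import Dataset
    from deepchecks.tabular.suites import full_suite

    # Toy data and model, standing in for your own.
    df = load_iris(as_frame=True).frame  # features plus a 'target' column
    train_df, test_df = train_test_split(df, test_size=0.3, random_state=0)
    model = RandomForestClassifier(random_state=0)
    model.fit(train_df.drop(columns='target'), train_df['target'])

    # Wrap the dataframes so the checks know which column is the label,
    # then run the full battery of checks and save an HTML report.
    train_ds = Dataset(train_df, label='target')
    test_ds = Dataset(test_df, label='target')
    result = full_suite().run(train_dataset=train_ds, test_dataset=test_ds, model=model)
    result.save_as_html('deepchecks_report.html')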

I want to use your system! What do you need from me, and what do I need to do?

For a basic monitoring solution, all we need is the model's input. To receive more informative insights, we recommend monitoring multiple data points across the data pipeline (e.g. raw features, predictions, labels, training data).
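
Here is a hypothetical sketch of what logging multiple data points per pipeline sample can look like; the PipelineMonitor class, its log method, and the field names are illustrative stand-ins, not the actual Deepchecks client API:

    from dataclasses import dataclass, field

    @dataclass
    class PipelineMonitor:
        """Illustrative stand-in for a monitoring client (not the real API)."""
        records: list = field(default_factory=list)

        def log(self, sample_id: str, stage: str, data: dict) -> None:
            # The shared sample_id lets the backend join raw features,
            # predictions, and (often late-arriving) labels per sample.
            self.records.append({"sample_id": sample_id, "stage": stage, **data})

    monitor = PipelineMonitor()
    monitor.log("txn-1042", "raw_features", {"amount": 31.5, "country": "DE"})
    monitor.log("txn-1042", "prediction", {"score": 0.87})
    monitor.log("txn-1042", "label", {"is_fraud": 0})  # labels may arrive days later

Logging each stage under the same sample identifier is what makes cross-stage comparisons (e.g. training vs. serving features) possible.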

Which types of issues does your system detect?

There are a few types of issues we detect:

  • Data integrity issues (e.g. dirty data, data schema mismatches)
  • Model confidence related issues (e.g. weak & strong segments, out of distribution samples)
  • Statistically inspired issues (e.g. data drift, bias; see the drift sketch below)

We also enable configuration of custom alerts, since you know your data better than we do 🙂
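
To make the "statistically inspired" category concrete, here is a minimal, self-contained data drift check based on the population stability index (PSI). This illustrates the general technique rather than Deepchecks' implementation; the 0.2 threshold is a common rule of thumb, not a product default:

    import numpy as np

    def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
        """Population Stability Index between reference and live samples."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
        a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
        e_pct = np.clip(e_pct, 1e-6, None)  # avoid division by zero / log(0)
        a_pct = np.clip(a_pct, 1e-6, None)
        return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

    rng = np.random.default_rng(0)
    train_feature = rng.normal(0.0, 1.0, 10_000)
    live_feature = rng.normal(0.3, 1.0, 10_000)  # simulated distribution shift
    if psi(train_feature, live_feature) > 0.2:   # 0.2: common rule-of-thumb threshold
        print("drift alert: feature distribution has shifted")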

Why shouldn’t I just use the monitoring that already exists in the cloud platforms?

Some platforms have begun developing basic monitoring solutions, but these are still far from adequately addressing this critical problem.

From our experience, most of the problems that come up in production can only be detected with a comprehensive monitoring solution that takes the model’s properties into account and compares different phases across the data pipeline.

Which types of models do you support?

We support any model that adheres to the scikit-learn model API. This includes TensorFlow, CatBoost, LightGBM, Keras, XGBoost, Caffe, and scikit-learn models (as well as many others).

In addition, we support multi-phase models, ensembles, and models combined with business logic.
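
As a sketch of what "adhering to the scikit-learn model API" means in practice, here is a hypothetical wrapper that combines a fitted classifier with a business rule while still exposing the standard predict/predict_proba interface; the rule, feature layout, and cap value are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class RuleWrappedModel:
        """Hypothetical model-plus-business-logic wrapper with the sklearn API."""

        def __init__(self, model, amount_col: int, amount_cap: float):
            self.model = model            # any fitted classifier with predict_proba
            self.amount_col = amount_col  # index of the transaction-amount feature
            self.amount_cap = amount_cap

        def predict_proba(self, X):
            X = np.asarray(X)
            proba = self.model.predict_proba(X)
            # Business rule: transactions above the cap are always flagged (class 1).
            proba[X[:, self.amount_col] > self.amount_cap] = [0.0, 1.0]
            return proba

        def predict(self, X):
            return self.predict_proba(X).argmax(axis=1)

    # Usage: fit a plain scikit-learn model, then wrap it with the rule.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    X[:, 0] = np.abs(X[:, 0]) * 100  # column 0 plays the "amount" feature
    y = (X[:, 0] > 80).astype(int)
    wrapped = RuleWrappedModel(LogisticRegression().fit(X, y), amount_col=0, amount_cap=250.0)
    print(wrapped.predict(X[:5]))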

I have a complex pipeline, which consists of an ensemble and different types of rules which originate in business logic. Does your system support this?

Yes!

My ML pipeline runs in multiple locations across the organization’s infrastructure. The raw data, pre-processing python code, and model inference are all in different places. Where should I place Deepchecks in order to get valuable insights?

Deepchecks is capable of monitoring a single data point or multiple data points across a pipeline. From our experience, many valuable insights are only detected when monitoring multiple data points (this is necessary for both inter- and intra-pipeline monitoring).

When I consider deploying a new version of my model into production, I typically run an A/B test with the current version. Can Deepchecks help me with this phase, or is the system only suited for monitoring one deployed version?

Yes! Our robust architecture enables comparing different pipelines or live versions with each other. One application of this capability is monitoring A/B tests via our system, although there are various other applications.

I have sensitive data that I wouldn’t like to expose. How does Deepchecks deal with this?

There are two solutions for this important (and common!) issue:
Deepchecks offers both a SaaS solution designed for anonymized data and an on-prem deployment option for non-anonymized data (which even works in air-gapped environments).

I already feel like I have too many dashboards. Can’t you just send me your metrics and the alerts so I can read them using a generic monitoring system?

Yes! Apart from our own dashboard, our metrics and alerts can be sent to common monitoring and alerting tools, including Slack, email, Teams, Datadog, New Relic, Graphite, PagerDuty, Splunk, ServiceNow, and more!
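
As a generic illustration of that kind of routing (not Deepchecks' own integration code), here is how an alert can be forwarded to Slack through an incoming webhook; the webhook URL is a placeholder you would replace with your own:

    import json
    import urllib.request

    def notify_slack(webhook_url: str, message: str) -> None:
        """Post an alert message to a Slack incoming webhook."""
        payload = json.dumps({"text": message}).encode("utf-8")
        request = urllib.request.Request(
            webhook_url,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)

    notify_slack(
        "https://hooks.slack.com/services/XXX/YYY/ZZZ",  # placeholder URL
        "Drift alert: feature 'amount' PSI above threshold on model v3",
    )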

I have large volumes of data being streamed to my ML model. Can Deepchecks handle this?

Yes, Deepchecks was built to support streaming data pipelines of extremely large volumes.
Please reach out to receive specifications and benchmark results.

Have a question that we didn’t answer?
