Checks Lifecycle: Deep Dive
In this section we will explain how a background check unfolds over time, detailing which circumstances affect the check’s execution.
This section is complex and relies on terms defined earlier. We recommend reading the Understanding Background Checks and the Datasets, Scores and Types sections before this one.
Check Statuses: Overview
Every check has a status that indicates which part of the lifecycle it is in. We will explain each part one by one (with the corresponding status) and then do a recap at the end of this section.
Queues and Priorities
First of all, when a check is created it can be enqueued and remain enqueued for a certain period of time. While the check is enqueued, its status is `not_started`. This queue system allows us to monitor in real time how our data sources are faring with volumes, thus preventing us from exceeding their capacity.
The check’s priority determines how the queue system handles it. More specifically, the priority decides whether the check is enqueued and how long it remains enqueued.
There are 3 priorities:
- High: The fastest way to execute a check. It skips any queues and is limited only by how many other high-priority checks are currently running.
- Medium: The default priority if none is specified. Checks with this priority are enqueued, and there is no limit on how many checks can be enqueued at any given time.
- Low: These checks are similar to `medium`, but they are enqueued in a lower-priority queue. This means they remain enqueued longer than those created with the `medium` priority.
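
The priority rules above can be sketched in a few lines of Python. This is only an illustrative model of the behavior described in this section, not part of any official client; the names `Priority`, `resolve_priority`, and `skips_queue` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Priority(str, Enum):
    """The three priorities described above (hypothetical helper type)."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def resolve_priority(value: Optional[str] = None) -> Priority:
    """Return the priority of a new check; medium is the default when none is given."""
    if value is None:
        return Priority.MEDIUM
    # Raises ValueError for anything other than high, medium, or low.
    return Priority(value.lower())


def skips_queue(priority: Priority) -> bool:
    """Only high-priority checks bypass the queues; medium and low are enqueued."""
    return priority is Priority.HIGH
```

For example, `skips_queue(resolve_priority())` is `False`, because a check created without a priority is treated as `medium` and is therefore enqueued.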
Here is a diagram detailing the information explained above:
Check execution and data sources
When a check starts, we begin collecting data from the data sources specified by the datasets present on the check type. At this point, the check changes its status to `in_progress`.
The check remains `in_progress` as long as at least one data source is still collecting results. When the last data source yields results, the check changes to status `completed`.
Note that data sources also have statuses, so if you only need information from specific data sources, you should watch for their status changing to `completed`.
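
As a minimal sketch of watching only specific data sources, the helper below takes a mapping from data source name to status and reports whether every data source you care about is `completed`. The mapping shape, the function name, and the data source names in the usage example are assumptions made for illustration; they are not taken from the API response format.

```python
from typing import Dict, Iterable


def needed_sources_ready(source_statuses: Dict[str, str], needed: Iterable[str]) -> bool:
    """Return True when every needed data source has status 'completed'.

    A data source missing from the mapping is treated as not ready.
    """
    return all(source_statuses.get(name) == "completed" for name in needed)


# Hypothetical data source names, for illustration only.
statuses = {"criminal_records": "completed", "tax_registry": "delayed"}
ready_for_criminal = needed_sources_ready(statuses, ["criminal_records"])
ready_for_both = needed_sources_ready(statuses, ["criminal_records", "tax_registry"])
```

Here `ready_for_criminal` is `True` even though the check as a whole has not finished, while `ready_for_both` is `False`.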
This diagram depicts this process:
Things can go wrong
Unfortunately, a check’s processing isn’t always as smooth as shown above. This is due to the volatile nature of the public data sources we query. For example, some data sources could be down or slow at the moment the check starts. It is worth noting that Truora does not control the data sources we query, so any setback with a data source is outside our control.
When a data source is down, we retry collecting data from it up to a limit. If this retry limit is reached, we mark the status of that data source as `error`. In addition, when a check finishes and more than 30% of its data sources end up in `error` status, the general check status changes to `error`. Checks that end in `error` status are not charged.
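
The 30% rule can be illustrated with a small function. This is a sketch of the rule as stated above, under the assumption that every data source on the check counts toward the total (the text does not say how `skipped` data sources are counted); the function name is hypothetical.

```python
from typing import Sequence


def finished_check_status(source_statuses: Sequence[str]) -> str:
    """Return the general status of a finished check from its data source statuses.

    Assumption: all listed data sources count toward the total.
    """
    if not source_statuses:
        raise ValueError("a check needs at least one data source")
    errors = sum(1 for status in source_statuses if status == "error")
    # Integer comparison avoids floating-point rounding at exactly 30%.
    if errors * 100 > 30 * len(source_statuses):
        return "error"
    return "completed"
```

With 10 data sources, 3 in `error` is exactly 30%, so the check still ends as `completed`; a fourth data source in `error` pushes it over the threshold and the check ends as `error`.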
Similarly, when a data source is slow, its status changes to `delayed`. Depending on the priority of the check, there is a time frame after which, if not all data sources have finished, the general check status changes to `delayed`.
This time frame duration is as follows, and starts at the moment the check begins collecting data:
Priority | Duration before delayed status |
---|---|
`high` | 2 minutes |
`medium` | 8 minutes |
`low` | 8 minutes |
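
If you want to anticipate the switch to the delayed status on your side, the table translates into a simple lookup. This is a sketch only: the names are hypothetical, and treating the boundary as inclusive (`>=`) is an assumption, since the text does not say what happens at exactly the limit.

```python
from datetime import datetime, timedelta

# Time allowed before the check status changes to delayed, per priority.
DELAYED_AFTER = {
    "high": timedelta(minutes=2),
    "medium": timedelta(minutes=8),
    "low": timedelta(minutes=8),
}


def past_delay_window(priority: str, collection_started_at: datetime, now: datetime) -> bool:
    """Return True once the time frame for the given priority has elapsed.

    The clock starts when the check begins collecting data, not when it is created.
    """
    return now - collection_started_at >= DELAYED_AFTER[priority]
```

For instance, three minutes after data collection starts, `past_delay_window` returns `True` for a `high` check and `False` for a `medium` one.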
The `delayed` check status means that the slow data sources probably won’t yield any results. In these cases, instead of waiting for the check to complete, it might be appropriate to make a decision with the information the check currently has, or to create a new check later if the information from that particular data source is crucial to you.
When a check enters the `delayed` status, depending on the country, it can take up to 3 days to finish. When this timeout occurs, the check is forced to finish, and we count delayed data sources as errors and set the final score accordingly.
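
One way to act on a delayed check is to look at which data sources are still delayed and compare them with the ones that are crucial to your decision. The sketch below assumes you already have a mapping from data source name to status; the function names, the returned labels, and the data source names in the test data are hypothetical.

```python
from typing import Dict, Iterable, List


def delayed_crucial_sources(source_statuses: Dict[str, str], crucial: Iterable[str]) -> List[str]:
    """Return the crucial data sources that are currently in delayed status."""
    return sorted(name for name in crucial if source_statuses.get(name) == "delayed")


def next_step(source_statuses: Dict[str, str], crucial: Iterable[str]) -> str:
    """Suggest what to do with a delayed check.

    'decide_now' means the delayed data sources are not crucial, so the
    available information is enough; 'create_check_later' means a crucial
    data source is delayed and a new check at a later time may be preferable.
    """
    if delayed_crucial_sources(source_statuses, crucial):
        return "create_check_later"
    return "decide_now"
```

This keeps the decision in your own code: the check itself stays in the delayed status until it finishes or the timeout forces it to finish.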
The worst case scenario, where one data source never yields any information, is described by the following diagram:
The complete lifecycle
Finally, here is a diagram detailing the whole check lifecycle:
In addition, the next diagram represents all the possible statuses on a check and how they transition to one another:
To sum up the whole lifecycle, these are some brief descriptions of the check and data source statuses:
Check Statuses
Status | Description |
---|---|
`not_started` | The check is enqueued and the data collection has not started yet. |
`in_progress` | Data is being collected but some data sources may have finished already. |
`delayed` | One or more data sources are taking a long time to query the data. Most data sources will have already finished. |
`completed` | The check finished and 70% or more of the data sources did not end in error status. |
`error` | The check finished and more than 30% of the data sources ended in error status. |
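
The check statuses above suggest a simple polling loop on the client side. The sketch below deliberately takes a caller-supplied `fetch_status` function instead of calling any real endpoint, so nothing here reflects the actual API; the function name, parameters, and defaults are all assumptions. It treats `completed` and `error` as final, and by default it also stops on `delayed`, since a delayed check can take a long time to finish.

```python
import time
from typing import Callable

# Statuses in which the check has finished, per the table above.
FINAL_STATUSES = {"completed", "error"}


def wait_for_check(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    stop_on_delayed: bool = True,
) -> str:
    """Poll fetch_status until the check finishes or the poll budget runs out.

    Returns the last status seen, which may be non-final if max_polls
    is exhausted.
    """
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status in FINAL_STATUSES or (stop_on_delayed and status == "delayed"):
            break
        time.sleep(poll_seconds)
        status = fetch_status()
    return status
```

Stopping on `delayed` hands control back to your code so it can decide whether to use the partial results or keep waiting; pass `stop_on_delayed=False` to keep polling through the delayed status.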
Data source statuses
Status | Description |
---|---|
`not_started` | The data source data collection was triggered. |
`skipped` | The data source does not fetch any data as it does not have the required inputs to do so. |
`delayed` | The data source is taking a long time to query the data. |
`completed` | The data source data was fetched successfully and it’s present on the check details. |
`error` | We could not fetch data from that data source. |