Should you switch your fraud detection algorithm?

The least important question we hear most often: “Which algorithm should I use for fraud detection?”

The truth is that the choice of algorithm usually contributes only a small part of a model’s performance. Over time we’ve learned that excellent ML models rely much more on other factors that rarely get much attention. So let’s shine a light on those, in order of importance:

Labeling - fraud is mostly tackled with supervised learning techniques, and these require accurate labels. This might seem trivial when thinking about stolen identities and chargebacks. Often, though, acquiring accurate fraud labels is very hard. Fraud cases might be mixed in with credit losses or with first-party fraud. Chargebacks themselves can also go unreported, for example when the transaction was covered by 3DS or the issuer had another reason to take liability for the loss.

Whatever the cause, if we end up with inaccurate labels, we’re set up to fail.

Data Enrichment - enriching “raw data” with third-party data as well as in-house features plays a major role in ML performance. Models get a huge boost when the data they consume is tailored to describe buying behaviors and fraud patterns, rather than being left in its raw form.
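To make the idea of an in-house behavioral feature concrete, here is a minimal sketch of a "velocity" feature: how many earlier transactions the same card made within the past hour. The field names (`card_id`, `timestamp`, `amount`), the one-hour window, and the sample data are all assumptions for illustration, not a description of any particular production setup.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def add_velocity_feature(transactions, window=timedelta(hours=1)):
    """Return copies of the transactions with a 'card_txn_count_in_window'
    field: how many earlier transactions the same card made within `window`."""
    history = defaultdict(list)  # card_id -> timestamps seen so far, in order
    enriched = []
    for txn in sorted(transactions, key=lambda t: t["timestamp"]):
        seen = history[txn["card_id"]]
        # Index of the first earlier timestamp that still falls in the window.
        start = bisect_left(seen, txn["timestamp"] - window)
        enriched.append({**txn, "card_txn_count_in_window": len(seen) - start})
        seen.append(txn["timestamp"])
    return enriched


# Hypothetical raw transactions.
raw = [
    {"card_id": "A", "timestamp": datetime(2024, 1, 1, 12, 0), "amount": 20.0},
    {"card_id": "A", "timestamp": datetime(2024, 1, 1, 12, 10), "amount": 25.0},
    {"card_id": "B", "timestamp": datetime(2024, 1, 1, 12, 15), "amount": 90.0},
    {"card_id": "A", "timestamp": datetime(2024, 1, 1, 12, 20), "amount": 400.0},
    {"card_id": "A", "timestamp": datetime(2024, 1, 1, 14, 0), "amount": 30.0},
]
enriched = add_velocity_feature(raw)
velocity = [t["card_txn_count_in_window"] for t in enriched]
print(velocity)
```

The raw rows only say that card A spent 400.00 at 12:20. The enriched row adds that this was the card's third purchase in twenty minutes, which is much closer to the kind of pattern a model can learn from.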

Score Strategy - we’ve talked about this many times before. A model can have a great AUC, but if we don’t know where and when to act on its scores, we won’t be able to leverage it.
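One simple way to turn scores into a decision is to pick the decline threshold that minimizes total cost on historical data. The sketch below assumes a deliberately simple cost model: approved fraud costs the full transaction amount, and a declined good transaction costs a 10% margin. The function names, the margin, the candidate thresholds, and the sample data are all hypothetical.

```python
def cost_at_threshold(threshold, scored, margin=0.10):
    """Total cost of declining every transaction scored at or above `threshold`.

    `scored` is a list of (score, amount, is_fraud) tuples.
    """
    cost = 0.0
    for score, amount, is_fraud in scored:
        declined = score >= threshold
        if is_fraud and not declined:
            cost += amount           # approved fraud: lose the full amount
        elif not is_fraud and declined:
            cost += margin * amount  # declined good customer: lose the margin
    return cost


def best_threshold(scored, candidates, margin=0.10):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: cost_at_threshold(t, scored, margin))


# Hypothetical scored history: (model score, amount, confirmed fraud?).
scored = [
    (0.95, 300.0, True),
    (0.80, 50.0, False),
    (0.70, 200.0, True),
    (0.40, 1000.0, False),
    (0.30, 100.0, False),
    (0.10, 80.0, False),
]
candidates = [0.2, 0.5, 0.75, 0.9]
chosen = best_threshold(scored, candidates)
for t in candidates:
    print(t, cost_at_threshold(t, scored))
```

The model and its AUC are identical for every candidate, yet the cost differs widely between thresholds. That gap is what a score strategy captures.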

Algorithm Config - now, after we’ve optimized all of the other layers, it’s finally time to look at the algorithm itself. However, before switching to a different algorithm altogether, it’s much more important to optimize the one already in use. The obvious first step is to tune the hyper-parameters, but there are other areas to consider as well, such as feature selection.

We’ve actually seen plenty of examples where optimizing the configuration led to the same performance boost seen with more “sophisticated” algorithms.
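As a rough illustration of what tuning the existing algorithm can look like, here is a minimal exhaustive grid search over hyper-parameters. Everything in it is an assumption: the parameter names `max_depth` and `learning_rate` are only typical examples, and `toy_validation_score` is an invented stand-in for training the current model and scoring it on a validation split.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    # Stand-in for "train on the training split, score on the validation split".
    # The shape is invented: it peaks at max_depth=6, learning_rate=0.1.
    return (
        0.9
        - 0.01 * abs(params["max_depth"] - 6)
        - abs(params["learning_rate"] - 0.1)
    )


grid = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_validation_score)
print(best_params, best_score)
```

In practice `evaluate` would wrap the real training and validation code, and the same loop could also compare candidate feature subsets, which covers the feature selection point above.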

Dataset Size - finally, we’ve learned that the least important thing for ML fraud detection is the size of the dataset we have to work with. This might seem counter-intuitive as we keep hearing that “data is the new oil”. However, if we’ve done all of the above correctly, we can work with minimal datasets. And if we haven’t, even a huge dataset will not help much.

So instead of asking which algorithm to use, ask about the setup that is already in place.
