#21 - Are you mislabeling credit losses as fraud?

Apr 5

Written By Chen Zamir

If you have ever managed non-card fraud, you know the trickiest side of the business:

Separating losses between fraud and credit.

This is the No. 1 challenge for many business models, from loans and cash advances to ACH and BNPL payments.

Here’s the bad news: I don’t have a magic solution for it. I suspect no one has.

But the good news is that there are some principles you can follow that will help you label more accurately.

And so, let’s talk about it.

But first:

Why is separating fraud from credit losses so important?

The foundation of any fraud prevention system is accurate loss labeling.

It doesn’t matter if your system supports cutting-edge AI applications or a couple of off-shore agents that do manual review.

Ultimately, you want to make accurate decisions to optimize your business’ performance.

Accurate decisions require data. And if your data is “dirty”, so are your decisions.

Now, you might ask: “Why does it matter? If I train my model on losses, it would learn to predict and mitigate them.”

Theoretically, yes.

Realistically, though, the main performance driver for risk models is the enrichment features you create and add to your raw data.

As 3rd-party fraud patterns are different from credit default patterns, so are the features that help identify them.

If you try to identify two different patterns with two different feature sets, the algorithm will need to “weigh” them differently.

At this point you realize you need two separate models, which will require two separate label sets... aaaaaand we’re back to square one.

OK then, what can we do in order to improve our labeling accuracy?

Let’s consider three core principles:

Clean history

If this is not the first time you’ve seen the customer, and they haven’t created losses in the past, it’s unlikely they suddenly turned to fraud.

However, many times I see teams take the opposite approach: suspecting the last event to be fraud, and re-labeling all past events as unreported fraud.

There are of course such cases, but they are marginal in the grand scheme of things.

Also, don’t consider only the customer’s account activity: did you see them before as guest users? Do they have older, inactive accounts?

Even referring to the reputation of known family members (e.g., linked by lastName+billAddress) can help you infer this is a non-fraud event.

Think about it: if your wife has been using the product for years, what are the chances you signed up just to commit fraud?
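To make the clean-history principle concrete, here is a minimal sketch. It assumes you can summarize a customer's past events, plus those of linked identities (guest sessions, older accounts, family members), into a few counters. The class, field names, and the default threshold are all hypothetical; adapt them to your own data.

```python
from dataclasses import dataclass


@dataclass
class CustomerHistory:
    """Hypothetical summary of what was known about a customer before the loss."""
    prior_paid_events: int = 0         # the customer's own past events, repaid in full
    prior_loss_events: int = 0         # the customer's own past events that ended in a loss
    linked_prior_paid_events: int = 0  # same, for guest sessions, older accounts, family links
    linked_prior_loss_events: int = 0


def has_clean_history(h: CustomerHistory, min_paid_events: int = 1) -> bool:
    """True when the customer, or a linked identity, has a loss-free track record.

    min_paid_events is an assumed threshold, not a recommended value.
    """
    total_paid = h.prior_paid_events + h.linked_prior_paid_events
    total_losses = h.prior_loss_events + h.linked_prior_loss_events
    return total_losses == 0 and total_paid >= min_paid_events
```

A loss from a customer where has_clean_history returns True is a candidate for a credit label rather than a fraud label. Treat it as one signal among several, not a verdict.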

Distance from known fraud patterns

Let’s turn the previous principle on its head: do we see a resemblance between this case and the behavior of known fraud rings?

This is trickier to judge, and will likely require manual effort.

But keep in mind: fraud is never a solitary event. Especially when it’s successful.

If you can’t find an associated ongoing attack, chances are this isn’t a fraud case.

The way to do that at scale – and, as usual, with less accuracy – is to look at the fraud score. If it’s extremely low, you may want to review this label.

Side note: Using low scores to judge label accuracy can easily become a double-edged sword. Use it with caution, as a crude scaling solution, and preferably not on its own. There’s a real danger of introducing circularity into your decision processes (a score trained on your labels ends up deciding those same labels), and this is exactly what we want to avoid.
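As a rough illustration of the low-score check, the sketch below flags fraud-labeled losses for review rather than relabeling them automatically, which keeps a human in the loop. It assumes the score is a probability between 0 and 1 recorded at decision time; the function name and the 0.05 cutoff are hypothetical.

```python
def needs_label_review(label: str, fraud_score: float, low_score_cutoff: float = 0.05) -> bool:
    """Flag a fraud-labeled loss whose fraud score at decision time was extremely low.

    Assumes fraud_score is in [0, 1]. The cutoff is an arbitrary placeholder.
    Only flags for review; it never changes the label by itself.
    """
    return label == "fraud" and fraud_score < low_score_cutoff


# Hypothetical losses as (label, fraud_score) pairs.
losses = [("fraud", 0.01), ("fraud", 0.80), ("credit", 0.02)]
review_queue = [loss for loss in losses if needs_label_review(*loss)]
```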

Another thing to consider is presence/lack of velocity indicators, especially if these are common in your fraud patterns.

Whatever heuristic you employ, the basic principle is the same: showing a “behavioral distance” between this case and known fraud cases.

Side note: For more advanced implementation, cluster your users/payments offline and measure the distance between them and known fraud cases. It will require investment, but would be far more accurate than using your “normal” score.
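One simple way to put a number on “behavioral distance” is sketched below. Note that it swaps the offline clustering mentioned above for a plain nearest-neighbor distance, which is easier to show in a few lines. It assumes every case is already encoded as a numeric feature vector with comparable scales; the function name is hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def distance_to_known_fraud(case: Vector, fraud_cases: Sequence[Vector]) -> float:
    """Euclidean distance from a case to its nearest known fraud case.

    Returns infinity when there are no known fraud cases to compare against.
    Assumes features are numeric and already scaled to comparable ranges.
    """
    if not fraud_cases:
        return math.inf
    return min(math.dist(case, fraud) for fraud in fraud_cases)
```

A large distance supports a non-fraud label; a small one suggests the case belongs with a known pattern. Where you draw the line is something to find out by experimenting on your own data.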

Lack/presence of intent

Committing fraud is an action that requires intent.

Ignoring weird edge-cases, you simply cannot commit fraud without knowingly choosing to do so.

And having intent changes how you behave.

It changes your user journey. It changes your behavioral telemetry. It changes your speed.

All of these things are detectable, although they require you to be extra diligent in collecting user data.

Side note: An important exception is 1st-party fraud, where customers knowingly default. While you might think that this requires intent as well, it can also be an opportunistic decision that happens after the fact. In any case, one might argue that 1st-party fraud should be considered a credit loss...

Despite it all, accuracy is not guaranteed.

We just discussed a lot of ideas, but we need to remember our initial statement:

There are no magic solutions here.

Each of these principles can be implemented in several different ways, depending on your business and product.

Naturally, these implementations would each have a different degree of accuracy. And I can only promise you one thing: none will have 100% accuracy.

So was it all a waste of words and time?

No. Here’s how to make use of it:

Create a labels tree.

Here’s an example:

Each of the labels we see above should be represented by its own datapoint in your data schema.

This way, you can run tests (and train models) on much more granular label-sets, which lets you experiment and find the combination that works best for you.

Of course, don’t forget to also include your baseline labels: whatever you get from your collection agency, customer support team, etc.
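Here is a minimal sketch of what storing each label as its own datapoint could look like, and how label-sets can then be generated for experiments. The field names are hypothetical placeholders built from the three principles discussed earlier, and the rule for combining them is just one assumption among many possible ones.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class LossLabels:
    """One row per loss; every label is its own field (all names hypothetical)."""
    baseline_fraud: bool    # baseline label from collections, customer support, etc.
    clean_history: bool     # customer or linked identity has a loss-free record
    near_known_fraud: bool  # behaviorally close to known fraud cases
    intent_signals: bool    # journey or telemetry suggests intent


def fraud_label(row: LossLabels, use_history: bool, use_distance: bool, use_intent: bool) -> bool:
    """One candidate label: start from the baseline, then apply the chosen heuristics."""
    if not row.baseline_fraud:
        return False
    if use_history and row.clean_history:
        return False  # clean history argues for a credit loss
    if use_distance and not row.near_known_fraud:
        return False  # far from known fraud patterns
    if use_intent and not row.intent_signals:
        return False  # no sign of intent
    return True


def candidate_label_sets(rows: list[LossLabels]) -> dict[tuple[bool, ...], list[bool]]:
    """Build one label-set per on/off combination of the three heuristics."""
    return {
        combo: [fraud_label(row, *combo) for row in rows]
        for combo in product([False, True], repeat=3)
    }
```

Each of the eight label-sets can then be used to test rules or train a model, and the ones that perform worse than the baseline are simply dropped.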

The Bottom Line

Trying to be 100% accurate in separating fraud from credit losses is bound to fail.

Instead, do this:

Create your own labeling heuristics
Implement them as a labels tree
Experiment with your tree to gain optimal performance

The nice thing about it is that you don’t need to care about the actual accuracy of each of your heuristics.

What doesn’t work gets thrown out.

Have questions or feedback? Reply to this email, I read all messages.

In the meantime, that’s all for this week.

See you next Saturday.
