Using Predictive Analytics for Ecommerce Fraud Detection

Payment fraud is increasing, especially in ecommerce, where losses to online payment fraud were forecast to exceed US$48 billion globally in 2023. Clearly, online businesses need stronger defenses to protect their revenue. Predictive analytics may be one of the most effective ways an ecommerce business can detect fraud. By analyzing past data, trends and variables, businesses can build smarter fraud score algorithms and model more accurate fraud prevention strategies.

ClearSale Vice President Rafael Lourenco walks through a six-step process for using predictive analytics to stop ecommerce fraud in its tracks.

To use predictive analytics for ecommerce fraud detection:

  1. Define your historical database of online orders.
  2. Define which transactions to include in your order database.
  3. Analyze the fraud patterns and variables.
  4. Model the collected data.
  5. Implement the model online.
  6. Monitor your results.

Let’s explore these in greater detail.

1. Define Your Historical Database

As with any other analytical process, Lourenco says, you have to start with a historical database of online order details. When building this database, think carefully about the timeframe for orders. Will your database include six months’ worth of orders? A year of orders? Lourenco encourages using at least one year of data — if available — to build a holistic picture of any sales seasonalities.

On the other hand, relying only on recent orders can be a problem. The reason? Chargebacks and fraud can take weeks, or even months, to show up on a balance sheet. This means trying to predict future fraud based on yesterday’s orders can be risky, if not impossible.

Instead, he recommends building in a safe window of at least 90 days. For instance, if it happens to be April when you begin building your database, consider including orders that were made between January and December of the previous year, and eliminate orders from January to March of the current year, as the full picture of chargebacks and fraud from this time period has yet to develop.
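As a rough illustration of that safe window, the sketch below keeps only orders that are at least 90 days old and no more than a year older than that. The function names, the order_date field and the sample orders are hypothetical, chosen only for this example.

```python
from datetime import date, timedelta

def training_window(today, safe_days=90, history_days=365):
    """Return (start, end) for the order dates to keep; end is exclusive."""
    end = today - timedelta(days=safe_days)       # drop the most recent 90 days
    start = end - timedelta(days=history_days)    # keep one year before that
    return start, end

def select_orders(orders, today, safe_days=90, history_days=365):
    """Keep only orders whose chargeback picture has had time to develop."""
    start, end = training_window(today, safe_days, history_days)
    return [o for o in orders if start <= o["order_date"] < end]

# Hypothetical orders, evaluated as if today were April 1, 2024.
orders = [
    {"id": 1, "order_date": date(2023, 6, 15)},   # inside the window
    {"id": 2, "order_date": date(2024, 2, 20)},   # too recent
    {"id": 3, "order_date": date(2022, 11, 1)},   # older than the window
]
selected = select_orders(orders, today=date(2024, 4, 1))
print([o["id"] for o in selected])
```

Both the 90-day buffer and the one-year history are parameters, so they can be widened if your chargebacks tend to arrive later.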

At ClearSale, our database of order details reaches back to the first ecommerce orders placed and the earliest days of fraud prevention. As a result, our historical database provides one of the broadest pictures of fraud and fraud trends available from around the world. 


2. Define Which Transactions to Include in Your Order Database

Once you’ve established the timeframe for your analytics, it’s time to define the “what”: What transactions do you include in this historical database?

Chargebacks happen only with certain payment methods, like credit cards. Therefore, don’t include payments made via ACH or any other method that doesn’t create a chargeback.

In fact, adding these payments to your database would likely end up creating bias in your analytics. If ACH is more common in Region A than it is in Region B and you’re including ACH payments in your database, then it will seem like Region A is a safer place in which to do business. But in actuality, Region A’s decreased risk is only because ACH doesn’t result in chargebacks and is at a lesser risk for fraud. So, when you’re trying to predict chargebacks and fraud, look only at the payment methods that are contributing most to that risk.
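To see how that bias creeps in, here is a small sketch with made-up numbers. The method and chargeback fields, the CHARGEBACK_METHODS set and the sample orders are all assumptions for illustration.

```python
# Assumption: only credit card orders can produce chargebacks in this example.
CHARGEBACK_METHODS = {"credit_card"}

def chargeback_rate(orders):
    """Share of orders that came back as chargebacks."""
    if not orders:
        return 0.0
    return sum(1 for o in orders if o["chargeback"]) / len(orders)

def chargeback_eligible(orders):
    """Keep only orders paid with a method that can create a chargeback."""
    return [o for o in orders if o["method"] in CHARGEBACK_METHODS]

# Hypothetical Region A: half of its orders are paid via ACH.
region_a = [
    {"method": "credit_card", "chargeback": True},
    {"method": "credit_card", "chargeback": False},
    {"method": "ach", "chargeback": False},
    {"method": "ach", "chargeback": False},
]
# Hypothetical Region B: card payments only.
region_b = [
    {"method": "credit_card", "chargeback": True},
    {"method": "credit_card", "chargeback": False},
]

print(chargeback_rate(region_a), chargeback_rate(region_b))
print(chargeback_rate(chargeback_eligible(region_a)),
      chargeback_rate(chargeback_eligible(region_b)))
```

With ACH included, Region A appears to have half the chargeback rate of Region B. Once only card orders are compared, the two regions look identical.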

After you define the “what,” you must be able to translate that into a binary variable, such as whether orders made in a certain time period and processed via a certain payment method can ultimately be identified as good or bad orders. In the simplest of terms, every order your system approves and doesn’t come back as a chargeback should be viewed as a good order. Every order that’s approved and comes back as a chargeback should be viewed as a bad order.

But here’s the catch: You can’t use chargebacks as the only definition of bad orders. Say your fraud filters decline every order placed from or delivered to Russia. That means you’ll never experience any chargebacks there, right? Sounds like good news. However, not every order coming from or going to Russia is fraudulent. Therefore, your strategy means you’re declining at least some legitimate orders, and that affects your ability to identify future patterns and variables associated with both good and bad orders.

On the other hand, if you don’t automatically block all orders to and from Russia, you’ll be able to better understand when an order related to Russia is safe and when it’s not. Yes, an all-or-nothing block means you’ll never experience chargebacks on these orders, but you’ll also miss out on countless legitimate sales. Even worse, it makes your fraud database look like there’s no fraud in Russia, and we all know that fraud is everywhere.

But that’s not the only reason automatically declining orders is damaging to your business. When you decline orders, you end up with what we call “censored data” — or data (in this case, orders) you’ll never truly be able to objectively evaluate. In the case of the declined Russian orders, we’ll never know which ones were actually fraudulent.

To avoid this, you need to clearly define what a good order is and what a bad one is. Lourenco suggests starting by identifying both chargebacks and confirmed fraud orders that you either avoided, declined or blocked. Then define good orders as the transactions that were approved and never came back as chargebacks.
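One way to turn those definitions into the binary variable is sketched below. The status, chargeback and confirmed_fraud fields are hypothetical, and treating unconfirmed declines as unlabeled is an assumption made here to reflect the censored-data problem. It is not a rule stated by ClearSale.

```python
def label_order(order):
    """Return "bad", "good", or None when the true outcome is unknown."""
    # Bad: the order came back as a chargeback, or it was confirmed as fraud
    # (including fraud that was avoided, declined or blocked).
    if order["chargeback"] or order["confirmed_fraud"]:
        return "bad"
    # Good: approved and never came back as a chargeback.
    if order["status"] == "approved":
        return "good"
    # Declined without confirmation: censored, so leave it unlabeled.
    return None

# Hypothetical order records.
orders = [
    {"id": 1, "status": "approved", "chargeback": False, "confirmed_fraud": False},
    {"id": 2, "status": "approved", "chargeback": True, "confirmed_fraud": False},
    {"id": 3, "status": "declined", "chargeback": False, "confirmed_fraud": True},
    {"id": 4, "status": "declined", "chargeback": False, "confirmed_fraud": False},
]
labels = {o["id"]: label_order(o) for o in orders}
print(labels)
```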

To help determine which orders should be sent for manual review, ClearSale recommends calculating the expected loss: multiply the fraud probability (a score assigned to an order value range based on past experience) by the total order value.

If the cost per order for a secondary review is less than the expected loss, you may want to manually review orders of that value. To establish automatic approval cutoff points, make the expected loss calculation for a range of total order values ($100–$999, $1,000–$9,999, etc.).

Plot the expected loss calculations for each order value on a graph, and the resulting curve serves as your approval cutoff. Any order that falls below the curve should be automatically processed, and any order above the curve should undergo a secondary review.
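The expected loss comparison can be sketched in a few lines. The 2% fraud probability and the $3 review cost below are invented figures for illustration only, and the function names are hypothetical.

```python
def expected_loss(fraud_probability, order_value):
    """Expected loss = fraud probability x total order value."""
    return fraud_probability * order_value

def needs_manual_review(fraud_probability, order_value, review_cost):
    """Send an order to review only when the review costs less than the expected loss."""
    return review_cost < expected_loss(fraud_probability, order_value)

# Hypothetical figures: 2% fraud probability, $3.00 per manual review.
FRAUD_PROBABILITY = 0.02
REVIEW_COST = 3.00

for order_value in (100, 1000):
    loss = expected_loss(FRAUD_PROBABILITY, order_value)
    if needs_manual_review(FRAUD_PROBABILITY, order_value, REVIEW_COST):
        decision = "manual review"
    else:
        decision = "auto-approve"
    print(f"${order_value}: expected loss ${loss:.2f} -> {decision}")
```

With these made-up figures, a $100 order carries an expected loss of $2, which is below the $3 review cost, so it is approved automatically. A $1,000 order carries an expected loss of $20, so it goes to secondary review.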

3. Analyze the Fraud Patterns and Variables

Now that you’ve set up your database and defined what good and bad orders are, it’s time to analyze. Through descriptive analysis, try to understand the patterns and variables that help with fraud detection and differentiate the good orders from the bad.

As you analyze orders, create some variables and see if they affect the percentage of bad orders. For instance, you might build a weekday variable, so you can see if Mondays have higher or lower fraud rates than Tuesdays. If you find that all weekdays have the same fraud levels, then you know that’s not a worthwhile variable to analyze. On the other hand, if the weekdays each have very different fraud rates, it’s a great variable to include in your analysis.
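The weekday check might look like the sketch below. It assumes each order record carries an order_date and an is_bad flag. Both field names and the sample orders are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def fraud_rate_by_weekday(orders):
    """Return the share of bad orders for each weekday that has orders."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for o in orders:
        day = WEEKDAYS[o["order_date"].weekday()]  # Monday is index 0
        totals[day] += 1
        if o["is_bad"]:
            bad[day] += 1
    return {day: bad[day] / totals[day] for day in totals}

# Hypothetical labeled orders.
orders = [
    {"order_date": date(2024, 1, 1), "is_bad": True},   # a Monday
    {"order_date": date(2024, 1, 8), "is_bad": False},  # a Monday
    {"order_date": date(2024, 1, 2), "is_bad": False},  # a Tuesday
    {"order_date": date(2024, 1, 9), "is_bad": False},  # a Tuesday
]
rates = fraud_rate_by_weekday(orders)
print(rates)
```

If the rates come out roughly equal across days, the variable adds little. Large differences suggest it is worth keeping.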

Lourenco encourages creativity when creating variables, but he emphasizes creating only variables that can be computed the moment an order is placed. After all, you want to know right away, as soon as someone places an order, what the probability is that the order is fraudulent. For example, you might want to know how many orders a particular phone number has placed before, so you count that number’s past transactions and use the count to evaluate the likelihood that the new order is a bad one.
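Here is a minimal sketch of the phone number example. It counts only orders placed before the current one, so the variable uses nothing that was unknown at order time. The placed_at and phone fields and the function name are assumptions.

```python
from collections import defaultdict
from datetime import datetime

def add_prior_order_counts(orders):
    """Attach, to each order, how many earlier orders used the same phone number."""
    seen = defaultdict(int)
    enriched = []
    # Walk the orders in time order so each count reflects the past only.
    for o in sorted(orders, key=lambda o: o["placed_at"]):
        enriched.append({**o, "prior_orders_for_phone": seen[o["phone"]]})
        seen[o["phone"]] += 1
    return enriched

# Hypothetical orders, deliberately out of chronological order.
orders = [
    {"id": "b", "phone": "555-0100", "placed_at": datetime(2024, 1, 2, 9, 0)},
    {"id": "a", "phone": "555-0100", "placed_at": datetime(2024, 1, 1, 9, 0)},
    {"id": "c", "phone": "555-0199", "placed_at": datetime(2024, 1, 3, 9, 0)},
]
enriched = add_prior_order_counts(orders)
print([(o["id"], o["prior_orders_for_phone"]) for o in enriched])
```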


4. Model the Collected Data

The fourth step is the modeling itself. Here, you’ll most likely need the help of data scientists, as the techniques for modeling range from old-school styles like logistic regression to more modern, complex approaches like neural networks and deep learning.

With a data analysis technique called ensemble modeling, businesses and analysts can combine the results from several fraud score programs to deliver one score. That combined score can be more accurate than any single program’s score, because it draws on the strengths of all the programs that contribute data. ClearSale uses ensemble modeling because it has the capability to improve fraud detection worldwide, and to continue improving as businesses come up with new and better ways to score their orders.
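One of the simplest ways to combine several scores is a weighted average, sketched below. This illustrates the general idea only and is not a description of how ClearSale builds its ensemble. The function name, scores and weights are hypothetical.

```python
def ensemble_score(scores, weights=None):
    """Combine several fraud probabilities (each 0 to 1) into one weighted average."""
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = [1.0] * len(scores)  # equal weighting by default
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# Hypothetical scores from three separate scoring programs for one order.
scores = [0.10, 0.30, 0.20]
print(round(ensemble_score(scores), 4))
# Trust the second program twice as much as the others.
print(round(ensemble_score(scores, weights=[1, 2, 1]), 4))
```

In practice, the weights would be chosen by measuring how well each program performed on your historical database.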

Regardless of the technique used, your objective here is to translate all the data you’ve gathered into a predictive probability of the likelihood of fraud.

5. Implement the Model Online

When you move into the implementation step of fraud detection, you’ll want to be able to assign a fraud score to every credit card order placed on your website, predicting the likelihood that the order is fraudulent.

Note: It’s often best to use the same resource you used in the modeling step to help you implement the model into your decision flow.

6. Monitor Your Results

Last, but certainly not least, is monitoring. Is everything working in the real world like you expect it to? You might think your purchases are equally split between weekdays and weekends, so you’ve trained your model on a database in which half the orders were placed on weekdays and the other half were placed on weekends. But when you implement your model and follow up on it in real life, you learn that only 10% of orders are placed on weekends. If this happens, go back and tweak your model until it’s a good fit with the way your online company actually operates. Even then, you’ll need to continue monitoring so you can adapt as your business evolves.
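A monitoring check for the weekend example could be as simple as the sketch below, which compares the weekend share of orders in the training data with the share seen in production. The 10-percentage-point tolerance, the function names and the sample dates are arbitrary assumptions.

```python
from datetime import date

def weekend_share(order_dates):
    """Fraction of orders placed on a Saturday or Sunday."""
    if not order_dates:
        raise ValueError("no orders to measure")
    weekend = sum(1 for d in order_dates if d.weekday() >= 5)  # 5 = Sat, 6 = Sun
    return weekend / len(order_dates)

def has_drifted(training_dates, live_dates, tolerance=0.10):
    """Flag the model for review when the two weekend shares differ too much."""
    gap = abs(weekend_share(training_dates) - weekend_share(live_dates))
    return gap > tolerance

# Hypothetical training data: half weekday orders, half weekend orders.
training_dates = [date(2024, 1, 1), date(2024, 1, 2),
                  date(2024, 1, 6), date(2024, 1, 7)]
# Hypothetical live data: nine weekday orders and one weekend order.
live_dates = [date(2024, 1, d) for d in (1, 2, 3, 4, 5, 8, 9, 10, 11)]
live_dates.append(date(2024, 1, 6))

print(weekend_share(training_dates), weekend_share(live_dates))
print(has_drifted(training_dates, live_dates))
```

The same comparison can be repeated for any other variable the model relies on, not just the weekday split.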

If you’re looking for a trusted partner in fraud detection to help you monitor the constantly changing fraud landscape, you want a partner that is trusted by clients to protect their bottom line. ClearSale has been consistently ranked #1 for Ecommerce Fraud Protection by satisfied customers on G2.

Our analysts use a unique combination of human analysis and robust artificial intelligence to identify emerging fraud patterns and stop chargebacks and false declines from taking a toll on your revenues. Contact us to learn how we can make your fraud prevention strategy more effective.

