
Online Advertising Fraud and Traffic of Bad Intent: How We Got Here And How We’ll Get Out

by on January 29, 2014

Almost exactly four months ago I spoke with Jeff Fraser from Marketing Magazine about the prevalence of fraudulent traffic on the online advertising exchanges. Jeff’s article, “It’s 2AM, do you know where your ads are?”, shed light on an issue of much concern to advertisers already spending, or thinking about spending, money in the online advertising ecosystem across all channels: display, video, and mobile. The fact is, fraudulent and blatantly unqualified traffic has existed for longer than the ad exchanges have been around, yet the proliferation of indirect media buying has brought it to the forefront for online advertisers. As 2014 rolls in, there’s no better time to look back and consider how we’ve let it come to this, and what we can do about it moving forward.

What Is Fraudulent Ad Traffic?

[Host] On one hand, you’re saying the United States government is spending millions of dollars to eliminate the flow of drugs onto our streets. At the same time, we are doing business with the very same government that is flooding our streets with cocaine.

[Guest] Mmmm-hmmm, si, si. Let me show you a few other characters that are involved in this tragic comedy.

[Source: Scarface]

Before I get into fraud in online advertising, it’s worth familiarizing oneself with a few of the “bad traffic” generators of the internet. You can think of them as the building blocks of fraudulent internet traffic, in particular in their ability to affect the flow of advertising dollars in the online ecosystem.

  • Pay-to-click/act (P2C) networks. You don’t need to look far to find these bad guys. Their premise is simple: on one end they take in dollars from “advertisers” (typically representatives or arbiters multiple degrees removed from the actual brand), and on the other they pay dollars out to participants willing to browse specific web sites and click on ads, or to perform actions such as signing up for an online offer, promo, or credit card. There are many examples, and a lot of them act as re-sellers of other P2C nets. In other words, they feed off of one another.

P2C Network - Source:

  • Bot networks. These come in a variety of shapes and sizes, but they all share one characteristic: the generated traffic is not human (no real eyeballs, however unqualified), though it is of course human initiated. Bot traffic is sometimes identifiable by a small number of connecting IP addresses accounting for a large number of impressions, or by a suspicious distribution of a known variable, for example an unusually large proportion of MacOS/Safari browsers accepting third-party [ad server] HTTP cookies, when it is known that these browsers reject third-party cookies by default. Detection is further complicated by the fact that in the programmatic and mobile advertising ecosystems it is not uncommon to see impression traffic originating from mediation, bid, or proxy servers (e.g., where the connection to retrieve a banner ad does not originate from the end-user device or browser directly, but from another server acting as a proxy, where not all of the connecting host’s request information may be available). Though somewhat more elusive than pay-to-click networks, bot networks are available for rental if you look hard enough, often with pricing that makes malicious arbitrage feasible.

Bot Network Illustration - Source: IPA/Japan ISEC
Earlier this year, AdExchanger featured a writeup by Douglas De Jager, a comprehensive primer on illegal botnets.

  • Bad arbiters. It’s all about indirection. These arbiters – usually networks acting as sellers – act as the gatekeepers of ad dollars, by plugging in supply to demand, sometimes fraudulent supply. I bin these into one of two groups: those acting maliciously, and those getting duped, sometimes knowingly, but often due to their complacency. I want to focus on the ones getting duped – the ad networks out there that, sometimes due to their own ignorance, and other times because they make it easy to get duped (e.g., easy sign up for web sites and publishers, focused almost exclusively on volume, scale, and the wrong metrics, etc. – more on this later), play a key role in bringing legitimate advertiser dollars to the doorstep of bot and pay-to-click network operators.
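The two bot traffic signals mentioned above (few connecting IP addresses for many impressions, and Safari browsers that appear to accept third-party cookies) can be expressed as simple checks over an impression log. What follows is only a minimal sketch under stated assumptions: the `Impression` record, the function names, and the thresholds are all hypothetical and chosen for illustration, not taken from any production system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Impression:
    """Hypothetical log record; the field names are assumptions."""
    ip: str                   # connecting IP address
    browser: str              # e.g. "safari", "chrome", parsed from the user agent
    third_party_cookie: bool  # did the request carry a third-party ad server cookie?


def ip_concentration(impressions, top_n=3):
    """Share of impressions that come from the top_n busiest IP addresses."""
    if not impressions:
        return 0.0
    counts = Counter(imp.ip for imp in impressions)
    top = sum(n for _, n in counts.most_common(top_n))
    return top / len(impressions)


def safari_cookie_rate(impressions):
    """Share of Safari impressions that carried a third-party cookie."""
    safari = [imp for imp in impressions if imp.browser == "safari"]
    if not safari:
        return 0.0
    return sum(imp.third_party_cookie for imp in safari) / len(safari)


def suspicious_signals(impressions, max_ip_share=0.5, max_safari_cookie_rate=0.2):
    """Return the names of the heuristics this batch of traffic trips.

    Both thresholds are illustrative guesses, not calibrated values.
    """
    signals = []
    if ip_concentration(impressions) > max_ip_share:
        signals.append("few_ips_many_impressions")
    if safari_cookie_rate(impressions) > max_safari_cookie_rate:
        signals.append("safari_accepting_third_party_cookies")
    return signals
```

A tripped signal is a reason to look closer, not a verdict: as noted above, legitimate mediation, bid, or proxy servers can also concentrate many impressions behind a handful of IP addresses.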

Next we’ll look at how the entities described above plug into one another and can wreak havoc on media dollars in the world of online advertising.

How Does Fraudulent Internet Traffic Make Its Way To Advertisers?

What’s particularly interesting is that in the online advertising ecosystem, some of the so-called “bad arbiters” are purportedly ignorant of the fact that they’re playing a key role in poisoning the well. Ironically these are often the problem, as they bridge that last mile between the legitimate and illegitimate, covertly linking brands, often very reputable brands, with the bad actors I described earlier.

How are advertisers getting duped? Cui bono. Figure 1 below shows how the well is poisoned with fake web traffic. Figure 2 depicts the flow of online media dollars and, in particular, how legitimate brands are connected to illegitimate sources of traffic thanks to the indirection inherent in our ecosystem.

How fraudulent traffic supply poisons the well
Figure 1: Good and bad intent flows through impressions, our familiar old media currency
How brand dollars move online
Figure 2: How dollars follow impressions

The prevalence of altogether fake web sites in Figure 1 is of particular significance in the evolving real-time bidding ad exchange environment: for primarily technical reasons, site URLs are declared on the exchanges by sellers – and the sellers hold the seats on the exchanges. Some sellers are getting duped by bad actors, and since they are often compensated in terms of volume, are not particularly compelled to change the situation.

Almost a year ago, John Battelle wrote about research he was doing on bad supply actors in the online ad ecosystem funnelling in traffic from fake sites, and echoed publisher frustrations about falling CPMs being due to supposedly “infinite supplies of impression inventory,” a problem exacerbated by the prevalence of altogether fake sites, on-boarded onto ad exchanges via sellers who are themselves being gamed by participating publishers. Andrew Casale wrote in The Make Good about a couple of examples Casale discovered when vetting sellers on their platform. It is good to see that some ad technology vendors are placing more emphasis on vetting when on-boarding publishers, for fear of false declarations or altogether manufactured web sites. The IAB, under Battelle’s leadership, has recognized the importance and scale of the issue and established a task force to help deal with it. Sadly, some are ignoring the problem, or reacting more slowly than others.

Unfortunately, most of our media performance metrics haven’t caught up to this reality yet, which brings us to the next section.

Why We’ve Been Making It Easy

The core metrics buyers tend to focus on when adjusting how they bid on media include impressions, clicks, and post-impression or post-click actions occurring on an advertiser’s web site. In turn, dollars are baked in by calculating costs per thousand impressions (CPM), per click (CPC), and per attributed action (CPA). The volume of any of impressions, clicks, and attributed actions associated with a unit of dollars spent is often interpreted as a measure of a campaign’s success. The assumption is that the metric under consideration is either a direct measure, or a sufficiently accurate proxy measure of the desired outcome.
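For readers less familiar with the arithmetic, the three cost metrics are plain ratios of spend to volume. The snippet below is a small illustrative sketch; the function name `campaign_metrics` and the sample figures are made up for the example.

```python
def campaign_metrics(spend, impressions, clicks, actions):
    """Return CPM, CPC, and CPA for one campaign (None where a volume is zero)."""
    return {
        # Cost per thousand impressions
        "cpm": spend * 1000 / impressions if impressions else None,
        # Cost per click
        "cpc": spend / clicks if clicks else None,
        # Cost per attributed action
        "cpa": spend / actions if actions else None,
    }


# Hypothetical campaign: $500 spent on 250,000 impressions, 500 clicks, 25 actions.
metrics = campaign_metrics(spend=500.0, impressions=250_000, clicks=500, actions=25)
print(metrics)  # {'cpm': 2.0, 'cpc': 1.0, 'cpa': 20.0}
```

Note that nothing in these ratios says whether an impression was seen, or whether a click came from a person: any source that inflates the volumes lowers all three numbers.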

For advertisers selling a product on the internet, the desired outcome is usually the sale itself. For brands looking to increase awareness, impress, or associate themselves with other brands, the desired outcome is arguably even further removed from the media metrics. In both cases, malicious actors know all this, and act in the interest of the metrics. In other words, they tend to do pretty well on the CPM, CPC, and CPA front.

Indeed, our media and performance metrics are not only susceptible to gaming by fraudsters, but also fail to capture spot viewability. In a response to Jeff Fraser’s Marketing Magazine article, Mark Sherman, CEO of Media Experts points out:

So let’s not throw the baby out with the bath water. The foundation of all of our media metrics is very shaky. Waste is everywhere, some of what we call waste is not even there, it’s simply numbers and not actual impressions, not actual ad delivery, just an opportunity for it, a big maybe… a Cost Per Maybe.

A media currency metric that bakes in historical ad spot viewability (call it CPvM, or Cost Per viewable Mille) would go a long way toward eliminating waste, particularly if you can act on the result. Google’s Neal Mohan wrote about this not long ago. If in 2013 we’ve seen after-the-fact ad viewability reporting confirm the same sad truth for everyone (i.e., that 50% of campaign impressions are never visible on a user’s screen – even for legitimate sources of web traffic), then 2014 will have to be the year we make programmatic purchasing decisions accordingly and act on ad spot viewability data before the fact, at decisioning time, when the bids are made and the money spent. So when we consider the weaknesses of our media metrics and look to refine them, it’s not just about fraud.
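To make the CPvM idea concrete, here is a rough sketch of the arithmetic, both after the fact (what did viewable impressions really cost?) and before the fact (how much should I bid on a spot given its historical viewability?). The function names and the simple linear bid adjustment are assumptions made for illustration, not a description of how any particular bidder works.

```python
def cpvm(spend, impressions, viewable_rate):
    """Cost per thousand *viewable* impressions; None if nothing was viewable."""
    viewable = impressions * viewable_rate
    if viewable <= 0:
        return None
    return spend * 1000 / viewable


def max_cpm_bid(target_cpvm, historical_viewable_rate):
    """Highest CPM bid that still meets a target CPvM on a given ad spot.

    Assumes the spot's historical viewability is a fair predictor of the
    next impression, which is itself a simplification.
    """
    return target_cpvm * historical_viewable_rate


# After the fact: a $2.00 CPM campaign where only half the impressions were viewable.
print(cpvm(spend=500.0, impressions=250_000, viewable_rate=0.5))  # 4.0

# Before the fact: to pay no more than $4.00 CPvM, bid less on less viewable spots.
print(max_cpm_bid(target_cpvm=4.0, historical_viewable_rate=0.5))   # 2.0
print(max_cpm_bid(target_cpvm=4.0, historical_viewable_rate=0.25))  # 1.0
```

In other words, at 50% viewability the effective price of an impression someone could actually see is double the reported CPM.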

Sherman finally points to Say Media CEO Matt Sanchez’ article on MyersBizNet calling for an IAB domain registry to fight one type of traffic fraud. Personally, I’m not a big fan of centralized registries (all too often you end up having to pay to join), but the idea of ranking web sites sourcing ad traffic on the basis of their observed traffic patterns and mined domain data is a good one. This is exactly why earlier this year at AdGear we started cataloguing web sites and augmenting them with third-party domain data, and in November 2013 launched AdGear Site Rank, a scoring of site domains accessible via sellers listed on the online advertising exchanges that our Trader product sits on and executes buys on. The purpose of AdGear Site Rank is to score site domain traffic by comparing observed historical traffic patterns across multiple ad exchanges and sellers against third-party domain data reflective of the site’s expected true traffic patterns. A discrepancy between observed and expected traffic is reflected in a web site’s calculated score.
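To give a flavour of what comparing observed against expected traffic can look like, here is a deliberately naive sketch. To be clear, this is not the Site Rank algorithm: the function `site_score`, the `ads_per_page` assumption, and the ratio-based score are all hypothetical and exist only to illustrate the idea of penalizing a discrepancy.

```python
def site_score(observed_by_seller, expected_daily_pageviews, ads_per_page=3):
    """Toy legitimacy score in [0, 1] for one site domain.

    observed_by_seller: daily impressions seen for the domain, keyed by seller,
        summed across every exchange where the domain is declared.
    expected_daily_pageviews: estimate taken from third-party domain data.
    ads_per_page: assumed number of ad spots per page view (a guess).

    1.0 means the observed volume is within what the third-party data
    predicts; the score falls as sellers offer more than the site could
    plausibly generate.
    """
    observed = sum(observed_by_seller.values())
    expected = expected_daily_pageviews * ads_per_page
    if observed <= 0:
        return 1.0  # nothing observed, nothing to penalize
    if expected <= 0:
        return 0.0  # traffic offered for a site with no expected audience
    return min(1.0, expected / observed)


# A site expected to serve about 10,000 page views a day:
print(site_score({"seller_a": 20_000, "seller_b": 10_000}, 10_000))    # 1.0
print(site_score({"seller_a": 200_000, "seller_b": 100_000}, 10_000))  # 0.1
```

A real scoring would need to look at far more than volume (geography, time of day, referrers, and so on), which is part of why no single number should be trusted on its own.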

Though developing new targetable and reportable metrics to better qualify web sites in terms of legitimacy is a first step in the right direction, we recognize that it’s by no means a silver bullet. In fact, it’s not one new metric that will solve the fraud and quality problem plaguing the ecosystem. It’s not one site registry. Any single quantifiable metric or outcome is subject to gaming, eventually.

In the latest December 2013 print edition of Wired Magazine, in an article entitled “Numbed by Numbers: Why Quants Don’t Know Everything” (sadly I don’t have a permalink), Felix Salmon makes an interesting point about quants and data-driven decisioning in general: that regardless of discipline or industry, the rise of data-driven decisioning tends to happen in four stages: pre-disruption, disruption, overshoot, and synthesis. To explain pre-disruption, Felix points to decisions taking place in Hollywood which, though rife with data and the potential for quantification, are governed by a relatively small group of studio executives (I’d suggest that this example has itself been disrupted by the advent of Netflix).

Disruption is kicked off by early adopters and eventually leads to a form of mass adoption. I’d posit that online advertising’s exchange and real-time bidding ecosystem has effectively passed the disruption phase, even here in Canada. That is not to say that all brands and advertisers are on-board buying ad impressions programmatically, just that they no longer suffer from accessibility challenges if they choose to do so – it’s no longer just for the big or the few.

Then comes overshoot. In his description, Felix points to Campbell’s law:

The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

The analogy in the online advertising space is pretty telling, I find: ad networks pressured to deliver against a single set of performance indicators, like CPA, CPC, and CPM, inevitably become obsessed with measuring their ability to deliver the desired outcome almost entirely in terms of these same metrics. They obsess about delivering the desired impression, click, or conversion volumes at the target CPM, CPC, or CPA, and they negotiate deals or issue bids on the basis of these.

And the fact is, if you’re just looking at the numbers, you will get fooled by the bad actors. In fact, if you’re just looking at the numbers, it’ll be appealing to harbour the bad traffic supply because the bad traffic sources are designed to drive up the numbers and make you look good, to make it seem like you’re reaching your desired outcome.

Where Do We Go From Here?

As ad technology platforms, and as an industry, we need to move toward synthesis. Don’t get me wrong, new metrics like Site Rank that introduce new ways to valuate inventory, modelled on concepts of legitimacy, are very much a part of that. But beyond just numbers, we need to consider what it is we lose by just looking at the numbers when buying media online.

When it comes to online advertising, I believe in the world of many networks. Call me old fashioned, but I believe that media, local media in particular, is sold. A world of many sellers means enabling those sellers to on-board inventory into the buyer ecosystem and build up a reputation, a specialty, a trust in a brand’s ability to reach a particular audience in a particular environment. There is a place for ad networks in the ecosystem, as long as they provide real value:

  • Exclusive and vetted inventory
  • Special access to vetted inventory
  • Expertise and service
  • Transparency in pricing and delivery
  • Consumer trust (concern for consumer privacy and industry standards)
  • Access to technology (e.g., unique decisioning ability)

For networks acting as buyers, the challenges are similar. The fact that today’s programmatic ad landscape does not emphasize the relationship between buyers and sellers first, and is primarily numbers driven, is backwards. Perhaps it is a consequence of hyper-niched ad technology vendors who, with a handful of exceptions, have focused on either the demand or the supply side of the equation, rather than the intersection of the two.

In an ideal world, aggregate buyers and aggregate sellers build relationships with one another, understand each other’s actual challenges, and align each other’s real interests. The deals being formed in such a world lower cost of sales, maximize efficiency (yes, in part thanks to meaningful metrics), but are delivered in a vetted, verified ad environment because in such relationships it is in neither party’s interest to ruin things. The cost of getting caught, even once, is too high. On the other hand, in a world where negotiations are all about volume and CPA, the true value legitimate sellers provide is obliterated by the requirement to deliver on the numbers alongside a vast array of other supply providers. We thus forget what’s truly important: real exposure to real, qualified audiences.

In 2014 and beyond, as designers and operators of programmatic advertising technology, I believe that the opportunity will be in enabling the ideal world relationships between buyers and sellers; to do so transparently through technology and services with transparent pricing, to give ad networks of sellers and buyers the tools to better connect and foster unique relationships that set them apart.

Ultimately, to change the way in which we look at, discover, browse, buy, and sell media on the internet.

Bosko Milekic


Co-founder, VP Technology
