We Sent 1,000 Fake Visitors to a Site Running GA4 and Clickport. Here's What Each Tool Counted.
Half of the visitors in your analytics dashboard aren't real people. That's not hyperbole. Imperva's 2025 Bad Bot Report found that automated traffic surpassed human traffic for the first time in 2024: 51% of all web requests are now bots.
The question is whether your analytics tool knows the difference.
I'm David, founder of Clickport. I've spent two years building bot detection into our analytics platform. Not as an afterthought or a checkbox, but as a core system with six detection layers that run on every single event before it reaches the database. When I read Plausible's test of GA4's bot filtering last year, I thought the concept was great but the scale was too small (79 total pageviews) and the methodology left gaps.
So I ran my own test. Bigger, more scenarios, and with full transparency about what we catch and what we don't.
51% of web traffic is now bots
Before the test results, some context on how big this problem actually is.
The Imperva 2025 Bad Bot Report is the most comprehensive annual study on bot traffic, and the 2025 edition paints a stark picture. Bad bots alone account for 37% of all internet traffic, up from 32% in 2023. Good bots (search engine crawlers, monitoring tools) make up another 14%. Human traffic? Just 49%.
Akamai's research puts total bot traffic at 42%, with 65% of those classified as malicious. Cloudflare's 2025 Year in Review found that non-AI bots alone generated 50% of requests to HTML pages, 7% more than human traffic.
AI is accelerating this. DoubleVerify reported an 86% year-over-year increase in general invalid traffic in the second half of 2024, with AI scrapers like GPTBot, ClaudeBot, and AppleBot accounting for 16% of all known-bot impressions. Akamai saw a 300% surge in AI bot activity year-over-year.
The bots themselves are getting more sophisticated. Barracuda's threat research classified 49% of detected bots as "advanced," designed to mimic human browsing behavior. At the same time, simple bots grew from 40% to 45% of all bad bot traffic, driven by AI tools that make bot creation trivially easy.
If your analytics tool can't tell the difference between a bot and a real person, every number in your dashboard is suspect. Your traffic is inflated, your engagement rates are diluted, your A/B tests are contaminated, and your marketing spend decisions are based on fiction.
Our test: 1,000 bot sessions, 5 scenarios
Here's what we set up. A fresh Astro site deployed on Vercel with five pages: a homepage, three content pages, and a contact page. Both GA4 (via gtag.js) and Clickport's tracking script were installed. We disabled Vercel's built-in bot protection so both analytics tools would receive the raw traffic.
The bot traffic was generated using Puppeteer on Node.js 20, with each scenario configured to simulate a different level of bot sophistication. Every bot session included realistic behavior: 50% of sessions scrolled the page, 30% clicked internal links and navigated to a second page, and all sessions included random delays between 2 and 8 seconds between actions.
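The behavioral mix above can be sketched as a small session planner. This is an illustrative sketch, not our actual test harness; the `rand` parameter is injected so the logic is testable.

```javascript
// Illustrative per-session behavior plan: 50% of sessions scroll, 30% click
// through to a second page, and every action is separated by a random
// 2-8 second delay. `rand` defaults to Math.random but can be injected.
function planSession(rand = Math.random) {
  return {
    scrolls: rand() < 0.5,
    clicksInternalLink: rand() < 0.3,
    // Random inter-action delay in milliseconds (2,000-8,000 ms)
    nextDelayMs: () => 2000 + Math.floor(rand() * 6000),
  };
}
```

In the real test, each planned action was executed with Puppeteer (`page.evaluate` for scrolling, `page.click` for link navigation) with the delay applied between steps.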
We ran the tests over five days, March 3 through March 7, 2026. We waited 72 hours after the final test before checking GA4's processed reports (GA4 documents up to 48 hours of processing time).
Five rounds, 200 bot sessions each, 1,000 total. Each round escalated in sophistication.
Round 1: Known bot User-Agent strings
The easiest test first. We configured Puppeteer to rotate through five non-browser User-Agent strings: PostmanRuntime/7.43.4, python-requests/2.31.0, curl/8.4.0, Go-http-client/2.0, and Wget/1.21.4. These are common HTTP library identifiers that no real browser would ever send.
All 200 sessions ran from a residential IP address. The bots loaded pages, scrolled, and clicked links. The only thing unusual about them was the UA string.
const puppeteer = require('puppeteer');
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
const botAgents = [
'PostmanRuntime/7.43.4',
'python-requests/2.31.0',
'curl/8.4.0',
'Go-http-client/2.0',
'Wget/1.21.4'
];
await page.setUserAgent(botAgents[Math.floor(Math.random() * botAgents.length)]);
await page.goto('https://bot-test-site.vercel.app/');
GA4 result: 200 sessions counted as real visitors.
Every single one appeared in GA4's Real-time report during the test and persisted in the processed Traffic Acquisition report three days later. GA4's "known bot" filter did nothing.
Clickport result: 0 sessions counted.
All 200 were blocked at ingestion by User-Agent pattern matching. Clickport's bot detection runs a compiled regex with 55+ bot signatures against every incoming request. "PostmanRuntime," "python-requests," "curl," "Go-http-client," and "Wget" are all on the list.
This is the simplest possible bot to detect. A non-browser User-Agent string is the equivalent of a burglar wearing a name tag that says "burglar." GA4 missed all of them.
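Server-side User-Agent pattern matching of this kind can be sketched in a few lines. This is a minimal illustration with a handful of example signatures, not Clickport's actual 55+ pattern list:

```javascript
// Minimal sketch of server-side UA pattern matching. The signature list
// here is a small illustrative subset; a production list would be larger.
const BOT_UA_PATTERN = new RegExp(
  ['PostmanRuntime', 'python-requests', 'curl', 'Go-http-client', 'Wget',
   'HeadlessChrome', 'Googlebot', 'GPTBot'].join('|'),
  'i'
);

function isKnownBotUA(userAgent) {
  return BOT_UA_PATTERN.test(userAgent || '');
}
```

Note that a plain `Chrome` UA does not match; only the `HeadlessChrome` variant does, so real browsers pass through.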
Round 2: Default headless Chrome
For Round 2, we left Puppeteer in its default configuration. No stealth plugins, no UA spoofing. Default headless Chrome has two telltale signs: the User-Agent string contains "HeadlessChrome," and navigator.webdriver is set to true.
We ran 200 sessions from a residential IP. Same behavioral simulation: page loads, scrolling, clicking, random delays.
GA4 result: 200 sessions counted as real visitors.
GA4 does not check for navigator.webdriver. It does not look for "HeadlessChrome" in the User-Agent. Its gtag.js script fires on page load, sends the event to Google's collection endpoint, and that's it.
Clickport result: 0 sessions counted.
Clickport caught these bots at two levels. First, the tracker itself checks navigator.webdriver on initialization. If it's true, the tracker sends a bot_signal flag with the event payload and then aborts. No further events are sent from that session. Second, even if the tracker's client-side check were bypassed, the server-side bot detection would catch "HeadlessChrome" in the User-Agent via pattern matching.
This is a defense-in-depth approach. The client-side check prevents bot events from ever being sent. The server-side check catches anything that slips through.
If you've ever used Puppeteer, Playwright, or Selenium for testing, you know how easy it is to spin up a headless browser. GA4's tracking script treats these automated browsers exactly the same as a real person using Chrome.
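The client-side check described above amounts to reading one flag. Here is a sketch written against a navigator-like object so it can run outside a browser; in a real tracker this would read the global `navigator`:

```javascript
// Sketch of the client-side automation check. navigator.webdriver is true
// under Selenium, Playwright, and default Puppeteer; a true value means:
// flag the session as a bot and stop sending events.
function detectAutomation(nav) {
  return Boolean(nav && nav.webdriver === true);
}
```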
Round 3: Stealth bots from cloud servers
This is where things get interesting. For Round 3, we used puppeteer-extra-plugin-stealth, an open-source plugin that patches 23 known automation fingerprints. It sets navigator.webdriver to false, uses a real Chrome User-Agent string, fixes WebRTC leaks, normalizes plugin arrays, and matches screen dimensions to realistic values.
We ran these 200 sessions from three cloud providers: AWS (us-east-1), Google Cloud (europe-west1), and DigitalOcean (fra1). From a fingerprint perspective, these bots were indistinguishable from real Chrome browsers. The only giveaway was that the traffic came from IP addresses assigned to cloud data centers.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
// navigator.webdriver is now false
// UA string is a real Chrome 121 UA
// All 23 stealth patches applied
await page.goto('https://bot-test-site.vercel.app/');
GA4 result: 200 sessions counted as real visitors.
GA4 does not check the source IP against datacenter IP ranges. As far as GA4 is concerned, a visitor from an AWS EC2 instance in Virginia is no different from a person sitting at their desk in Virginia.
Clickport result: 0 sessions counted.
All 200 were blocked by datacenter IP detection. Clickport maintains an IP range database sourced from ipcat that covers AWS, Google Cloud, Azure, DigitalOcean, Hetzner, OVH, and dozens of other cloud providers. On every incoming event, the client IP is converted to a 32-bit integer and checked against a sorted array of datacenter IP ranges using binary search. If the IP falls within any known datacenter range, the event is blocked.
There's a critical nuance here: VPN users on datacenter IPs are not blocked. Clickport cross-references a VPN IP whitelist before the datacenter check. If the IP matches a known VPN provider, the datacenter check is skipped entirely. This prevents false positives from legitimate visitors using NordVPN, ExpressVPN, or other consumer VPN services that route through datacenter infrastructure.
This round is particularly important because the majority of bot traffic originates from cloud infrastructure. Running a botnet from residential IPs is expensive. Running it from AWS spot instances costs pennies. Datacenter IP blocking catches the vast majority of real-world bot attacks.
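The lookup described above (IP to 32-bit integer, then binary search over sorted ranges) can be sketched like this. The ranges in the example are illustrative, not real cloud provider ranges:

```javascript
// Convert a dotted-quad IPv4 address to a 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Binary-search a sorted, non-overlapping list of [start, end] ranges.
// Returns true if the IP falls inside any range.
function isDatacenterIP(ip, ranges) {
  const n = ipToInt(ip);
  let lo = 0, hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end] = ranges[mid];
    if (n < start) hi = mid - 1;
    else if (n > end) lo = mid + 1;
    else return true; // n falls inside this range
  }
  return false;
}
```

With thousands of ranges in the database, the binary search keeps the per-event cost at O(log n) rather than a linear scan.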
Round 4: Referrer spam
Referrer spam is one of the oldest tricks in the bot playbook. Bots visit your site with a fake referrer URL, hoping you'll see the domain in your analytics and click on it. The spam domains are usually SEO link farms, malware distribution sites, or phishing pages.
For Round 4, we configured Puppeteer to set referrer headers from known spam domains. We pulled 20 domains from Matomo's referrer spam list, an open-source, community-maintained blocklist with thousands of entries. Domains like best-seo-offer.com, buttons-for-website.com, free-social-buttons.com, and get-free-traffic-now.com.
Each session loaded a page with the spam domain set as the document.referrer. Some used the stealth plugin, others didn't. The referrer was the constant.
GA4 result: 200 sessions counted as real visitors.
All 200 appeared in GA4's Traffic Acquisition report under "Referral" with the spam domains listed as traffic sources. GA4 has no built-in referrer spam filter. In Universal Analytics, you could at least create view-level hostname filters. GA4 removed that capability entirely. The spam domains sit in your reports permanently.
Clickport result: 0 sessions counted.
All 200 were blocked by the spam referrer detection layer. Clickport checks every incoming event's referrer against Matomo's spam domain list. The matching is thorough: for a referrer URL like https://tracking.best-seo-offer.com/campaign?id=123, the system extracts the hostname, then walks up the domain hierarchy checking tracking.best-seo-offer.com, then best-seo-offer.com. If any level matches the spam list, the event is blocked.
If you've ever opened GA4's Traffic Acquisition report and seen traffic from domains you've never heard of, you've seen this problem firsthand. GA4 has no mechanism to filter these, and no way to retroactively remove them from your data.
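The hostname walk described above can be sketched as follows. The spam set here holds example entries only; the real check runs against Matomo's full list:

```javascript
// Sketch of the spam-referrer check: extract the hostname, then walk up
// the domain hierarchy so subdomains of a listed domain also match.
function isSpamReferrer(referrerUrl, spamDomains) {
  let host;
  try {
    host = new URL(referrerUrl).hostname;
  } catch {
    return false; // the real system falls back to substring matching here
  }
  const parts = host.split('.');
  // Check host, then each parent: tracking.best-seo-offer.com → best-seo-offer.com
  for (let i = 0; i < parts.length - 1; i++) {
    if (spamDomains.has(parts.slice(i).join('.'))) return true;
  }
  return false;
}
```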
Round 5: Stealth bots from residential IPs
This was the hardest test. We wanted to see what happens when a bot does everything right.
We used puppeteer-extra-plugin-stealth with all 23 evasion modules enabled: navigator.webdriver patched to false, real Chrome 121 User-Agent strings, realistic viewport dimensions, WebRTC leak prevention, and proper language headers. Then we routed all traffic through a residential proxy service, so the source IPs were real consumer ISP addresses, not datacenter ranges.
These bots were, for all practical purposes, indistinguishable from a real person using Chrome. The UA was real. The IP was residential. The webdriver flag was patched. The viewport was non-zero. The referrer was clean. There were no spam domains involved.
GA4 result: 200 sessions counted as real visitors.
Same as every other round.
Clickport result: 200 sessions counted as real visitors.
Both tools failed. And we think that's important to say openly.
Here's the honest truth: no client-side analytics tool can reliably detect a well-configured stealth bot running from a residential IP. The detection signals simply aren't there. The User-Agent is legitimate. The IP is residential. The webdriver flag is patched. The viewport is real. The referrer is clean. From the perspective of a JavaScript tracking script receiving an HTTP request, this bot and a real person are identical.
The only defenses against this class of bot are at a different layer entirely: infrastructure-level solutions like Cloudflare Bot Management, server-side behavioral analysis over multiple sessions, or CAPTCHAs. These are outside the scope of what an analytics tool should be doing.
We think any analytics vendor who claims 100% bot detection is either not testing hard enough or not being honest about their results. We'd rather tell you what we can't catch than let you assume we catch everything.
Final scorecard: GA4 0% vs Clickport 80%
Here are the complete results across all five rounds:

- Round 1 (known bot User-Agents): GA4 counted 200/200 as real; Clickport blocked 200/200
- Round 2 (default headless Chrome): GA4 counted 200/200; Clickport blocked 200/200
- Round 3 (stealth bots, datacenter IPs): GA4 counted 200/200; Clickport blocked 200/200
- Round 4 (referrer spam): GA4 counted 200/200; Clickport blocked 200/200
- Round 5 (stealth bots, residential IPs): GA4 counted 200/200; Clickport blocked 0/200
GA4's "known bot" filter caught zero of our 1,000 test sessions. Not one. Across five different bot configurations, from the most obvious to the most sophisticated, GA4 treated every automated session as a real human visitor.
Clickport blocked 800 of 1,000 (80%). The 200 that got through were the stealth bots routed through residential proxies, which represent the most expensive and most sophisticated class of bot attack. In real-world traffic, the overwhelming majority of bots use datacenter infrastructure because it's cheap. Residential proxy botnets exist, but they're a fraction of total bot traffic.
Why GA4's bot filtering fails
GA4's bot detection relies on the IAB/ABC International Spiders & Bots List, a curated database of known bot User-Agent strings and IP addresses. Google's support page describes it: "Known bot and spider traffic is identified using a combination of Google research and the International Spiders and Bots List."
The IAB list is designed to catch self-identifying bots. Googlebot, Bingbot, Yandex, Baidu. Bots that announce themselves because they want to be recognized. This was a reasonable approach in 2015. It's not anymore.
The problem is architectural. GA4's tracking works like this:
- A visitor loads your page
- The gtag.js script executes in the browser
- An event payload is sent to Google's collection endpoint
- Google's servers process the event and apply the IAB filter
- If the UA matches the IAB list, the event is excluded
There is no IP-based filtering. No datacenter blocking. No webdriver detection. No referrer spam check. No viewport validation. Just a static list of known bot strings, updated monthly, that only catches bots polite enough to identify themselves.
In Universal Analytics, you at least had a checkbox: "Exclude all hits from known bots and spiders." In GA4, this filtering is applied automatically. There is no toggle. And there are no property-level filters to add your own exclusions. If GA4's filter misses a bot, you have no recourse except building workaround segments in Explorations. Those segments don't apply to standard reports.
The difference is fundamental. GA4 uses a post-collection, signature-based filter. Clickport uses pre-ingestion, multi-layer detection. GA4 records everything and hopes its list catches the bots. Clickport evaluates every event against six detection layers before it reaches the database.
How Clickport's 6-layer bot detection works
Since we're being transparent about our results, here's exactly how each detection layer works. No "proprietary algorithms" or vague "machine learning." Just six concrete checks, run in priority order, with the first match triggering a block.
Layer 1: Webdriver signal. The tracker checks navigator.webdriver in the browser. If it's true (set by Selenium, Playwright, and default Puppeteer), the tracker sends a bot_signal flag and aborts. The server validates this flag as a second check.
Layer 2: Empty User-Agent. Every real browser sends a User-Agent header. If the header is missing or empty, the request is blocked. This catches raw HTTP clients, misconfigured scrapers, and some headless environments.
Layer 3: User-Agent pattern matching. A compiled regex with 55+ patterns covering search engine bots (Googlebot, Bingbot, Baidu), AI crawlers (GPTBot, ClaudeBot, Bytespider, PerplexityBot), SEO tools (Ahrefs, Semrush, Screaming Frog), social media bots (FacebookBot, Twitterbot, LinkedInBot), monitoring services (UptimeRobot, Pingdom, GTmetrix), HTTP libraries (curl, wget, python-requests, axios), headless browsers (HeadlessChrome, PhantomJS), and vulnerability scanners (Nmap, Nikto, WPScan).
Layer 4: Datacenter IP blocking. The client IP is checked against a sorted array of known datacenter IP ranges sourced from ipcat. The lookup uses binary search for O(log n) performance. Before the datacenter check, the IP is tested against a VPN whitelist. VPN matches skip the datacenter check, preventing false positives from legitimate users on VPN services that share datacenter IP space.
Layer 5: Spam referrer blocking. The event's referrer is checked against Matomo's referrer spam list. The system extracts the hostname and walks up the domain hierarchy to catch subdomains. If URL parsing fails, it falls back to substring matching.
Layer 6: Zero viewport. If the event's screen_width is 0, the request is blocked. Real browsers always report a non-zero screen width. Pageleave events are exempt from this check since some browsers legitimately report zero dimensions during page unload.
All three blocklists (datacenter IPs, VPN whitelist, spam referrers) are cached locally and refreshed from their GitHub sources every 7 days. If a refresh fails, the system falls back to the most recent cached version.
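Putting the six layers together, the pipeline is a priority-ordered chain where the first match blocks the event. This is a condensed sketch with stub blocklists and an illustrative event shape, not Clickport's actual implementation:

```javascript
// Condensed sketch of the six detection layers, run in priority order.
// The blocklists below are tiny stand-ins for the real databases.
const BOT_UA = /HeadlessChrome|curl|python-requests|Googlebot/i;
const DATACENTER_IPS = new Set(['203.0.113.7', '203.0.113.99']); // illustrative
const VPN_IPS = new Set(['203.0.113.99']);                        // illustrative
const SPAM_DOMAINS = new Set(['best-seo-offer.com']);             // illustrative

function classifyEvent(evt) {
  if (evt.botSignal) return { blocked: true, layer: 'webdriver' };               // Layer 1
  if (!evt.userAgent) return { blocked: true, layer: 'empty-ua' };               // Layer 2
  if (BOT_UA.test(evt.userAgent)) return { blocked: true, layer: 'ua-pattern' }; // Layer 3
  if (!VPN_IPS.has(evt.ip) && DATACENTER_IPS.has(evt.ip))
    return { blocked: true, layer: 'datacenter-ip' };                            // Layer 4
  if (SPAM_DOMAINS.has(evt.referrerHost))
    return { blocked: true, layer: 'spam-referrer' };                            // Layer 5
  if (evt.screenWidth === 0 && evt.type !== 'pageleave')
    return { blocked: true, layer: 'zero-viewport' };                            // Layer 6
  return { blocked: false };
}
```

Note the ordering in Layer 4: the VPN whitelist is consulted first, so a visitor on a known VPN exit IP skips the datacenter check entirely.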
Every blocked event is recorded with its detection method and the specific detail that triggered it (the matched UA pattern, the datacenter provider name, the spam domain). This data is surfaced in Clickport's Bot Management panel, so you can see exactly how many bots are hitting your site, which detection layer caught them, and which specific bots or providers are involved. We also break out AI crawler traffic separately: GPTBot, ClaudeBot, PerplexityBot, Bytespider, and others, so you can monitor how often AI services are scraping your content.
What bot traffic does to your business decisions
Bad analytics data doesn't just look wrong. It causes wrong decisions.
Inflated traffic masks real problems. If bots are adding 20-30% phantom visitors to your traffic numbers, you might think your latest blog post performed well when it didn't. You might think your SEO is improving when it's flat. A Google Analytics Community thread with 300+ reports describes exactly this: site owners seeing sudden traffic spikes from Lanzhou, China and Singapore that turned out to be entirely bot traffic, with GA4 counting every session as real.
Wasted ad spend. If bot traffic shows up as "Direct" in your acquisition reports (which it usually does), it dilutes your channel metrics. Your actual cost-per-acquisition from paid campaigns looks better than it is because bots are inflating the denominator. SpiderAF's 2025 report found $37.7 billion in global ad fraud losses in 2024.
Broken A/B tests. Bots don't convert. If 15% of your test traffic is bots, they dilute conversion rates in both variants, making it harder to detect real differences. You need a larger sample size and longer test duration to reach statistical significance, and even then your results are noisy.
False engagement data. Bots that scroll and click (like our Round 2-5 tests) generate engagement events. Your average engagement time goes up or down depending on how the bot behaves. Your scroll depth metrics become unreliable. Your event counts are inflated.
You can't fix GA4 after the fact. This is the most underrated problem. GA4 cannot delete historical data. There is no "remove bot traffic" button. There are no property-level filters. The only workaround is creating segments in Explorations that exclude suspicious regions or low-engagement traffic. But those segments only apply to Explorations, not to your standard reports, not to your dashboards, and not to any downstream tools reading from the GA4 API.
This isn't theoretical. The China/Singapore bot flood in late 2025 affected thousands of GA4 properties. Some sites saw their traffic double overnight. One Portuguese site reported a 15,000% traffic increase in three days. GA4 counted it all, and site owners had no way to remove it from their historical data.
Google's Analytics team acknowledged the issue and said a "long-term spam detection fix" was in development. As of March 2026, it hasn't shipped.
What we learned
Three takeaways from this test.
First, GA4's bot filtering is not a real defense. It catches search engine crawlers that announce themselves with known User-Agent strings. It does nothing against the bots that actually pollute your data: headless browsers, stealth scrapers, referrer spam, and traffic from cloud infrastructure. If 37% of web traffic is bad bots and GA4 only catches the ones on the IAB list, the gap between your reported numbers and reality could be enormous.
Second, pre-ingestion blocking matters. Clickport evaluates every event against six detection layers before it enters the database. If a bot is detected, the event is blocked and logged in a separate stats table. Your analytics data only contains verified, non-bot traffic. There's no "maybe we'll filter it later" step. Compare this to GA4, where bot traffic enters the database first and the filter is applied afterward. If the filter misses something, your data is permanently contaminated.
Third, no analytics tool catches everything. Our Round 5 results prove it. A sufficiently sophisticated bot with residential IPs, patched fingerprints, and realistic behavior will fool any client-side analytics tool. The right response isn't to pretend otherwise. It's to catch the 80% you can catch, be transparent about the 20% you can't, and give users the tools to manually flag suspicious sessions when they spot something that automated detection missed.
If you're tired of looking at GA4 numbers and wondering how many of those visitors are real, try Clickport free for 30 days. No credit card required. Install the tracker, keep GA4 running alongside it, and compare the numbers yourself. The difference in your data will be visible within the first day.
