We Sent 1,000 Fake Visitors to a Site Running GA4 and Clickport. Here's What Each Tool Counted.
- 51% of web traffic is now bots
- Our test: 1,000 bot sessions, 5 scenarios
- Round 1: Known bot User-Agent strings
- Round 2: Default headless Chrome
- Round 3: Stealth bots from cloud servers
- Round 4: Referrer spam
- Round 5: Stealth bots from residential IPs
- Final scorecard: GA4 0% vs Clickport 80%
- Why GA4's bot filtering fails
- How Clickport's 6-layer bot detection works
- What bot traffic does to your business decisions
- What we learned
Your analytics dashboard says you had 10,000 visitors last month. But how many were real? We ran a controlled experiment to find out. The results didn't look good for Google Analytics.
- GA4's built-in bot filter uses the IAB known-bot list, which only catches bots that self-identify. In our test of 1,000 bot sessions across 5 scenarios, GA4 filtered zero.
- 51% of all web traffic is now automated, with bad bots alone accounting for 37% (Imperva 2025). AI-driven bot creation has pushed simple bot traffic up from 40% to 45% of all bad bot activity.
- Clickport's 6-layer detection system (webdriver signals, empty User-Agents, UA patterns, datacenter IP blocking, spam referrer lists, viewport checks) caught 80% of our test bots before they reached the database.
- Round 5's stealth bots on residential proxies fooled both tools. No client-side analytics tool catches these. 200 of 1,000 got through.
- GA4's data deletion only strips parameter text; event counts stay permanently. With no property-level bot filters, inflated numbers are irreversible.
51% of web traffic is now bots
Before the test results, some context on how big this problem actually is.
The Imperva 2025 Bad Bot Report is the most comprehensive annual study on bot traffic, and the 2025 edition paints a stark picture. Bad bots alone account for 37% of all internet traffic, up from 32% in 2023. Good bots (search engine crawlers, monitoring tools) account for the remaining ~14%. Human traffic? Just 49%.
Akamai's research puts total bot traffic at 42%, with 65% of those classified as malicious. Cloudflare's 2025 Year in Review found that non-AI bots alone generated 50% of requests to HTML pages, 7 percentage points more than human traffic.
AI is accelerating this. DoubleVerify reported an 86% year-over-year increase in general invalid traffic in the second half of 2024, with AI scrapers like GPTBot, ClaudeBot, and AppleBot accounting for 16% of GIVT from known-bot impressions. Akamai saw a 300% surge in AI bot activity year-over-year.
Cloudflare CEO Matthew Prince put it bluntly at SXSW on March 14, 2026: bot traffic will exceed human traffic by 2027.
The bots themselves are getting better. Barracuda's threat research classified 49% of detected bots as "advanced," designed to mimic human browsing behavior. Imperva found that simple bots grew from just under 40% to 45% of all bad bot traffic, driven by AI tools that make bot creation trivially easy.
If your analytics tool can't tell the difference between a bot and a real person, every number in your dashboard is suspect. Your traffic is inflated, your engagement rates are diluted, your A/B tests are contaminated, and your marketing spend decisions are based on fiction.
Our test: 1,000 bot sessions, 5 scenarios
Here's what we set up. A fresh Astro site deployed on Vercel with five pages: a homepage, three content pages, and a contact page. Both GA4 (via gtag.js) and Clickport's tracking script were installed. We disabled Vercel's built-in bot protection so both analytics tools would receive the raw traffic.
The bot traffic was generated using Puppeteer on Node.js 20, with each scenario configured to simulate a different level of bot sophistication. Every bot session included realistic behavior: 50% of sessions scrolled the page, 30% clicked internal links and navigated to a second page, and all sessions paused for a random 2 to 8 seconds between actions.
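The randomization above can be sketched as a small helper. This is an illustration of the approach, not our actual test harness; the function name `planSession` is ours:

```javascript
// Sketch of how per-session behavior was randomized (helper name is ours).
// Each session gets a 50% chance to scroll, a 30% chance to click an
// internal link, and random 2-8 second pauses between actions.
function planSession(rand = Math.random) {
  return {
    scroll: rand() < 0.5,             // 50% of sessions scroll the page
    clickInternalLink: rand() < 0.3,  // 30% navigate to a second page
    delayMs: () => 2000 + Math.floor(rand() * 6001), // 2000-8000 ms pause
  };
}
```

Inside a Puppeteer session, `plan.scroll` would gate a `page.evaluate(() => window.scrollBy(...))` call, and `plan.delayMs()` would feed the wait between actions.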
We ran the tests over five days, March 3 through March 7, 2026. We waited 72 hours after the final test before checking GA4's processed reports (GA4 documents up to 48 hours of processing time).
Five rounds, 200 bot sessions each, 1,000 total. Each round escalated in sophistication.
Round 1: Known bot User-Agent strings
The easiest test first. We configured Puppeteer to rotate through five non-browser User-Agent strings: PostmanRuntime/7.43.4, python-requests/2.31.0, curl/8.4.0, Go-http-client/2.0, and Wget/1.21.4. These are common HTTP library identifiers that no real browser would ever send.
All 200 sessions ran from a residential IP address. The bots loaded pages, scrolled, and clicked links. The only thing unusual about them was the UA string.
const puppeteer = require('puppeteer');
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
const botAgents = [
'PostmanRuntime/7.43.4',
'python-requests/2.31.0',
'curl/8.4.0',
'Go-http-client/2.0',
'Wget/1.21.4'
];
await page.setUserAgent(botAgents[Math.floor(Math.random() * botAgents.length)]);
await page.goto('https://bot-test-site.vercel.app/');
GA4 result: 200 sessions counted as real visitors.
Every single one appeared in GA4's Real-time report during the test and persisted in the processed Traffic Acquisition report three days later. GA4's "known bot" filter did nothing.
Clickport result: 0 sessions counted.
All 200 were blocked at ingestion by User-Agent pattern matching. Clickport's bot detection runs a compiled regex with 80+ bot signatures against every incoming request. "PostmanRuntime," "python-requests," "curl," "Go-http-client," and "Wget" are all on the list.
This is the simplest possible bot to detect. A non-browser User-Agent string is the equivalent of a burglar wearing a name tag that says "burglar." GA4 missed all of them.
Round 2: Default headless Chrome
For Round 2, we left Puppeteer in its default configuration. No stealth plugins, no UA spoofing. Default headless Chrome has two telltale signs: the User-Agent string contains "HeadlessChrome," and navigator.webdriver is set to true.
We ran 200 sessions from a residential IP. Same behavioral simulation: page loads, scrolling, clicking, random delays.
GA4 result: 200 sessions counted as real visitors.
GA4 does not check for navigator.webdriver. It does not look for "HeadlessChrome" in the User-Agent. Its gtag.js script fires on page load, sends the event to Google's collection endpoint, and that's it.
Clickport result: 0 sessions counted.
Clickport caught these bots at two levels. First, the tracker itself checks navigator.webdriver on initialization. If it's true, the tracker silently exits without sending any data to the server. No events, no requests, nothing. The session never existed as far as the database is concerned. Second, even if the tracker's client-side check were bypassed, the server-side bot detection would catch "HeadlessChrome" in the User-Agent via pattern matching.
This is a defense-in-depth approach. The client-side check prevents bot events from ever being sent. The server-side check catches anything that slips through.
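The client-side guard is simple to sketch. This is our illustration of the pattern, not Clickport's actual source; the function name `shouldTrack` is hypothetical:

```javascript
// Minimal sketch of a client-side webdriver guard (illustrative, not
// Clickport's source). Returns true when tracking should proceed.
function shouldTrack(nav) {
  // Selenium, Playwright, and default Puppeteer set navigator.webdriver = true.
  if (nav && nav.webdriver === true) return false; // silently exit: send nothing
  return true;
}

// In a tracker, this runs once on initialization:
// if (!shouldTrack(navigator)) return; // no events ever leave the browser
```

The key design choice is the silent exit: the tracker sends nothing at all, so the bot session never appears in the database rather than being filtered afterward.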
If you've ever used Puppeteer, Playwright, or Selenium for testing, you know how easy it is to spin up a headless browser. GA4's tracking script treats these automated browsers exactly the same as a real person using Chrome.
Update (March 2026): Chrome's newer --headless=new mode, which is now the default in Puppeteer and Playwright, no longer exposes "HeadlessChrome" in the User-Agent string. It runs the full browser engine and is architecturally identical to headed Chrome. This means even our Round 2 test was generous to GA4. Today's default headless browsers are harder to detect than the ones we used, and GA4 couldn't catch them even when the signals were obvious.
Round 3: Stealth bots from cloud servers
This is where things get interesting. For Round 3, we used puppeteer-extra-plugin-stealth, an open-source plugin that patches 16 known automation fingerprints. It sets navigator.webdriver to false, uses a real Chrome User-Agent string, normalizes plugin arrays, and fixes missing window dimensions in headless mode.
We ran these 200 sessions from three cloud providers: AWS (us-east-1), Google Cloud (europe-west1), and DigitalOcean (fra1). From a fingerprint perspective, these bots were indistinguishable from real Chrome browsers. The only giveaway was that the traffic came from IP addresses assigned to cloud data centers.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
// navigator.webdriver is now false
// UA string is a real Chrome 121 UA
// All 16 stealth patches applied
await page.goto('https://bot-test-site.vercel.app/');
GA4 result: 200 sessions counted as real visitors.
GA4 does not check the source IP against datacenter IP ranges. As far as GA4 is concerned, a visitor from an AWS EC2 instance in Virginia is no different from a person sitting at their desk in Virginia.
Clickport result: 0 sessions counted.
All 200 were blocked by datacenter IP detection. Clickport maintains an IP range database sourced from ipcat that covers AWS, Google Cloud, Azure, DigitalOcean, Hetzner, OVH, and dozens of other cloud providers. On every incoming event, the client IP is converted to a 32-bit integer and checked against a sorted array of datacenter IP ranges using binary search. If the IP falls within any known datacenter range, the event is blocked.
There's a critical nuance here: VPN users on datacenter IPs are not blocked. Clickport cross-references a VPN IP whitelist before the datacenter check. If the IP matches a known VPN provider, the datacenter check is skipped entirely. This prevents false positives from legitimate visitors using NordVPN, ExpressVPN, or other consumer VPN services that route through datacenter infrastructure.
This round is particularly important because the majority of bot traffic originates from cloud infrastructure. Running a botnet from residential IPs is expensive. Running it from AWS spot instances costs pennies. Datacenter IP blocking catches the vast majority of real-world bot attacks.
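The integer-conversion-plus-binary-search lookup described above can be sketched as follows. This is illustrative code, the range in the usage example is made up, and the real check also consults a VPN whitelist first (skipped here for brevity):

```javascript
// Convert a dotted IPv4 address to a 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Binary search over [startInt, endInt] ranges sorted by start address.
// O(log n) per lookup, even with tens of thousands of datacenter ranges.
function inDatacenterRange(ip, ranges) {
  const n = ipToInt(ip);
  let lo = 0, hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end] = ranges[mid];
    if (n < start) hi = mid - 1;
    else if (n > end) lo = mid + 1;
    else return true; // IP falls inside a known datacenter range → block
  }
  return false;
}
```

For example, with a hypothetical range covering `3.0.0.0`–`3.255.255.255`, a request from `3.5.1.2` would be blocked while `8.8.8.8` would pass.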
Round 4: Referrer spam
Referrer spam is one of the oldest tricks in the bot playbook. Bots visit your site with a fake referrer URL, hoping you'll see the domain in your analytics and click on it. The spam domains are usually SEO link farms, malware distribution sites, or phishing pages.
For Round 4, we configured Puppeteer to set referrer headers from known spam domains. We pulled 20 domains from Matomo's referrer spam list, an open-source, community-maintained blocklist with thousands of entries. Domains like best-seo-offer.com, buttons-for-website.com, free-social-buttons.com, and get-free-traffic-now.com.
Each session loaded a page with the spam domain set as the document.referrer. Some used the stealth plugin, others didn't. The referrer was the constant.
GA4 result: 200 sessions counted as real visitors.
All 200 appeared in GA4's Traffic Acquisition report under "Referral" with the spam domains listed as traffic sources. GA4 does have an "unwanted referrals" feature, but it only reclassifies referral traffic to its previous source. It does not remove spam visits from your data. The spam domains' sessions remain in your reports permanently.
Clickport result: 0 sessions counted.
All 200 were blocked by the spam referrer detection layer. Clickport checks every incoming event's referrer against Matomo's spam domain list. The matching is thorough: for a referrer URL like https://tracking.best-seo-offer.com/campaign?id=123, the system extracts the hostname, then walks up the domain hierarchy checking tracking.best-seo-offer.com, then best-seo-offer.com. If any level matches the spam list, the event is blocked.
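The hostname walk can be sketched like this. It is our illustration of the technique, not Clickport's source, and it includes the substring fallback used when the referrer fails to parse as a URL:

```javascript
// Check a referrer against a spam domain set by walking up the hostname,
// e.g. tracking.best-seo-offer.com → best-seo-offer.com (illustrative code).
function isSpamReferrer(referrerUrl, spamSet) {
  let host;
  try {
    host = new URL(referrerUrl).hostname;
  } catch {
    // Fallback: substring matching when URL parsing fails.
    return [...spamSet].some(domain => referrerUrl.includes(domain));
  }
  const labels = host.split('.');
  // Stop before the bare TLD ("com") so it never matches on its own.
  for (let i = 0; i < labels.length - 1; i++) {
    if (spamSet.has(labels.slice(i).join('.'))) return true;
  }
  return false;
}
```

With `best-seo-offer.com` in the spam set, the check blocks `https://tracking.best-seo-offer.com/campaign?id=123` because the second step of the walk matches the bare domain.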
If you've ever opened GA4's Traffic Acquisition report and seen traffic from domains you've never heard of, you've seen this problem firsthand. GA4 has no mechanism to filter these, and no way to retroactively remove them from your data.
Round 5: Stealth bots from residential IPs
This was the hardest test. We wanted to see what happens when a bot does everything right.
We used puppeteer-extra-plugin-stealth with all 16 evasion modules enabled: navigator.webdriver patched to false, real Chrome 121 User-Agent strings, realistic viewport dimensions, and proper language headers. Then we routed all traffic through a residential proxy service, so the source IPs were real consumer ISP addresses, not datacenter ranges.
These bots were, for all practical purposes, indistinguishable from a real person using Chrome. The UA was real. The IP was residential. The webdriver flag was patched. The viewport was non-zero. The referrer was clean. There were no spam domains involved.
GA4 result: 200 sessions counted as real visitors.
Same as every other round.
Clickport result: 200 sessions counted as real visitors.
Both tools failed. And we think that's important to say openly.
Here's the honest truth: no client-side analytics tool can reliably detect a well-configured stealth bot running from a residential IP. The detection signals simply aren't there. The User-Agent is legitimate. The IP is residential. The webdriver flag is patched. The viewport is real. The referrer is clean. From the perspective of a JavaScript tracking script receiving an HTTP request, this bot and a real person are identical.
The only defenses against this class of bot are at a different layer entirely: infrastructure-level solutions like Cloudflare Bot Management, server-side behavioral analysis over multiple sessions, or CAPTCHAs. These are outside the scope of what an analytics tool should be doing.
We think any analytics vendor who claims 100% bot detection is either not testing hard enough or not being honest about their results. We'd rather tell you what we can't catch than let you assume we catch everything.
Final scorecard: GA4 0% vs Clickport 80%
Here are the complete results across all five rounds.
GA4's "known bot" filter caught zero of our 1,000 test sessions. Not one. Across five different bot configurations, from the most obvious to the most sophisticated, GA4 treated every automated session as a real human visitor.
Clickport blocked 800 of 1,000 (80%). The 200 that got through were the stealth bots routed through residential proxies, which represent the most expensive and most sophisticated class of bot attack. In real-world traffic, the overwhelming majority of bots use datacenter infrastructure because it's cheap. Residential proxy botnets exist, but they're a fraction of total bot traffic.
Why GA4's bot filtering fails
GA4's bot detection relies on the IAB/ABC International Spiders & Bots List, a curated database of known bot User-Agent strings and IP addresses. Google's support page describes it: "Known bot and spider traffic is identified using a combination of Google research and the International Spiders and Bots List."
The IAB list is designed to catch self-identifying bots. Googlebot, Bingbot, Yandex, Baidu. Bots that announce themselves because they want to be recognized. This was a reasonable approach in 2015. It's not anymore.
The problem is architectural. GA4's tracking works like this:
- A visitor loads your page
- The gtag.js script executes in the browser
- An event payload is sent to Google's collection endpoint
- Google's servers process the event and apply the IAB filter
- If the UA matches the IAB list, the event is excluded
There is no IP-based filtering. No datacenter blocking. No webdriver detection. No referrer spam check. No viewport validation. Just a static list of known bot strings, updated monthly, that only catches bots polite enough to identify themselves.
In GA4, IAB-based bot filtering is always on. You cannot disable it, customize it, or see how much traffic it excluded. There are no property-level filters to exclude traffic by User-Agent, referrer domain, or other bot-specific signals. GA4 does offer IP-based internal traffic filters, but those are designed for excluding your own team's visits, not for blocking bots. If GA4's IAB filter misses a bot, you have no recourse except building workaround segments in Explorations. Those segments don't apply to standard reports.
The difference is fundamental. GA4 uses a post-collection, signature-based filter. Clickport uses pre-ingestion, multi-layer detection. GA4 records everything and hopes its list catches the bots. Clickport evaluates every event against six detection layers before it reaches the database.
How Clickport's 6-layer bot detection works
Since we're being transparent about our results, here's exactly how each detection layer works. No "proprietary algorithms" or vague "machine learning." Just six concrete checks, run in priority order, with the first match triggering a block.
Layer 1: Webdriver signal. The tracker checks navigator.webdriver in the browser. If it's true (set by Selenium, Playwright, and default Puppeteer), the tracker silently exits without sending any data. No events reach the server. As a backup, the server also checks for webdriver signals in event payloads in case a different client implementation sends them.
Layer 2: Empty User-Agent. Every real browser sends a User-Agent header. If the header is missing or empty, the request is blocked. This catches raw HTTP clients, misconfigured scrapers, and some headless environments.
Layer 3: User-Agent pattern matching. A compiled regex with 80+ patterns covering search engine bots (Googlebot, Bingbot, Baidu), AI crawlers (GPTBot, ClaudeBot, Bytespider, PerplexityBot), SEO tools (Ahrefs, Semrush, Screaming Frog), social media bots (FacebookBot, Twitterbot, LinkedInBot), monitoring services (UptimeRobot, Pingdom, GTmetrix), HTTP libraries (curl, wget, python-requests, axios), headless browsers (HeadlessChrome, PhantomJS), and vulnerability scanners (Nmap, Nikto, WPScan).
Layer 4: Datacenter IP blocking. The client IP is checked against a sorted array of known datacenter IP ranges sourced from ipcat. The lookup uses binary search for O(log n) performance. Before the datacenter check, the IP is tested against a VPN whitelist. VPN matches skip the datacenter check, preventing false positives from legitimate users on VPN services that share datacenter IP space.
Layer 5: Spam referrer blocking. The event's referrer is checked against Matomo's referrer spam list. The system extracts the hostname and walks up the domain hierarchy to catch subdomains. If URL parsing fails, it falls back to substring matching.
Layer 6: Zero viewport. If the event's screen_width is 0, the request is blocked. Real browsers always report a non-zero screen width. Pageleave events are exempt from this check since some browsers legitimately report zero dimensions during page unload.
All three blocklists (datacenter IPs, VPN whitelist, spam referrers) are cached locally and refreshed from their GitHub sources every 7 days. If a refresh fails, the system falls back to the most recent cached version.
Every blocked event is recorded with its detection method and the specific detail that triggered it (the matched UA pattern, the datacenter provider name, the spam domain). This data is surfaced in Clickport's Bot Management panel, so you can see exactly how many bots are hitting your site, which detection layer caught them, and which specific bots or providers are involved. We also break out AI crawler traffic separately: GPTBot, ClaudeBot, PerplexityBot, Bytespider, and others, so you can monitor how often AI services are scraping your content.
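The priority ordering of the six layers amounts to a first-match dispatcher. The sketch below is ours, with the individual checks reduced to pre-computed flags and a tiny stand-in for the 80+ pattern regex:

```javascript
// Tiny stand-in for the real 80+ signature regex (illustrative only).
const BOT_UA = /bot|crawler|spider|curl|wget|python-requests|HeadlessChrome/i;

// Runs the six layers in priority order; returns the name of the layer
// that blocked the event, or null if it passes all six.
function classifyEvent(evt) {
  const checks = [
    ['webdriver',     e => e.webdriver === true],
    ['empty_ua',      e => !e.userAgent],
    ['ua_pattern',    e => BOT_UA.test(e.userAgent)],
    ['datacenter_ip', e => e.isDatacenterIp && !e.isKnownVpn], // VPN whitelist skips the block
    ['spam_referrer', e => e.isSpamReferrer === true],
    ['zero_viewport', e => e.screenWidth === 0 && e.type !== 'pageleave'],
  ];
  for (const [name, hit] of checks) {
    if (hit(evt)) return name; // first match wins; event is blocked and logged
  }
  return null; // passes every layer → stored as real traffic
}
```

The first-match design means each blocked event is attributed to exactly one layer, which is what makes the per-layer breakdown in the Bot Management panel possible.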
What bot traffic does to your business decisions
Bad analytics data doesn't just look wrong. It causes wrong decisions.
Inflated traffic masks real problems. If bots are adding 20-30% phantom visitors to your traffic numbers, you might think your latest blog post performed well when it didn't. You might think your SEO is improving when it's flat. A Google Analytics Community thread with 300+ reports describes exactly this: site owners seeing sudden traffic spikes from Lanzhou, China and Singapore that turned out to be entirely bot traffic, with GA4 counting every session as real.
Wasted ad spend. If bot traffic shows up as "Direct" in your acquisition reports (which it usually does), it dilutes your channel metrics. Your actual cost-per-acquisition from paid campaigns looks better than it is because bots are inflating the denominator. SpiderAF's 2025 report found $37.7 billion in global ad fraud losses in 2024.
Broken A/B tests. Bots don't convert. If 15% of your test traffic is bots, they dilute conversion rates in both variants, making it harder to detect real differences. You need a larger sample size and longer test duration to reach statistical significance, and even then your results are noisy.
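The dilution is easy to see with hypothetical numbers (ours, not from the test): say variant A truly converts at 5% and variant B at 5.5%, with bots making up 15% of measured traffic.

```javascript
// Bots add sessions but zero conversions, shrinking the measured rate.
function observedRate(trueRate, botShare) {
  return trueRate * (1 - botShare);
}

const a = observedRate(0.05, 0.15);  // variant A: 5% true → 4.25% measured
const b = observedRate(0.055, 0.15); // variant B: 5.5% true → ~4.68% measured
// The true 0.5-point gap shrinks to ~0.43 points, so reaching the same
// statistical power requires a larger sample and a longer test.
```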
False engagement data. Bots that scroll and click (like our Round 2-5 tests) generate engagement events. Your average engagement time goes up or down depending on how the bot behaves. Your scroll depth metrics become unreliable. Your event counts are inflated.
You can't fix GA4 after the fact. This is the most underrated problem. GA4 does have a data deletion feature, but it only strips parameter text from events. The events themselves, and their counts, remain in your reports permanently. There is no "remove these bot sessions" button. There are no property-level bot filters. The only workaround is creating segments in Explorations that exclude suspicious regions or low-engagement traffic. But those segments only apply to Explorations, not to your standard reports, not to your dashboards, and not to any downstream tools reading from the GA4 API.
This isn't theoretical. The China/Singapore bot flood in late 2025 affected thousands of GA4 properties. Some sites saw their traffic double overnight. One Portuguese site reported a 15,000% traffic increase in three days. GA4 counted it all, and site owners had no way to remove it from their historical data.
Google's Analytics team acknowledged the issue and said a "long-term spam detection fix" was in development. As of March 2026, it hasn't shipped.
What we learned
Three takeaways from this test.
First, GA4's bot filtering is not a real defense. It catches search engine crawlers that announce themselves with known User-Agent strings. It does nothing against the bots that actually pollute your data: headless browsers, stealth scrapers, referrer spam, and traffic from cloud infrastructure. If 37% of web traffic is bad bots and GA4 only catches the ones on the IAB list, the gap between your reported numbers and reality could be enormous.
Second, pre-ingestion blocking matters. Clickport evaluates every event against six detection layers before it enters the database. If a bot is detected, the event is blocked and logged in a separate stats table. Your analytics data only contains verified, non-bot traffic. There's no "maybe we'll filter it later" step. Compare this to GA4, where bot traffic enters the database first and the filter is applied afterward. If the filter misses something, your data is permanently contaminated.
Third, no analytics tool catches everything. Our Round 5 results prove it. A sufficiently sophisticated bot with residential IPs, patched fingerprints, and realistic behavior will fool any client-side analytics tool. The right response isn't to pretend otherwise. It's to catch the 80% you can catch, be transparent about the 20% you can't, and give users the tools to manually flag suspicious sessions when they spot something that automated detection missed.
If you're tired of looking at GA4 numbers and wondering how many of those visitors are real, try Clickport free for 30 days. Install the tracker, keep GA4 running alongside it, and compare the numbers yourself. The difference in your data will be visible within the first day.
