Last updated: April 23, 2026 · 25 min read

Is My Website Traffic Real or Bots? (2026 Data)

How much of my traffic is bots?
Why GA4 says your traffic is clean when it isn't
The three bot categories GA4 can't see
Signs your traffic isn't real, that you can check right now
The 2024-2026 bot case files
AI crawlers, the category that didn't exist three years ago
Ad platform bot traffic, the IVT blind spot
How to diagnose bot contamination without adding any new tool
What actually works: multi-layer detection
What Clickport catches that GA4 doesn't
Frequently asked questions
One line to take away

In April 2026, we audited bot activity across our customer sites for 30 days. The median site saw 20% of incoming traffic flagged as bots. The range ran from 2.5% to 82%. And 57% of those bots would have passed GA4's default "Exclude known bots" filter, because catching them required signals GA4 doesn't check. Put another way: if you run GA4, more than half the bots on your site are likely invisible to it.

Key Takeaways

Across Clickport customer sites over 30 days, the median site saw 20% of incoming traffic flagged as bots. The range ran from 2.5% to 82%. Bot load varies wildly by site type, content, and whether you're being actively targeted.
GA4's 'Exclude known bots and spiders' filter is always on and uses the IAB/ABC International Spiders and Bots List. Access to the list costs between $5,000 and $15,000 per year. It matches user-agent strings. It doesn't check browser fingerprints, datacenter IP reputation, or behavioral signals.
When we measured our own bot detections, 57% of bots we caught relied on non-UA signals: browser GPU fingerprinting, datacenter IP reputation, and behavioral velocity. Those bots pass through GA4's filter undetected.
AI training crawlers including GPTBot, ClaudeBot, and Meta-ExternalAgent are not on the IAB bot list, so GA4 has nothing to exclude them with: any of their visits that run the GA4 tag count as real visitors. GPTBot's requests to sites grew 305% year-over-year per Cloudflare Radar.
Ad platforms refund or credit invalid clicks they detect but don't notify your analytics tool. A bot that clicked your Google Ad still lands in GA4 as a real session. Lunio's 2026 data shows IVT rates by platform: TikTok 24.2%, LinkedIn 19.9%, Google 7.6%.

How much of my traffic is bots?

The honest answer: it depends on your site, but it's almost certainly more than your analytics says. Across Clickport customer sites in the 30-day window, the median site had 20% bot share. One clean B2B SaaS site had 2.5%. One site under an active scraping campaign had 82%. That range is the point.

Industry benchmarks tell the same story at scale. Imperva's 2025 Bad Bot Report, covering 13 trillion blocked requests across 2024 traffic, reported 51% of all web traffic as automated. 37% was classified as bad bots. That was the first time in a decade automation exceeded human traffic on Imperva's network. Akamai's June 2024 State of the Internet report put bots at 42% of web traffic, two-thirds of that malicious. Fastly's Q1 2025 Threat Insights had 37% bots, with 89% of that unwanted.

Meaning: at the infrastructure layer, roughly 40% to 50% of requests aren't human. At the site level, the range is wider. A quiet SaaS portal and a scraped ecommerce catalog sit on opposite ends of the same distribution. The bot traffic problem isn't a single number you can quote. It's a range, and your number depends on what you publish, who's scraping it, and what your analytics actually sees.

BOT SHARE ACROSS CLICKPORT CUSTOMER SITES, 30 DAYS

20%

median site

2.5%

cleanest site

82%

most-targeted site

Clickport internal ClickHouse study, April 2026. Sites with >100 human events in the window. Site-level percentages, not request-weighted aggregate.

That spread is worth sitting with. If your bot share is 2%, your GA4 numbers are close to accurate. If it's 20%, up to a fifth of what your dashboard counts may not be human. If it's 80%, you're not measuring your audience, you're measuring a scraper's schedule.

Why GA4 says your traffic is clean when it isn't

GA4's "Exclude known bots and spiders" filter is always on. You can't turn it off. That sounds reassuring until you read how it works. Per Google's documentation, GA4 filters using "a combination of Google research and the International Spiders and Bots List, maintained by the Interactive Advertising Bureau."

Two things about that list most people don't know.

First, it's not free. The IAB/ABC International Spiders and Bots List costs $5,000 per year for IAB members, $7,500 for associate members, and $15,000 per year for non-members. It's distributed by FTP after subscription. Unless you pay, you can't browse what's on it. Either way, you can't see what GA4 caught. You just get told, by Google, that it's working.

Second, the list matches on user-agent strings. A bot that sends Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0 Safari/537.36 passes through. Not because it's human. Because the list is looking for strings like Googlebot, Applebot, YandexBot. If a scraper declares itself as Chrome, the list has nothing to match.

When we looked at 30 days of our own detections, 43% of bots we caught were caught by UA patterns. The other 57% needed signals GA4's filter can't see: browser GPU fingerprinting, datacenter IP reputation, behavioral velocity. Plausible Analytics ran a controlled test in 2024 and found GA4 passed bot traffic in all three test scenarios. The filter is on. It's just looking in the wrong place.

WHAT GA4'S DEFAULT BOT FILTER CAN AND CAN'T SEE

CATCHES (UA MATCH)

Googlebot, Bingbot, Applebot (declared), AhrefsBot, SemrushBot, and other bots in the IAB list. About 43% of our total detections.

MISSES (NON-UA SIGNALS)

Headless Chrome declaring as Chrome, bots on residential proxies, behavioral bot farms. About 57% of our total detections. GA4 also misses most AI crawlers, which declare themselves but aren't on the IAB list.

Sources: Google Analytics help, IAB/ABC Spiders and Bots List, Clickport internal study (April 2026).

GA4 also doesn't tell you what it filtered. Google's own documentation confirms: "you cannot see how much known bot traffic was excluded." The data is dropped before processing. No bot report, no BigQuery export, no audit trail. You're trusting a filter you can't inspect against a list you can't read.

We ran the same kind of test against our own tracker and reached the same conclusion from a different angle: catching anything beyond declared bots requires signals other than the user-agent.

The three bot categories GA4 can't see

The bots that bypass GA4's filter fall into three groups. Each one breaks a different assumption in the IAB-list approach.

Headless browsers running real JavaScript. Puppeteer, Playwright, Selenium, and stealth-patched Chromium all run a real rendering engine. Your GA4 tag fires, the session gets recorded, and the UA string says "Chrome." What gives them away is what the rendering engine reports. Headless Chrome without hardware acceleration exposes renderer strings like "Software GPU Renderer" and ANGLE SwiftShader through WebGL's getParameter() call (typically the UNMASKED_RENDERER_WEBGL value from the WEBGL_debug_renderer_info extension). Real Chrome on ordinary hardware usually reports the actual GPU. It's a few lines of JavaScript, and GA4 doesn't run it. In our data, the GPU renderer fingerprint (Software GPU Renderer, ANGLE SwiftShader, and similar software-rendering strings) flagged 1.54 million bot sessions over 30 days. 36% of our total bot detections came from this single signal.
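A minimal sketch of the server-side half of that check, assuming your tracker already reads the WebGL renderer string in the browser and sends it along with each event. The pattern list is illustrative, built from the strings named above plus llvmpipe (Mesa's software rasterizer); it is not Clickport's actual rule set. Real users on virtual machines or machines without GPU drivers can also report software renderers, so a match is one signal, not a verdict.

```python
import re
from typing import Optional

# Illustrative substrings that suggest software (CPU) rendering instead of a
# physical GPU. A hypothetical list for this sketch, not a production rule set.
SOFTWARE_RENDERER_PATTERNS = [
    "swiftshader",             # Google's CPU renderer, common in headless Chrome
    "software gpu renderer",
    "llvmpipe",                # Mesa's software rasterizer on Linux
    "software rasterizer",
]
_SOFTWARE_RE = re.compile(
    "|".join(re.escape(p) for p in SOFTWARE_RENDERER_PATTERNS), re.IGNORECASE
)


def is_software_renderer(renderer: Optional[str]) -> bool:
    """Return True if a WebGL renderer string points to software rendering.

    A missing renderer (WebGL unavailable or blocked) returns False; how to
    treat that case is a separate policy decision.
    """
    if not renderer:
        return False
    return _SOFTWARE_RE.search(renderer) is not None


if __name__ == "__main__":
    for sample in [
        "ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero) (0x0000C0DE)), SwiftShader driver)",
        "ANGLE (NVIDIA, NVIDIA GeForce RTX 3060 Direct3D11 vs_5_0 ps_5_0, D3D11)",
        "llvmpipe (LLVM 15.0.7, 256 bits)",
    ]:
        print(is_software_renderer(sample), sample)
```

Run as a script, it prints True, False, True for the three sample strings.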

Residential proxy bots. Services like Bright Data, IPRoyal, and Smartproxy sell access to pools of real consumer IPs, often without the host's consent. A bot running through a residential proxy has an IP address identical to a real human's home connection. IP blocklists and datacenter-range filters can't distinguish them. Imperva's 2025 report found 21% of bot attacks using ISPs ran through residential proxies. This is how the Perplexity vs Amazon story unfolded in late 2025: Amazon blocked Perplexity's declared IPs, and Perplexity's Comet agent switched ASNs and spoofed a Chrome-on-macOS user-agent within 24 hours.

AI training crawlers. GPTBot, ClaudeBot, Meta-ExternalAgent, PerplexityBot. None of them were on the IAB list when it was last synced, so any of their visits that execute the GA4 tag are counted as real visitors. We'll cover the scale of this category in its own section, but the failure is the same as with the other two: the filter has nothing to match them against.

THE THREE BOT CATEGORIES GA4'S FILTER MISSES

HEADLESS BROWSERS

Puppeteer, Playwright, stealth Chromium. Runs real JS. UA says "Chrome."

Gives itself away via GPU renderer string (SwiftShader, llvmpipe). 36% of our detections.

RESIDENTIAL PROXY BOTS

Real consumer IPs. Defeats datacenter blocklists. Often millions of IPs rotated.

Detection requires behavioral signals: velocity, session cadence, interaction patterns.

AI TRAINING CRAWLERS

GPTBot, ClaudeBot, Meta-ExternalAgent. Declared UAs. Not on the IAB list.

GPTBot +305% YoY per Cloudflare. GA4's filter has no entry to exclude any of them.

Clickport internal study (April 2026) and Cloudflare Radar 2025 Year in Review.

Our bot-detection framework covers the technical stack behind each of these layers. The key point for this article is simpler: a filter looking at user-agent strings will miss declared bots that aren't on its list, and bots that lie well enough to pass for humans.

Signs your traffic isn't real, that you can check right now

You don't need a new tool to check. GA4 won't show you bots directly, but it will show you the patterns bots leave behind if you know where to look. Here's a diagnostic checklist you can run in the next 15 minutes.

Your Direct channel is growing with no brand activity. Direct traffic should roughly track brand search volume. If your Direct bucket climbed 30% in the last quarter and your branded queries in Search Console are flat, something else is arriving as "no referrer." Some of it is dark social (Slack, WhatsApp, iMessage). Some of it is bots that stripped the referrer. We wrote about this specific pattern in the context of sudden direct-traffic spikes.

Bounce rate is near 100% on a specific source. Humans from a legitimate source don't bounce 100% of the time. If Facebook Ads is sending you traffic that bounces at 98%, the likelier explanation isn't a broken landing page. It's that the clicks aren't coming from humans.

Session duration is under 1 second for a cluster of sessions. Real humans take longer than 1 second to decide they don't like a page. A bot can load, execute, and leave in under 500 milliseconds. Filter your Explorations to sessions under 1 second and see what source they came from.

Traffic from countries you don't serve. A B2B SaaS selling to North America that gets 40% of its traffic from Vietnam, Singapore, and Brazil isn't serving those markets by accident. That traffic is most likely bots, or automated testing running from data centers.

Same device, same screen size, same browser version, all at once. Humans have messy fingerprints. Bot farms don't. If 8,000 sessions in one hour all report 1920x1080, Chrome 142, Windows 10, in the same country, that's almost certainly one automated operation running thousands of workers, not 8,000 people.

Impossible engagement ratios. Scroll depth and time on page should correlate. A session with 1 second duration and 95% scroll depth is lying about one of those two numbers. GA4 Explorations let you build this filter manually.

SIX BOT PATTERNS VISIBLE IN GA4 WITHOUT A NEW TOOL

Direct channel growing with no brand-search growth

Bounce rate near 100% on a specific campaign or source

Session duration under 1 second for a cluster of visits

Traffic from countries you don't serve or market in

Identical device, screen, browser combos at scale

Engagement ratios that don't physically make sense

None of these is definitive on its own. Two or three of them showing up together on the same slice of traffic is a strong signal something isn't human.
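If you can export sessions (for example through GA4's BigQuery export), the "two or three together" rule can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical export schema with the fields shown; the thresholds mirror the checklist (sub-second sessions, deep scroll with no time to read, identical device combos at volume) but are illustrative defaults, not tuned values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Session:
    # Hypothetical export schema for this sketch.
    source: str
    duration_s: float
    scroll_pct: float  # 0-100
    screen: str        # e.g. "1920x1080"
    browser: str       # e.g. "Chrome 142"
    os: str
    country: str
    hour: str          # e.g. "2026-04-01T13"


def _fingerprint(s: Session) -> Tuple[str, str, str, str, str]:
    return (s.screen, s.browser, s.os, s.country, s.hour)


def bot_signals(sessions: List[Session], cluster_threshold: int = 1000,
                max_duration_s: float = 1.0,
                min_scroll_pct: float = 90.0) -> List[Tuple[Session, List[str]]]:
    """Attach the names of the signals that fire to each session."""
    cluster_sizes = Counter(_fingerprint(s) for s in sessions)
    results = []
    for s in sessions:
        signals = []
        if s.duration_s < max_duration_s:
            signals.append("sub_second")
        if s.duration_s < max_duration_s and s.scroll_pct >= min_scroll_pct:
            signals.append("impossible_scroll")  # deep scroll with no time to read
        if cluster_sizes[_fingerprint(s)] >= cluster_threshold:
            signals.append("fingerprint_cluster")  # identical combo at volume
        results.append((s, signals))
    return results


def suspect_share_by_source(sessions: List[Session], min_signals: int = 2,
                            **thresholds) -> Dict[str, float]:
    """Share of each source's sessions where at least min_signals signals fire."""
    totals: Counter = Counter()
    suspects: Counter = Counter()
    for s, signals in bot_signals(sessions, **thresholds):
        totals[s.source] += 1
        if len(signals) >= min_signals:
            suspects[s.source] += 1
    return {src: suspects[src] / totals[src] for src in totals}
```

Read the output per source against your site-wide baseline rather than as a bot rate: a source whose suspect share sits far above the others is the one to investigate.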

The 2024-2026 bot case files

Abstract "bot traffic exists" doesn't land until you see specific cases. Here are four documented campaigns from the last two years, ranging from accidental AI-crawler misbehavior to a federal lawsuit.

Read the Docs, May 2024. An AI training crawler with a bug repeatedly downloaded the same zipped HTML snapshots instead of caching them. 30 MB per second, continuously, for a month. Total: 73 TB, nearly 10 TB in a single day. The bill to Read the Docs: over $5,000 in CDN bandwidth. The cause was an old redirect to a dynamic URL that couldn't be cached. The crawler never noticed.

Nerd Crawler, May 2024. A niche comic-art marketplace discovered that ByteDance's Bytespider was generating 300,000 image requests per day, ignoring robots.txt, and rotating IPs through China, Singapore, and AWS. Bandwidth cost was the site's largest operating expense. Blocking Bytespider cut bandwidth by roughly 60% and, per the site owner's own estimate, shaved 20-30% off total operating cost.

Perplexity vs Amazon, August 2025 to March 2026. Amazon blocked Perplexity's declared PerplexityBot IP ranges on August 19, 2025. Within 24 hours, Perplexity's Comet agent switched ASNs and spoofed a Chrome-on-macOS user-agent to evade the block. Amazon ran a forensic analysis across browser fingerprint patterns to isolate the bot. The case ended in a federal court order in March 2026.

Review platform scraping (reported by DataDome), March 2026. A coordinated scraping operation used 855,000 unique IP addresses over 13 days to scrape a business review platform. Peak: 1.35 million blocked requests every two hours. Total: 80 million blocked requests. Goal: bulk harvesting of proprietary business listing data.

The through-line: modern bot campaigns don't look like the "bot" most people picture. They run real browser stacks, rotate real-looking IPs, and spoof UA strings when blocked. A UA-list filter catches none of this.

AI crawlers, the category that didn't exist three years ago

Before 2022, "AI crawler" wasn't a category in the bot world. In 2026, it's 4.2% of global HTML requests per Cloudflare Radar, and none of the major AI crawlers are on the IAB list GA4 uses. Whenever one of them executes your analytics tag, the visit is counted as human.

Cloudflare's tracking between May 2024 and May 2025 shows the scale:

GPTBot (OpenAI): 305% year-over-year growth in raw requests. Share of AI crawler traffic: 5% to 30%. Now the most-blocked AI crawler in robots.txt on 312 of 3,816 sampled domains.
ClaudeBot (Anthropic): 21% of AI crawler traffic. Requests far more pages than it sends visitors back through referrals, per Cloudflare's AI crawler analysis. Anthropic does not publish IP ranges for verification.
PerplexityBot: 0.2% of crawler traffic but raw requests grew 157,490% in 12 months. Cloudflare has documented Perplexity ignoring robots.txt directives.
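Applebot-Extended: Introduced June 2024 for Apple Intelligence training. It's a robots.txt token, not a separate crawler: Applebot does the fetching, and disallowing Applebot-Extended opts your content out of model training without removing it from Apple's search features. Most publishers don't know the distinction.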
Meta-ExternalAgent: Near-zero in May 2024, 19% share by May 2025. Fastly reports Meta accounts for 52% of AI crawler traffic to high-authority domains.
Bytespider (ByteDance): Collapsed from 42% to 7% share of AI crawler traffic over 12 months, likely because it's now one of the most widely blocked AI crawlers.

AI CRAWLER GROWTH AND SHARE OF AI CRAWLER TRAFFIC (MAY 2024 → MAY 2025)

GPTBot (OpenAI)

+305% raw requests

Meta-ExternalAgent

near-zero → 19% share

PerplexityBot

+157,490% raw requests

Bytespider (ByteDance)

42% → 7% share

Source: Cloudflare, "From Googlebot to GPTBot" (June 2025). Bars are illustrative relative growth, not precise share.

Fastly summed it up in a 2025 analysis with a section headed "Robots.txt: A Suggestion, Not a Shield." Most AI crawlers respect it. PerplexityBot has been caught ignoring it. ClaudeBot respects it, but Anthropic doesn't publish IP ranges, so there's no published list to check a claimed ClaudeBot request against, and a spoofed one is hard to spot. All of them bypass GA4's IAB-list filter because none of them are on the IAB list.

The practical consequence: if you publish content, your pageview count can be inflated by AI training crawlers that render your pages to collect them as training data, and GA4 tells you nothing about how much.
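GA4 won't break these visits out, so server logs are the place to count them. A minimal sketch, assuming a combined-format access log where the user-agent is the last double-quoted field; the token list comes from the crawlers named in this section and is deliberately incomplete. Matching a declared token only counts crawlers that are honest about who they are, and a UA string alone doesn't prove the request came from that operator.

```python
import re
from collections import Counter
from typing import Iterable

# Declared user-agent tokens for crawlers discussed above (incomplete list).
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Meta-ExternalAgent", "Bytespider"]

# In the combined log format, the user-agent is the final double-quoted field.
_UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawlers(log_lines: Iterable[str]) -> Counter:
    """Count log lines per declared AI crawler token (first match wins)."""
    counts: Counter = Counter()
    for line in log_lines:
        match = _UA_RE.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in ua:
                counts[token] += 1
                break
    return counts
```

Usage: `count_ai_crawlers(open("access.log"))`, with the path adjusted to wherever your server writes its logs.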

Ad platform bot traffic, the IVT blind spot

Every major ad platform filters some portion of invalid traffic out of your billing. None of them notify your analytics. A bot that clicked your Facebook Ad still lands in GA4 as a real session, inflates your visitor count, and drags down your conversion rate.

The industry benchmarks for invalid traffic on ad networks are specific enough to quote. Pixalate's Q3 2024 Global IVT Benchmarks, covering 100 billion programmatic impressions, reported:

Web programmatic: 14% IVT globally, 17% in the US and Canada
Mobile apps: 23%, up 30% year-over-year
CTV: 23%, up 44% year-over-year
Safari desktop: 30% IVT vs. 13% for Chrome

Lunio's 2026 IVT report, based on 2.7 billion clicks across unprotected monitor-only campaigns, breaks it down by ad platform:

INVALID TRAFFIC RATE BY AD PLATFORM (LUNIO 2026)

TikTok

24.2%

LinkedIn

19.9%

X / Twitter

12.8%

Bing

10.3%

Google

7.6%

Source: Lunio 2026 IVT report.

How to diagnose bot contamination without adding any new tool

If you're not ready to switch analytics, here's how to get a rough estimate of your bot share from what you already have. This isn't precise. It's directionally correct.

Step 1: Check your Direct share over 12 months. Go to Acquisition → Traffic acquisition in GA4. Set comparison to previous 12 months. If your Direct share grew more than 10 percentage points with no brand-marketing push, that growth is suspect. Some is dark social. Some is bots that stripped the referrer.

Step 2: Check mobile vs desktop split. Real human traffic is 55-65% mobile in most markets (per Statcounter's monthly GlobalStats). If your mobile share is under 35%, either you sell to a desktop-heavy B2B audience (possible) or your desktop share is inflated by datacenter-originating bots. Your audience tells you which: if it isn't desktop-heavy, suspect bots.

Step 3: Filter sessions to duration under 2 seconds. Build a custom Exploration with session duration under 2 seconds. Which sources do those sessions come from? How does each source's share compare with its share of sessions over 30 seconds? A large skew toward one source suggests bot traffic concentrated there.

Step 4: Cross-reference Direct growth with Search Console branded impressions. In Search Console, filter to branded queries over 12 months. If branded search is flat and Direct is growing, the growth isn't real interest. It's something else.
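Steps 1, 2, and 4 reduce to simple arithmetic once you've copied the numbers out of GA4 and Search Console. A sketch under stated assumptions: the inputs are figures you transcribe by hand into a hypothetical `Snapshot` structure, the 10-point and 35% thresholds come from the steps above, and the ±5% band for "flat" branded impressions is an arbitrary choice for illustration. It returns warning flags, not a bot-share number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    # Figures copied by hand from GA4 and Search Console (hypothetical structure).
    direct_sessions: int
    total_sessions: int
    mobile_sessions: int
    branded_impressions: int


def diagnose(prev: Snapshot, curr: Snapshot, desktop_heavy_audience: bool = False,
             flat_band: float = 0.05) -> List[str]:
    """Return warning flags for Steps 1, 2 and 4. Thresholds are illustrative."""
    flags = []

    # Step 1: Direct share growth, in percentage points.
    prev_direct = prev.direct_sessions / prev.total_sessions
    curr_direct = curr.direct_sessions / curr.total_sessions
    direct_pp = (curr_direct - prev_direct) * 100
    if direct_pp > 10:
        flags.append(f"Direct share up {direct_pp:.1f} pp")

    # Step 2: mobile share far below the typical 55-65% human range.
    mobile_share = curr.mobile_sessions / curr.total_sessions
    if mobile_share < 0.35 and not desktop_heavy_audience:
        flags.append(f"Mobile share only {mobile_share:.0%}")

    # Step 4: Direct growing while branded impressions stay flat.
    branded_change = curr.branded_impressions / prev.branded_impressions - 1
    if direct_pp > 0 and abs(branded_change) <= flat_band:
        flags.append(f"Direct up while branded impressions moved {branded_change:+.0%}")

    return flags
```

For example, Direct going from 20% to 35% of sessions, mobile at 30%, and branded impressions up 2% trips all three flags.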

Use the calculator below for a rough estimate based on these signals.

BOT SHARE ESTIMATOR

Three inputs, one illustrative estimate, based on our ClickHouse data and public benchmarks.

Based on Clickport internal data (median 20% across all sites), Pixalate 2024 IVT benchmarks, and Statcounter mobile share. Directional only.

Treat the output as a thinking tool, not a verdict. The actual share on your site depends on your content, your SEO visibility, whether you're being actively scraped, and your audience composition.

What actually works: multi-layer detection

No single detection layer catches every bot. The approach that works is stacking layers so each one catches what the others miss.

UA matching. Catches declared bots (Googlebot, Applebot, YandexBot). Matches substring patterns against the IAB list or an open-source alternative like isbot. Fails against any bot that spoofs a Chrome or Firefox UA. This is what GA4 does, and it's table stakes.
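A minimal sketch of the UA-matching layer, assuming a small hand-maintained pattern list. This is neither the licensed IAB list nor the API of the isbot package (a JavaScript library); it only illustrates substring matching and its blind spot. Note the "headlesschrome" entry: default headless Chrome announces itself, but the spoofed Chrome string quoted earlier sails straight through.

```python
import re

# Illustrative declared-bot substrings. Real lists are far longer.
BOT_UA_PATTERNS = [
    "googlebot", "bingbot", "applebot", "yandexbot", "ahrefsbot",
    "semrushbot", "gptbot", "claudebot", "headlesschrome",
]
_BOT_RE = re.compile("|".join(map(re.escape, BOT_UA_PATTERNS)), re.IGNORECASE)


def is_declared_bot(user_agent: str) -> bool:
    """True if the UA contains a known bot token. Spoofed browser UAs pass."""
    return _BOT_RE.search(user_agent or "") is not None
```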

IP and ASN reputation. Catches bots running from cloud providers (AWS, GCP, Tencent Cloud, ColoCrossing). Maintains lists of known-bad IPs and entire datacenter CIDR ranges. Fails against residential proxy networks that route bot traffic through real consumer ISPs. OWASP explicitly notes IP reputation "should not be used as the sole or primary defense."
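The IP layer boils down to checking an address against a set of CIDR ranges. A minimal sketch with Python's standard `ipaddress` module; the ranges are reserved documentation blocks (RFC 5737, RFC 3849) standing in for real provider lists, and the provider names are hypothetical. Real deployments load published lists (AWS, for example, publishes its ranges as a JSON file) and use a faster lookup than this linear scan. A residential-proxy IP matches nothing, which is the blind spot described above.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder "datacenter" ranges: documentation blocks, not real providers.
DATACENTER_RANGES: Dict[str, List[str]] = {
    "example-cloud-a": ["192.0.2.0/24", "2001:db8::/32"],
    "example-cloud-b": ["198.51.100.0/24"],
}


def build_index(ranges: Dict[str, List[str]]) -> List[Tuple[str, Network]]:
    """Parse every CIDR string once so lookups don't re-parse."""
    return [(provider, ipaddress.ip_network(cidr))
            for provider, cidrs in ranges.items() for cidr in cidrs]


def datacenter_provider(ip: str, index: List[Tuple[str, Network]]) -> Optional[str]:
    """Return the provider whose range contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for provider, net in index:
        if addr in net:  # False (not an error) when IP versions differ
            return provider
    return None
```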

Browser fingerprinting. Catches headless Chrome and automation frameworks via signals the browser reveals: GPU renderer string, TLS handshake hash (JA3/JA4), canvas fingerprints, missing or inconsistent headers. Cloudflare's public documentation for its JavaScript detections describes this approach. Fails against patched anti-detect browsers (Linken Sphere, Multilogin) that spoof realistic GPU strings.
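Of these signals, the TLS hash is the easiest to show compactly. JA3, the older of the two formats named, joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal values, dash-separated within a field and comma-separated between fields, skipping GREASE values, then takes the MD5. It is computed wherever TLS terminates, not in page JavaScript. The sketch assumes the ClientHello has already been parsed into integers (parsing is out of scope), and the example values are made up.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def is_grease(value: int) -> bool:
    # GREASE values (RFC 8701) look like 0x0A0A, 0x1A1A, ..., 0xFAFA.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Dash-separated decimal values, with GREASE entries dropped.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3(tls_version: int, ciphers: Sequence[int], extensions: Sequence[int],
        curves: Sequence[int], point_formats: Sequence[int]) -> Tuple[str, str]:
    """Return (ja3_string, ja3_md5_hex) from already-parsed ClientHello fields."""
    ja3_string = ",".join([
        str(tls_version),
        _join(ciphers),
        _join(extensions),
        _join(curves),
        _join(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

One caveat on the design: Chrome began randomizing its TLS extension order in 2023, which makes raw JA3 hashes unstable for Chrome; JA4 sorts ciphers and extensions partly to address this.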

Behavioral analysis. Catches bots via interaction patterns humans don't produce. Request cadence, mouse movement entropy, scroll velocity, time-between-events. Cloudflare's Anomaly Detection is user-agent-agnostic and scores each request against a domain's traffic baseline. Fails against slow bots that deliberately mimic human timing (one request per 30 seconds, with jitter).
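One simple cadence signal is how regular the gaps between a client's requests are. A sketch, assuming you have per-client request timestamps in seconds; the coefficient-of-variation threshold (0.1) and the 10-request minimum are illustrative assumptions, not values from Cloudflare or Clickport. As noted above, a slow bot that adds random jitter will pass this check.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def interarrival_gaps(timestamps: Sequence[float]) -> List[float]:
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def looks_machine_timed(timestamps: Sequence[float], max_cv: float = 0.1,
                        min_requests: int = 10) -> bool:
    """Flag clients whose request gaps are suspiciously uniform.

    Coefficient of variation (stdev / mean of the gaps) near 0 means
    metronome-like timing; human browsing is far burstier.
    """
    if len(timestamps) < min_requests:
        return False  # not enough evidence either way
    gaps = interarrival_gaps(timestamps)
    avg = mean(gaps)
    if avg == 0:
        return True  # every request arrived at the same instant
    return pstdev(gaps) / avg <= max_cv
```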

DETECTION LAYERS AND WHAT EACH MISSES

Layer	Catches	Misses
UA matching	Declared bots with honest UAs	Any bot spoofing a browser UA
IP / ASN reputation	Datacenter bots (AWS, GCP, Tencent)	Residential proxy networks
Browser fingerprint	Headless Chrome, Puppeteer, Playwright	Patched real-browser automation
Behavioral velocity	Bot farms with abnormal cadence	Slow bots with jitter

Sources: Cloudflare Bot Management docs, OWASP Credential Stuffing Prevention.

The combinatorial math matters. A bot has to defeat all four layers to be counted as human. Each layer is bypassable on its own, and each one is cheap to run. Together, they compound: an attacker who defeats one layer can still be caught by the next.
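One way to make the compounding concrete, under a strong simplifying assumption: if layer i misses a given bot with probability m_i and the layers fail independently, the chance a bot slips past all four is the product of the miss rates. The 50% miss rates below are made up for illustration. In practice the failures are correlated (a well-resourced operation tends to beat several layers at once), so the real pass-through rate is higher than the product suggests.

```latex
P(\text{bot passes all layers}) = \prod_{i=1}^{4} m_i ,
\qquad m_1 = m_2 = m_3 = m_4 = 0.5
\;\Rightarrow\; P = 0.5^{4} = 0.0625
```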

GA4 runs one of the four. That's why the 57% gap exists.

What Clickport catches that GA4 doesn't

During the 30-day study window, Clickport's detection ran four layers in order: UA matching, browser fingerprint (GPU and canvas), datacenter IP reputation, and behavioral velocity. (We tuned the behavioral velocity layer down partway through the window after it started producing false positives on a dominant real-user fingerprint; a redesigned version is in the works.) When we measured the output, 43% of bots were caught by UA alone. The other 57% required one of the deeper layers.

The breakdown:

UA pattern (what GA4 can also catch): 43.3% of our detections
Browser GPU fingerprint (SwiftShader, ANGLE renderer strings): 36.0%
Datacenter IP reputation (AWS, Tencent Cloud, ColoCrossing, etc.): 18.9%
Behavioral velocity (bot farms with abnormal cadence): 1.3%
Other fingerprint signals (no viewport, instant execution, arm64 Linux): <1%
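Each blocked request is credited to the first layer that caught it, so nothing is double-counted. A sketch of that attribution, assuming each detection records the set of layers that fired; the layer order mirrors the study (UA, fingerprint, datacenter IP, behavioral), while the input format is hypothetical. Because of first-match crediting, a later layer's share understates how many requests it would have caught on its own.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Layer order as described for the study window.
LAYER_ORDER = ["ua_pattern", "gpu_fingerprint", "datacenter_ip", "behavioral_velocity"]


def first_layer(fired: Set[str]) -> str:
    """Name of the earliest layer in LAYER_ORDER that fired, or 'none'."""
    for layer in LAYER_ORDER:
        if layer in fired:
            return layer
    return "none"


def attribution_shares(detections: Iterable[Set[str]]) -> Dict[str, float]:
    """Credit each request to its first firing layer, then normalize to shares."""
    counts = Counter(first_layer(fired) for fired in detections)
    counts.pop("none", None)  # requests no layer caught are not detections
    total = sum(counts.values())
    return {layer: counts[layer] / total for layer in LAYER_ORDER if counts[layer]}
```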

WHAT DETECTION LAYER CAUGHT EACH BOT (CLICKPORT, 30 DAYS)

UA pattern (GA4 catches this)

43.3%

GPU fingerprint (GA4 misses)

36.0%

Datacenter IP (GA4 misses)

18.9%

Behavioral velocity (GA4 misses)

1.3%

Source: Clickport internal ClickHouse study, April 2026. 30-day window. Each blocked request attributed to the first layer that caught it. No double-counting.

Meaning: a GA4 user running the default bot filter only gets the UA-pattern row excluded. Every other row is counted as real human traffic in their dashboard.

The top specific categories we caught over 30 days: Applebot (1.58 million), generic Software GPU Renderer (1.54 million, headless Chrome), Apple Inc. ASN (869,000, with the caveat that some of this is iCloud Private Relay routing real humans), Googlebot (275,000), ANGLE SwiftShader (135,000).

One thing Clickport can't do: magically see bots that arrive with no signal at all. If a bot runs a real browser from a residential IP with human-like timing and a clean fingerprint, nobody catches it. The structural win is on the signal stack, not on magic. We run more layers than GA4 does, so we catch more of what GA4 misses. The bots past all four layers are still counted as humans everywhere, including here.

If your Direct bucket is growing and you want to see what's actually in it, try Clickport free for 30 days. One script tag, no credit card. Read how the bot detection works under the hood if you want the technical details before signing up.

Frequently asked questions

Why doesn't GA4 have a native AI bot channel in 2026?

Google has not said, specifically. The public record: Google classifies AI Overviews and AI Mode traffic as Organic Search, consistent with both being served through google.com. An AI-assistants regex example was added to the Custom Channel Groups documentation in July 2025, but it's user-configured, not a default channel. Building AI into defaults would require either a Google decision about which platforms count or a match rule on something beyond referrer domain.

Can I see which bots GA4 excluded?

No. Google's documentation states: "you cannot see how much known bot traffic was excluded." The data is dropped before processing. There's no bot report, no BigQuery export, and no audit of what was filtered. You get the post-filter number and nothing else.

What percentage of my website traffic is normal to have as bots?

Depending on site type and how much you're being scraped, anywhere from 2% to 50%+. Our internal data shows a median of 20% across sites with more than 100 human events per month. Industry network-level benchmarks from Imperva and Akamai land in the 37% to 51% range. If you're under 10%, you're on the clean end. If you're over 40%, something is probably scraping you actively.

How do I stop bots without blocking Google's crawlers?

Google documents its crawlers and publishes their IP ranges, and you can verify a request claiming to be Googlebot either against those ranges or with a reverse DNS lookup confirmed by a forward lookup. Any bot filtering strategy should whitelist verified Googlebot, AdsBot-Google, and Googlebot-Image. Same for Bingbot (Microsoft publishes IP ranges too). The trick is to block bad bots without hitting good bots, which is why IP reputation plus UA matching beats either signal alone.
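The reverse-DNS route works in three steps: look up the hostname for the requesting IP, confirm it ends in one of the operator's domains, then resolve that hostname forward and confirm it maps back to the same IP. A sketch using Python's standard `socket` module; the suffixes reflect Google's (googlebot.com, google.com) and Bing's (search.msn.com) documented domains at the time of writing, so check the current docs before relying on them. The resolvers are injectable so the logic can be tested offline, and `gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, Sequence

# Hostname suffixes from crawler-verification docs at the time of writing.
# The leading dot stops a lookalike such as "fakegooglebot.com" from matching.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def verify_crawler_ip(
    ip: str,
    suffixes: Sequence[str],
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Reverse DNS, then domain check, then forward DNS back to the same IP."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not hostname.endswith(tuple(suffixes)):
        return False
    try:
        _, _, addresses = forward(hostname)  # IPv4 only with gethostbyname_ex
    except OSError:
        return False
    return ip in addresses
```

Usage: `verify_crawler_ip(request_ip, CRAWLER_DOMAINS["googlebot"])` before trusting a Googlebot user-agent. Cache results, since DNS lookups on every request are slow.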

Does a CAPTCHA stop bot traffic in GA4?

Not effectively in 2026. A 2023 USENIX paper from UC Irvine and Microsoft found that bots solve distorted-text CAPTCHAs in under a second with close to 100% accuracy. Humans take up to 15 seconds and finish correctly only 50-84% of the time. For image-based CAPTCHAs (reCAPTCHA grids, hCaptcha), bot and human solve times are roughly comparable. CAPTCHAs are a speed bump, not a shield. They reduce low-effort bot traffic. Advanced bots with ML-based solvers bypass them routinely.

Why do I have traffic from countries I don't sell to?

Three usual suspects. One, scrapers running from cloud infrastructure in cheap regions (Singapore, Brazil, Vietnam host a lot of low-cost compute). Two, referrer spam and scripted traffic targeting all domains indiscriminately. Three, residential proxies that rotate through consumer IPs across multiple countries. Filter your Explorations by country and cross-reference with your mobile/desktop split. Suspicious countries usually show up with desktop skew and very low engagement.

Does server-side tracking solve the bot problem?

No. Server-side tracking solves other problems (consent blocks, ad-blocker interference), but the bot request still reaches your server. If the bot's User-Agent matches a real browser and its IP is a residential proxy, your server-side event pipeline has the same trouble telling it apart from a human as a client-side tag does. What catches bots is the signals you check (fingerprint, IP reputation, behavior), not where the tag runs.

One line to take away

Your analytics is counting things that aren't people. The filter you trust to handle that is checking user-agent strings against a paid list. In our data, more than half of the bots we caught didn't show up on that list. The gap isn't small, and it's growing as AI crawlers, residential proxies, and headless browsers become cheaper to operate. Whatever number your dashboard shows for last month, the real number of humans is smaller. How much smaller depends on which signals you choose to look at.

David Karpik

Founder of Clickport Analytics

Building privacy-focused analytics for website owners who respect their visitors.
