
Attribution Modeling Explained: Every Model, The Math, and What Actually Works in 2026

You're running three marketing channels. Blog content, a newsletter, and some paid ads. Someone signs up for your product. Your analytics dashboard says: "Source: Direct."

Direct. As in, they typed your URL into their browser. Except they didn't. They read your blog post two weeks ago, clicked a link in last Tuesday's newsletter, and then Googled your name this morning. Three touchpoints, three channels, and your analytics tool credits none of them.

This is the attribution problem. And 77% of marketers say they're not using the right attribution model or aren't sure which one they should use.

I'm David, founder of Clickport Analytics. I've spent the last two years building a privacy-first analytics tool that tracks where visitors come from without cookies, and in the process I've learned that attribution is simultaneously the most important and most broken part of web analytics. Not because the concept is hard, but because the tools we rely on are increasingly blind.

This guide is everything I know about attribution modeling. Not the surface-level definitions you'll find elsewhere, but the actual math behind each model, why Google removed most of them, how consent banners are destroying your attribution data, and what actually works for businesses that don't have a data science team.

What is attribution modeling (and why most marketers get it wrong)

Attribution modeling is the set of rules that determines which marketing touchpoint gets credit for a conversion. A conversion can be anything you care about: a purchase, a signup, a form submission, a demo booking.

The concept is simple. The execution is where everything falls apart.

Here's why: a typical customer doesn't interact with your brand once. B2C buyers average 6-20 touchpoints before converting. B2B SaaS deals average 266 touchpoints and 2,879 impressions per closed deal, a 20% increase since 2023. And 70% of the B2B buying process happens in the "dark funnel," meaning anonymous research before any contact with your company.

So when someone finally converts, which touchpoint deserves the credit? The first blog post that introduced them to your brand? The email that kept them engaged? The Google search that brought them back when they were ready to buy?

Different attribution models give different answers. And the model you choose directly affects where you spend your marketing budget.

Same journey, six different answers

A visitor discovers your product through 4 touchpoints, then converts. Each attribution model assigns credit differently.

Day 1: Blog post → Day 5: Email → Day 12: Social ad → Day 14: Google search (converts)

First-touch:     100% /  0% /  0% /   0%
Last-touch:        0% /  0% /  0% / 100%
Linear:           25% / 25% / 25% /  25%
Position-based:   40% / 10% / 10% /  40%
Time-decay:       10% / 19% / 29% /  42%
Data-driven:      varies (credit depends on your measured conversion paths)

Blog gets 100% credit or 10% credit depending on which model you pick. Same data, wildly different conclusions.

The model you choose isn't just an academic exercise. Companies without proper attribution misallocate up to 30% of their marketing budget. When blog content shows 10% credit under time-decay, it's easy to cut the content budget. When it shows 100% under first-touch, you double down. Same data, opposite decisions.

Every attribution model explained (with the actual math)

Most guides define attribution models in a sentence and move on. Here's the math that nobody shows you.

First-touch attribution

100% of credit goes to the first touchpoint that introduced the customer to your brand.

Credit = 100% to touchpoint #1, 0% to everything else

When it's useful: Understanding which channels drive initial awareness. If you're trying to grow your audience, first-touch tells you what's working at the top of the funnel. Particularly valuable for B2B companies with long sales cycles where you want to know what started the conversation.

When it's misleading: It completely ignores everything that happened between discovery and conversion. An email sequence that nurtured the lead for six months gets zero credit.

Last-touch attribution

100% of credit goes to the final touchpoint before conversion.

Credit = 100% to the last touchpoint, 0% to everything else

When it's useful: Optimizing bottom-of-funnel campaigns. Retargeting, branded PPC, and direct response ads thrive under last-touch because they're designed to close. 67% of B2B marketing teams still rely on last-touch, and for simple sales cycles (visitor clicks ad, buys product) it's perfectly adequate.

When it's misleading: It systematically undervalues every channel that builds awareness. The podcast sponsorship that planted the seed six months ago? Zero credit. The blog post that educated the buyer? Zero credit.

Linear attribution

Equal credit distributed across every touchpoint in the journey.

Credit per touchpoint = 100% / N (where N = total touchpoints)

Four touchpoints? Each gets 25%. Ten touchpoints? Each gets 10%. Simple.

When it's useful: When you genuinely don't know which touchpoints matter more and want a balanced view. Good baseline for companies just starting with attribution.

When it's misleading: It treats a casual social media impression the same as the demo call that closed the deal. Not all touchpoints are created equal.

Time-decay attribution

Touchpoints closer to conversion get exponentially more credit.

The standard formula uses a half-life (Google defaulted to 7 days):

Credit weight = 2^(-t / halflife)

Where t = days between the touchpoint and conversion.

Worked example with a 7-day half-life, touchpoints on days 1, 5, 12, and 14, and a conversion on day 14:

Day 1 (t = 13): 2^(-13/7) ≈ 0.28
Day 5 (t = 9): 2^(-9/7) ≈ 0.41
Day 12 (t = 2): 2^(-2/7) ≈ 0.82
Day 14 (t = 0): 2^0 = 1.00

All raw scores are normalized to sum to 100%, giving roughly 11% / 16% / 33% / 40%.
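The time-decay formula is mechanical enough to sketch in a few lines of Python. The touchpoint days and 7-day half-life below come from the journey example in this guide; the function itself is a generic illustration, not any vendor's implementation:

```python
def time_decay_credits(touch_days, conversion_day, half_life=7):
    """Credit weight = 2^(-t / half_life), where t is days before
    conversion. Weights are normalized so credits sum to 1."""
    weights = [2 ** (-(conversion_day - day) / half_life) for day in touch_days]
    total = sum(weights)
    return [w / total for w in weights]

# Touchpoints on days 1, 5, 12, 14; conversion on day 14
credits = time_decay_credits([1, 5, 12, 14], conversion_day=14)
print([round(c * 100) for c in credits])  # → [11, 16, 33, 40]
```

Note how steeply credit decays: the day-1 touchpoint gets roughly a quarter of the credit the day-14 touchpoint does.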

When it's useful: Short sales cycles where recent interactions are genuinely more influential. E-commerce during sales events. Promotional campaigns with clear deadlines.

When it's misleading: It penalizes brand-building activities. For B2B companies with 6-month sales cycles, the awareness touchpoints that started the relationship get almost no credit.

Position-based (U-shaped) attribution

40% to the first touchpoint, 40% to the last touchpoint, and the remaining 20% split equally among everything in between.

With 5 touchpoints: 40% / 6.67% / 6.67% / 6.67% / 40%

A variant called W-shaped adds a third anchor point (typically lead creation): 30% / 30% / 30% for the three key moments, with the remaining 10% split among middle touchpoints.

When it's useful: B2B marketing where both awareness (first touch) and conversion (last touch) matter. It acknowledges the full journey without treating all middle touches equally.

When it's misleading: The 40/20/40 split is arbitrary. There's no data proving that first and last touches are equally important, or that they each deserve exactly 40%.
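The 40/20/40 rule is simple enough to express directly. This is a hypothetical helper, and the handling of 1- and 2-touch journeys is an assumed convention (all credit to the single touch; an even split when there is no middle):

```python
def u_shaped_credits(n_touchpoints):
    """40% first, 40% last, remaining 20% split across the middle."""
    n = n_touchpoints
    if n == 1:
        return [1.0]           # single touch gets everything
    if n == 2:
        return [0.5, 0.5]      # no middle: split first/last evenly
    middle = 0.20 / (n - 2)
    return [0.40] + [middle] * (n - 2) + [0.40]

print([round(c, 4) for c in u_shaped_credits(5)])
# → [0.4, 0.0667, 0.0667, 0.0667, 0.4]
```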

Data-driven attribution (DDA)

Machine learning analyzes your actual conversion paths and assigns credit based on each touchpoint's measured contribution. Two main algorithmic approaches exist:

Shapley Values (used by Google): Calculate each channel's average marginal contribution across all possible permutations of channels. Originated by Nobel laureate Lloyd Shapley in 1951. Requires significant conversion volume (10,000+ converting journeys) for statistical reliability.
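To make "average marginal contribution across all permutations" concrete, here is a brute-force Shapley calculation over channel orderings. The coalition value function is toy data I made up for illustration; production systems estimate this at scale rather than enumerating permutations:

```python
from itertools import permutations

def shapley_credits(channels, value):
    """Average each channel's marginal contribution across all orderings.
    value(frozenset_of_channels) -> conversions that coalition produces."""
    credit = {c: 0.0 for c in channels}
    orders = list(permutations(channels))
    for order in orders:
        coalition = frozenset()
        for c in order:
            credit[c] += value(coalition | {c}) - value(coalition)
            coalition = coalition | {c}
    return {c: total / len(orders) for c, total in credit.items()}

# Toy value function: conversions observed for each channel combination
v = {frozenset(): 0, frozenset({"ads"}): 10,
     frozenset({"blog"}): 20, frozenset({"ads", "blog"}): 40}.__getitem__
print(shapley_credits(["ads", "blog"], v))  # → {'ads': 15.0, 'blog': 25.0}
```

Note the credits sum to the value of the full coalition (40), which is the defining property that makes Shapley attribution "fair."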

Markov Chains: Model the customer journey as states with transition probabilities. Uses the "removal effect": remove a channel from all paths, measure how much overall conversion probability drops. Those removal effects are normalized to 100%.
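The removal effect can be sketched in a deliberately simplified form: instead of fitting a first-order transition matrix (as full Markov implementations do), this version blocks every converting path that passed through the removed channel. The journey data is illustrative:

```python
def removal_effect_credits(journeys):
    """journeys: list of (channel_list, converted_bool) pairs.
    A channel's removal effect = share of conversions lost when every
    path through that channel is blocked, normalized to sum to 1."""
    conversions = sum(1 for path, converted in journeys if converted)
    channels = {c for path, _ in journeys for c in path}
    effects = {}
    for channel in channels:
        surviving = sum(1 for path, converted in journeys
                        if converted and channel not in path)
        effects[channel] = (conversions - surviving) / conversions
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}

journeys = [(["blog", "email"], True), (["ads"], True), (["blog"], False)]
credits = removal_effect_credits(journeys)  # each channel ends up with 1/3
```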

When it's useful: When you have enough conversion data. It's the most accurate approach because it's based on your actual customer journeys, not arbitrary rules.

When it's misleading: It requires significant conversion volume. And as we'll see in the next section, Google's implementation has a serious transparency problem.

Which model is right for you?
First-touch
"What brings people in the door?"
Brand awareness focus
Last-touch
"What closes the deal?"
Revenue optimization
Linear
"Give me a balanced view"
Starting with attribution
Time-decay
"Recent interactions matter more"
Short sales cycles
Position-based
"Value the start and the finish"
B2B with long cycles
Data-driven
"Let the data decide"
High conversion volume

Google removed 4 attribution models. Here's why that matters.

In September 2023, Google fully sunset first-click, linear, time-decay, and position-based attribution in both GA4 and Google Ads. Any conversion actions still using these models were automatically switched to data-driven attribution.

Google's rationale: these rule-based models are "no longer accurate, flexible, or able to keep up with today's complex buying journeys."

The reality is simpler. Fewer than 3% of conversions in Google Ads used these models. Most advertisers were already on data-driven or last-click. Google removed the options that almost nobody was using.

But that "almost nobody" includes marketers who specifically chose those models because they were transparent and predictable. A linear model always splits credit equally. A time-decay model always uses the same formula. You can audit the math. You can explain it to your CEO.

Data-driven attribution? You can't audit it, because it's a black box. And that's a problem.

GA4 attribution models: before and after

Before Sept 2023: ✓ First-click, ✓ Linear, ✓ Time-decay, ✓ Position-based, ✓ Last-click, ✓ Data-driven. 6 models. Your choice.

After Sept 2023: ✗ First-click, ✗ Linear, ✗ Time-decay, ✗ Position-based, ✓ Last-click, ✓ Data-driven (default). 2 models. Take it or leave it.

To make things even more confusing, GA4 also renamed "conversions" to "key events" in March 2024. The underlying mechanism is identical, but now GA4 has "key events" while Google Ads has "conversions," and the numbers between the two don't match because Google Ads applies its own conversion modeling on top.

Why GA4's data-driven attribution is a black box

GA4's data-driven attribution (DDA) uses machine learning to analyze your conversion paths and assign credit. In theory, this should be the most accurate model. In practice, there are four problems nobody talks about.

Problem 1: It silently falls back to last-click. DDA requires at least 400 conversions per month to function. If your property doesn't meet this threshold, GA4 silently switches to last-click attribution without telling you. The settings page still shows "data-driven" as selected, even when it's not being used. Most small and mid-size businesses are unknowingly running on last-click while thinking they have sophisticated AI-powered attribution.

Problem 2: You can't audit the algorithm. DDA is a proprietary black box. You can see outputs (channel X got 30% credit, channel Y got 70%) but you cannot examine the inputs, weights, or logic. There is no way to verify why one channel got more credit than another. You're trusting the algorithm completely.

Problem 3: Google is judging its own ad platform. Google controls both the DDA algorithm and the Google Ads platform that receives credit from it. This is a structural conflict of interest. Research shows that 68% of multi-touch attribution models over-credit digital paid channels. Switching from last-click to DDA consistently reduces email channel credit while increasing Google Ads credit.

Problem 4: The lookback window is capped at 90 days. GA4's maximum attribution lookback window is 90 days (default is only 30 days). Any touchpoint older than 90 days is excluded entirely. For B2B companies with sales cycles of 6-12 months, this means the awareness touchpoints that started the relationship simply don't exist in the model.

What GA4 reports vs. what actually happened
WooCommerce store spending $10K/month on Google Ads and $2K/month on email marketing.
GA4 data-driven attribution
Google Ads revenue: $50,000 (5x ROAS)
Email revenue: $8,000
Transparent attribution
Google Ads revenue: $35,000 (3.5x ROAS)
Email revenue: $23,000
$15,000/month in misattributed revenue. Source: Seresa

In another case documented by Recast, a naive switch to Google's DDA model led to $40,000 in wasted ad spend that had to be refunded to the client. DDA was over-crediting Google Ads campaigns that weren't driving incremental conversions.

The bottom line: data-driven attribution is the best model in theory. But when the algorithm is opaque, controlled by an ad platform with a financial interest in the results, and silently degrades to last-click without telling you, "data-driven" becomes a marketing term rather than a meaningful methodology.

Consent banners: why most of your visitors are invisible

Here's the attribution problem that overshadows everything else: if your analytics tool uses cookies, and you're legally required to ask for consent, most of your visitors are invisible.

The numbers are stark. In GDPR-compliant Europe, 60-70% of visitors reject cookies when given proper "Reject all" buttons. In Germany, rejection rates hit 87%. In France, 73%. The etracker Consent Benchmark 2025 found an average 60% data loss from compliant consent banners.

The most striking example: the UK's Information Commissioner's Office, the privacy regulator itself, lost 90.8% of tracked visitors after implementing a compliant consent banner. Their daily tracked users dropped from 119,417 to 10,967.

When 60-90% of your visitors are invisible to your analytics, every attribution model breaks. It doesn't matter whether you're using first-touch, last-touch, linear, or data-driven. You're building attribution on a fraction of your actual traffic.

GA4's response is behavioral modeling: machine learning that guesses the behavior of users who declined consent based on those who accepted. It works with "70-80% accuracy" when you have enough traffic (1,000+ daily users in each consent category). But GA4 mixes modeled conversions with real ones, and you cannot tell which is which. You're basing budget decisions on a blend of real data and algorithmic guesses.

Visitors your analytics actually sees
GA4 with consent banner
30-40%
GA4 + behavioral modeling
~65% (modeled)
Cookieless analytics
100%
Sources: etracker Consent Benchmark 2025, Plausible

On top of consent banners, privacy browsers are actively breaking attribution. Safari's Intelligent Tracking Prevention caps JavaScript-set cookies to 7 days. Firefox's Enhanced Tracking Protection strips tracking parameters and blocks third-party cookies. Brave blocks analytics scripts entirely.

Combined, Safari (16.7%), Firefox (4%), and Brave (1%+) represent roughly 20-25% of web traffic with significant tracking restrictions. And those users skew toward privacy-conscious, higher-income demographics. You're not missing random visitors. You're missing a specific, valuable segment.

Cookieless analytics tools like Clickport, Plausible, and Fathom eliminate this problem entirely. No cookies means no consent banner needed, which means 100% of visitors are tracked. Last-touch attribution on 100% of your traffic is more useful than data-driven attribution on 30% of your traffic.

(For a deeper look at how cookie-free analytics works and what the EU regulations actually say, I've written dedicated guides on both.)

UTM parameters: the last attribution signal that actually works

In a world where cookies expire, consent banners block tracking, and browsers strip click IDs, UTM parameters are one of the few attribution mechanisms that survive everything.

UTM parameters are tags you add to URLs to tell your analytics tool where traffic came from. When someone clicks a link with ?utm_source=newsletter&utm_medium=email&utm_campaign=march-2026, your analytics tool captures exactly which campaign drove that visit.

Why do UTMs survive when other tracking breaks?

Because Apple's Safari 26 Link Tracking Protection (September 2025) targets user-identifiable click IDs like gclid, fbclid, and msclkid, stripping them from URLs across all browsing sessions. But it explicitly preserves campaign-style parameters like utm_source, utm_medium, and utm_campaign. UTMs describe the campaign, not the user. No privacy law restricts them.

The five parameters

utm_source (required): WHO sent the traffic. The specific platform or publisher: google, linkedin, newsletter, partner-blog.

utm_medium (required): HOW it arrived. The channel type: cpc, email, social, paid_social, referral, affiliate. This is the parameter that drives channel classification in GA4. Wrong medium = wrong channel group = "Unassigned" traffic.

utm_campaign (required): WHAT initiative. The campaign name: black-friday-2026, product-launch-q1. Keep this constant across channels to compare performance by source within the same campaign.

utm_term (optional): WHICH keyword or subject. Originally for paid search keywords, now also used for email subject lines and audience segments.

utm_content (optional): WHICH variation. Differentiates creative versions, placements, or CTAs: header-cta, sidebar-banner, blue-button.

Good vs. bad UTM tagging

UTM examples: what breaks vs. what works
Newsletter campaign
?utm_source=Email&utm_medium=Newsletter&utm_campaign=March Webinar
Mixed case, wrong medium value, space in campaign
?utm_source=newsletter&utm_medium=email&utm_campaign=webinar-mar-2026&utm_content=header-cta
LinkedIn ad
?utm_source=LinkedIn&utm_medium=social&utm_campaign=lead_gen
Capitalized source, "social" doesn't distinguish paid from organic in GA4
?utm_source=linkedin&utm_medium=paid_social&utm_campaign=lead-gen-q1-2026&utm_content=carousel
Internal banner on your own site
?utm_source=homepage&utm_medium=banner&utm_campaign=spring-sale
UTMs on internal links reset the session and overwrite original attribution
No UTMs. Track internal clicks with event tracking instead.
Teams that standardize UTM naming see a 29% improvement in attribution accuracy (CXL Institute, 2025)

The most common mistakes that break UTM attribution:

  1. Inconsistent capitalization. GA4 treats UTM values as case-sensitive. "Facebook", "facebook", and "fb" create three separate sources.
  2. Tagging internal links. UTMs on links between your own pages reset the session and overwrite the original source. A visitor from a Google ad clicks an internal banner with UTMs, and the conversion gets attributed to "homepage-banner" instead of the ad.
  3. Missing parameters. Tagging paid ads but forgetting UTMs on newsletter links, social posts, or podcast descriptions creates blind spots.
  4. Redirect chains dropping parameters. 301 and 302 redirects don't preserve query parameters by default. Every redirect between click and landing page is a chance to lose UTMs.

64% of companies have no documented UTM naming convention, resulting in an average 22% data loss. The fix is simple: pick a convention, enforce lowercase everything, and use a shared spreadsheet as your single source of truth.
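A naming convention is easiest to enforce with tooling. Here is a hypothetical helper (not part of any analytics product) that lowercases values, swaps spaces for hyphens, and refuses links missing the three required parameters:

```python
from urllib.parse import urlencode

REQUIRED = ("utm_source", "utm_medium", "utm_campaign")

def build_utm_url(base_url, **utm):
    """Build a consistently tagged URL: lowercase, hyphenated, complete."""
    missing = [k for k in REQUIRED if k not in utm]
    if missing:
        raise ValueError(f"missing required UTM parameters: {missing}")
    clean = {k: v.strip().lower().replace(" ", "-") for k, v in utm.items()}
    return f"{base_url}?{urlencode(clean)}"

url = build_utm_url("https://example.com/pricing",
                    utm_source="Newsletter", utm_medium="Email",
                    utm_campaign="Webinar Mar 2026")
# → https://example.com/pricing?utm_source=newsletter&utm_medium=email&utm_campaign=webinar-mar-2026
```

Run every outbound link through one function like this and the "Facebook" vs. "facebook" vs. "fb" problem disappears by construction.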

Dark traffic: the visitors your analytics can't see

Even with perfect UTM tagging and a cookieless analytics tool, a significant portion of your traffic shows up as "Direct" when it isn't. This is dark traffic.

The most famous proof: Groupon deindexed itself from Google for 6 hours and watched "Direct" traffic to their long-URL pages drop by 60%. That traffic was organic search being misattributed as Direct all along.

SparkToro tested 1,113 visits across 16 pages on 11 platforms and found that referral data is stripped almost everywhere.

This is dark social: links shared in messaging apps, email clients, and private channels that strip the referrer header before the click reaches your site. RadiumOne's research found that over 80% of global content sharing happens in these private channels, not on public social media.

The practical impact: your "Direct" traffic bucket is a junk drawer. It contains real direct visits (someone typing your URL), but also misattributed organic search, email clicks that stripped the referrer, social shares in private messages, in-app browser visits, and increasingly, AI search clicks.

What "Direct" traffic actually contains

  1. Real direct visits (typed URL, bookmarks): ~40%
  2. Dark social (WhatsApp, Slack, DMs, SMS): ~25%
  3. Stripped referrers (HTTPS, in-app browsers, redirects): ~15%
  4. AI search (ChatGPT free tier, Gemini, Copilot): ~10%
  5. Email clients (Outlook, Apple Mail, Gmail app): ~10%

Percentages are approximate and vary by site. Sources: Search Engine Land, SparkToro

No attribution model can solve dark traffic entirely. But understanding that your "Direct" bucket is inflated changes how you interpret every other channel. If 60% of your Direct traffic is actually from other sources, your organic search, social, and email channels are performing better than your dashboard shows.

AI search traffic: the attribution blind spot nobody's talking about

AI search engines account for an estimated 12-18% of total referral traffic as of Q1 2026, up from 5-8% in late 2024. AI referral traffic is growing 130-150% year-over-year. And it converts better than traditional organic: 14.2% conversion rate vs Google's 2.8%.

This is likely your highest-converting traffic source. And most analytics tools can't see it.

The problem is that each AI tool handles referrer data differently:

How AI search tools send (or don't send) referrer data

ChatGPT (paid): sends referrer + utm_source → GA4: Referral
ChatGPT (free): no referrer data sent → GA4: Direct
Perplexity: sends referrer, no UTMs → GA4: Referral
Google Gemini: inconsistent referrer handling → GA4: Direct or Referral
Google AI Mode: initially mislabeled as Direct → GA4: Direct (mostly)

In real-world data from 446K visits, 70.6% of AI traffic landed as "Direct" in GA4.

The core issue: GA4 has no default "AI Search" channel. All AI traffic lands in "Referral" alongside regular website referrals, or worse, in "Direct" when referrer data is missing. Users must manually create custom channel groups with regex matching for AI domains. Most never do this.
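The matching itself is straightforward wherever you can run code or regex against the referrer. The domain list below is illustrative only (these tools change their referrer domains over time), and the fallback logic is deliberately simplified:

```python
import re

# Illustrative AI-search referrer domains; not exhaustive or stable
AI_DOMAINS = re.compile(
    r"(chatgpt\.com|chat\.openai\.com|perplexity\.ai|"
    r"gemini\.google\.com|copilot\.microsoft\.com)"
)

def classify_channel(referrer: str) -> str:
    if not referrer:
        return "Direct"       # no referrer data: the junk drawer
    if AI_DOMAINS.search(referrer):
        return "AI Search"
    return "Referral"         # everything else (simplified)

print(classify_channel("https://www.perplexity.ai/search?q=analytics"))
# → AI Search
```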

Google's own AI Mode traffic was initially untrackable in GA4. When Google can't properly attribute traffic from its own AI product in its own analytics tool, that tells you something about the state of AI search attribution.

Clickport has a dedicated AI Search channel built into its 16-channel classification system. Traffic from ChatGPT, Perplexity, Gemini, Copilot, and other AI search engines is automatically identified and grouped, no manual regex configuration needed. (I wrote a deeper analysis of why AI visibility scores are misleading if you want to understand the broader AI search landscape.)

The small business attribution stack (that costs $0-9/month)

Here's what most attribution content won't tell you: if you're a small business with 2-5 marketing channels and under 100K monthly pageviews, you don't need multi-touch attribution. You need three things that cost almost nothing.

Only 22% of marketers believe they're using the right attribution model. But the problem for most small businesses isn't that they picked the wrong model. It's that they're trying to solve an enterprise problem with enterprise tools when a simpler approach would work better.

Layer 1: Cookieless analytics with source tracking ($0-9/month)

Any privacy-first analytics tool automatically captures referrer data, UTM parameters, and channel information. Because there's no consent banner, you capture data from 100% of visitors, not the 30-40% who accept cookies.

This alone gives you last-touch attribution on every visit. For most B2C businesses where the journey happens in a single session, last-touch on 100% of traffic is more accurate than data-driven attribution on 30% of traffic.

Layer 2: UTM-tagged links on everything you control ($0)

Newsletter links, social posts, bio links, partner links, podcast descriptions. A shared Google Sheet with your UTM naming convention acts as your central source of truth. Three rules: always lowercase, use hyphens between words, include a date identifier on time-bound campaigns.

Layer 3: "How did you hear about us?" ($0)

A single dropdown or text field on your signup form or checkout page. Post-purchase surveys get 60%+ response rates and capture what no analytics tool can see: podcast mentions, word-of-mouth recommendations, conference conversations, private messages from friends.

This is your only window into dark social, which accounts for over 80% of content sharing.

The 3-layer attribution stack
Layer 1: Analytics with source tracking $0-9/mo
Captures: referrer, UTMs, channel, landing page
Sees 100% of traffic (cookieless) vs 30-40% (cookie-based with consent)
Layer 2: UTM spreadsheet $0
Captures: campaign source, medium, and variation for every link you create
Standardized naming = 29% attribution accuracy improvement
Layer 3: "How did you hear about us?" $0
Captures: word-of-mouth, podcasts, dark social, offline events
60%+ response rate. The only way to see what analytics can't track.
Combined, these three layers cover the vast majority of attribution signals a small business needs.

Compare this to enterprise attribution tools that cost $50,000-200,000+ per year. For a business with a few thousand monthly visitors and straightforward marketing, the three-layer stack gives you 90% of the insight at 1% of the cost.

A practical tip: run both first-touch and last-touch views side by side. If they agree on your top channel, you have high confidence. If they disagree, you're learning something valuable about your funnel without any complex modeling.
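That side-by-side comparison is trivial to compute if you can export converting journeys as ordered channel lists. The journey data here is illustrative:

```python
from collections import Counter

def first_vs_last(journeys):
    """journeys: ordered channel lists for converting visitors.
    Returns (first-touch counts, last-touch counts) per channel."""
    first = Counter(path[0] for path in journeys if path)
    last = Counter(path[-1] for path in journeys if path)
    return first, last

journeys = [["blog", "email", "search"],
            ["social", "search"],
            ["blog", "search"]]
first, last = first_vs_last(journeys)
# first: blog dominates (awareness); last: search dominates (closing)
```

In this toy data the two views disagree, which is exactly the signal described above: blog is an awareness channel, search is a closing channel, and both are earning their keep.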

How to set up attribution that works today

Here's the practical playbook. No enterprise tools, no data science team, no six-figure budget.

Step 1: Tag every link you control with UTMs. Newsletter links, social posts, bio links, partner referrals, podcast show notes, QR codes on print materials. Use a shared spreadsheet to enforce naming conventions. Always lowercase, hyphens between words, include dates on time-bound campaigns.

Step 2: Pick a cookieless analytics tool. This gives you source and channel data on 100% of your visitors without consent banners. Clickport, Plausible, and Fathom all work. The key advantage: no data lost to cookie rejection, no behavioral modeling mixing guesses with real data.

Step 3: Set up goals for your conversions. Define what a conversion means for your business: signup, purchase, form submission, demo booking. Without goals, attribution has nothing to attribute to. Clickport supports goal tracking for pageviews, clicks, form submissions, and custom events.

Step 4: Add "How did you hear about us?" to your signup or checkout flow. Keep it simple. A dropdown with your main channels plus an "Other" text field. This is your only attribution signal for dark social, word-of-mouth, and offline touchpoints.

Step 5: Review your sources weekly. Look at which channels bring visitors and which channels drive conversions. Compare first-touch (entry pages) and last-touch (converting session source) to understand your funnel. If a channel brings lots of first visits but few conversions, it's an awareness channel. If it brings few first visits but high conversion rates, it's a closing channel. Both are valuable.

Step 6: Clean up "Direct" traffic. If Direct is your largest source, something is wrong. Check that all your email links have UTMs, that your social bio links are tagged, and that redirects aren't stripping parameters. A sudden spike in Direct traffic after a campaign launch usually means your campaign links are broken.

Attribution doesn't require sophisticated models. It requires consistent tagging, a tool that sees all your visitors, and the discipline to review the data regularly. The companies that get attribution right aren't the ones with the most advanced algorithms. They're the ones that tag every link and actually look at the results.

If you want to see what your attribution data looks like when measured from 100% of your traffic with automatic channel classification (including AI Search), try Clickport free for 30 days. No credit card required.

David Karpik

Founder of Clickport Analytics
Building privacy-focused analytics for website owners who respect their visitors.
