Introduction
Prior public mouse-tracking studies of Google SERPs typically measured dozens of people, and the largest covered a few thousand tasks. This study analyzed queries from a panel of tens of thousands of Google Search users.
Eric Van Buskirk and his team at ClickStream Solutions previously conducted studies using desktop screen recordings and think-aloud protocols, commissioned by Kevin Indig, Citation Labs, and Propellic. Those studies asked participants to complete structured search tasks while narrating what they were doing, what they trusted, what confused them, and how they decided whether to continue searching, click a result, or rely on an AI-generated answer.
They were smaller and more qualitative, but they provided direct context for interpreting search behavior: why users paused, compared sources, returned to earlier results, or hesitated before acting.
This study is different. Instead of observing a smaller panel through task-based recordings, it analyzes clickstream data provided by Surfer SEO, including cursor, viewport, and scroll behavior, at scale across approximately 860,000 U.S.-based Google Search sessions from February and March 2026.
The earlier user-behavior studies helped shape interpretation of behaviors such as pausing, revisiting, and verification, but the percentages and findings reported in this article come from the clickstream analysis, not from the panel studies.
This page is written for researchers, journalists, and practitioners who want to go beyond the headline findings—understanding exactly how the data was collected, how sessions were classified, which statistical tests were applied, and where the study draws deliberate limits on its own claims.
846,000 search sessions
What our cursor position and scroll data do and do not reveal about how searchers navigate the SERP.
Data Source
Clickstream data sourcing is closely held and proprietary across the industry. Surfer’s data comes from websites, mobile apps, and/or browser extensions where users agreed to share anonymized browsing behavior.
Because Surfer’s audience is composed of marketing-interested people, the findings may skew toward more search-savvy users. Their sessions included both work and non-work-related Google Search use: only 1.1% of sessions involved marketing-related queries, with the remaining 98.9% spread across other categories. Readers should still weigh the findings against the composition of the panel.
How Cursor Data Was Captured
Cursor positions were sampled at one-second intervals during search sessions, up to a maximum of 60 samples per session. Each sample captured cursor X-Y pixel coordinates, viewport X-Y coordinates, viewport dimensions, and a timestamp. This created spatial-temporal sequences that allowed us to measure how sessions moved through the search results page, how far they scrolled, when they paused, how often they returned upward, and whether an AI Overview was present.
Cursor tracking is a useful proxy for attention because cursor position typically aligns with gaze during active reading and decision tasks. It is less reliable during passive, distracted, or idle browsing. For that reason, this study treats cursor behavior as an attention signal, not as proof of exact eye fixation.
A “probe” refers to one sampled cursor and viewport observation. Session duration is approximated through the number and timing of probes, not necessarily the full browser-session duration. “Active” means a session continued producing cursor or viewport probes at that time point—not necessarily that the searcher was reading intensely every second.
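To make the probe definition concrete, here is a minimal Python sketch of what one probe record and the probe-based duration estimate could look like. The field names, the reading of "viewport X-Y coordinates" as scroll offsets, and the helper functions are our illustrative assumptions, not the schema used in the study.

```python
from dataclasses import dataclass
from typing import List

MAX_PROBES = 60  # cap stated in the text: at most 60 samples per session


@dataclass(frozen=True)
class Probe:
    """One sampled cursor and viewport observation (field names are hypothetical)."""
    t: float         # timestamp, in seconds
    cursor_x: int    # cursor position in pixels
    cursor_y: int
    viewport_x: int  # assumed to be the horizontal scroll offset
    viewport_y: int  # assumed to be the vertical scroll offset
    viewport_w: int  # viewport dimensions in pixels
    viewport_h: int


def approx_duration(probes: List[Probe]) -> float:
    """Approximate session duration as the time between first and last probe."""
    if len(probes) < 2:
        return 0.0
    return probes[-1].t - probes[0].t


def is_active_at(probes: List[Probe], second: float) -> bool:
    """A session is 'active' at a time point if it was still producing probes then."""
    return bool(probes) and approx_duration(probes) >= second


# Toy session: one probe per second for 8 seconds, scrolling steadily down.
session = [
    Probe(t=float(i), cursor_x=400, cursor_y=300 + 40 * i,
          viewport_x=0, viewport_y=100 * i, viewport_w=1280, viewport_h=720)
    for i in range(8)
]
duration = approx_duration(session)
```

Under this sketch, a session that stops producing probes is simply no longer "active", whether the searcher clicked away, closed the tab, or hit the 60-probe cap.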
Dataset Structure
The study used three analysis datasets:
- Balanced dataset — 74,848 sessions, designed for fair comparisons across search types and AI Overview conditions. Search-type and AIO groups were capped where possible. Informational and navigational groups reached 10,000 sessions per AIO condition; smaller groups such as local and video had lower available counts.
- Representative dataset — 99,994 sessions, preserving the natural mix of real searches, with informational queries accounting for the majority.
- Filtered representative dataset — 99,994 sessions drawn from 238,280 filtered sessions, excluding sessions with fewer than 3 or more than 25 cursor probes. This removed instant exits, incomplete captures, and sessions likely to reflect idle or distracted browsing.
The balanced dataset helps compare search types and AIO conditions on fairer footing. The representative dataset shows what happens in the natural mix of real searches. The filtered dataset shows what remains when unusually short or long sessions are removed.
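The two sampling ideas described here, a probe-count filter and a per-group cap, can be sketched in a few lines of Python. The 3 and 25 probe bounds come from the text; the dictionary keys, the random sampling used to apply the cap, and the toy counts are assumptions made for illustration only.

```python
import random
from collections import defaultdict

MIN_PROBES, MAX_PROBES = 3, 25  # bounds stated in the text


def keep_session(probe_count):
    """Keep sessions with 3 to 25 probes; drop instant exits and very long sessions."""
    return MIN_PROBES <= probe_count <= MAX_PROBES


def balanced_sample(sessions, cap, seed=0):
    """Cap each (intent, has_aio) group at `cap` sessions; smaller groups stay whole.

    Random sampling within a group is an assumption; the study does not
    describe how capped groups were drawn.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in sessions:
        groups[(s["intent"], s["has_aio"])].append(s)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > cap:
            members = rng.sample(members, cap)
        sample.extend(members)
    return sample


# Toy data (invented counts): one large group and one small group.
sessions = (
    [{"intent": "informational", "has_aio": True, "probes": 10}] * 50
    + [{"intent": "video", "has_aio": False, "probes": 2}] * 5
    + [{"intent": "video", "has_aio": False, "probes": 12}] * 7
)
filtered = [s for s in sessions if keep_session(s["probes"])]
balanced = balanced_sample(sessions, cap=20)
```

In the toy data the informational group is cut down to the cap, while the video group falls below it and is kept whole, mirroring how local and video had lower available counts in the balanced dataset.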
Session Classification
Sessions were categorized across multiple dimensions:
- Search intent: informational, local, navigational, transactional, and video
- Language
- Topic or niche: travel, SaaS, shopping, health, finance, and other verticals
- SERP configuration
- Query type: branded versus non-branded
- Query length
- AI Overview presence or absence, measured where the hasSge flag was reliably captured
Search-intent classification is useful but imperfect. Some queries carry more than one intent, especially branded-commercial, local-transactional, or navigational-support queries. Branded and navigational searches are related but not identical: a user may search a brand name to reach a site, compare reviews, find a coupon, locate a login page, or make a purchase decision.
Behavioral Metrics
From raw coordinate sequences, we calculated indicators including scroll depth, typical cursor position, horizontal exploration, verification patterns, session duration, movement patterns, and reading signatures. Specific measures included maximum scroll depth, median cursor position, path length, relocation count, stationary ratio, and pause count.
Several metrics require careful interpretation:
- Back-scrolling — upward scrolling after a session has moved down the page, treated as a signal of revisiting or reconsideration.
- Stillness — periods when the cursor barely moved. Can indicate reading, thinking, hesitation, or inactivity. Interpretation is stronger when stillness appears alongside longer dwell time, broader viewport coverage, and more back-scrolling.
- Viewport coverage — how much of the visible page area was traversed by the cursor. Does not mean every element was read, but helps identify broader page interaction.
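As a rough illustration of how measures like these can be derived from a probe sequence, the Python sketch below computes maximum scroll depth, path length, a back-scroll count, and a stationary ratio. The exact definitions, the 5-pixel stillness tolerance, and the function names are assumptions for illustration; the study's own formulas may differ.

```python
import math


def max_scroll_depth(viewport_ys, viewport_h):
    """Deepest page pixel ever on-screen: largest scroll offset plus viewport height."""
    return max(viewport_ys) + viewport_h


def path_length(points):
    """Total Euclidean distance the cursor travelled between consecutive probes."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def back_scroll_count(viewport_ys):
    """Count upward viewport steps that occur after the session has moved down."""
    moved_down = False
    count = 0
    for prev, cur in zip(viewport_ys, viewport_ys[1:]):
        if cur > prev:
            moved_down = True
        elif cur < prev and moved_down:
            count += 1
    return count


def stationary_ratio(points, tolerance_px=5.0):
    """Share of probe-to-probe steps where the cursor moved at most tolerance_px."""
    steps = list(zip(points, points[1:]))
    if not steps:
        return 0.0
    still = sum(1 for a, b in steps if math.dist(a, b) <= tolerance_px)
    return still / len(steps)


# Toy sequence of five probes (invented values).
cursor = [(100, 200), (100, 200), (103, 204), (400, 204), (400, 604)]
scroll_y = [0, 300, 900, 600, 1200]

depth = max_scroll_depth(scroll_y, viewport_h=720)
length = path_length(cursor)
reversals = back_scroll_count(scroll_y)
stillness = stationary_ratio(cursor)
```

Because probes arrive once per second, every one of these values is a coarse summary: movement between two samples is invisible, which is one reason stillness is read alongside dwell time and back-scrolling rather than on its own.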
Statistical Methods
Because cursor and scroll behavior is noisy and often non-normal, the analysis used nonparametric tests throughout:
- Kruskal-Wallis tests compared behavior across multiple search-intent groups.
- Mann-Whitney U tests compared two-condition measures such as AIO versus non-AIO sessions.
- Chi-square comparisons and Cramér’s V were used for binary outcomes, such as whether a session reversed scroll direction at all.
Large datasets can make very small differences statistically significant. The study therefore emphasizes effect sizes, percentage patterns, and directional consistency across related measures rather than p-values alone. The strongest claims rest on repeated patterns across multiple behavioral measures—dwell time, cursor movement, scroll depth, stillness, and revisiting behavior—not on any single metric.
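The gap between statistical significance and effect size is easy to see with a small worked example. The Python sketch below computes a Pearson chi-square statistic (no continuity correction) and Cramér's V for a contingency table using only the standard library. The counts are invented for illustration and are not results from this study.

```python
import math


def chi_square_and_cramers_v(table):
    """Return (chi2, V) for a table of observed counts, given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # equals 1 for a 2x2 table
    return chi2, math.sqrt(chi2 / (n * k))


# Invented counts. Rows: AIO present / absent.
# Columns: session reversed scroll direction / did not.
table = [
    [5200, 4800],
    [5000, 5000],
]
chi2, v = chi_square_and_cramers_v(table)
```

With 20,000 toy sessions, a two-percentage-point gap yields a chi-square statistic of about 8.0, well above the 3.84 threshold for p < 0.05 at one degree of freedom, yet Cramér's V is only about 0.02. That is the pattern the paragraph above warns about: the test flags the difference, but the effect size shows how small it is.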
What This Study Does Not Claim
This study does not claim exact element-level fixation, such as “the user read Result #3.” Viewport coordinates confirm which vertical zones were on-screen, but we stop short of claiming users fixated on a specific element within that zone.
The study does not claim causation. AI Overview presence is associated with changes in behavior, but these are correlations and behavioral shifts; the study cannot prove that the AIO alone caused every observed difference.
We did not attempt perfect classification of SERP features such as maps, product grids, video packs, knowledge panels, or sitelinks at scale. These findings should be interpreted as page-level behavior, not feature-by-feature causation.
Effect sizes fall within the typical range for behavioral research. Because individual search behavior varies widely, r values of 0.10–0.15 can represent meaningful and reliable patterns. Sessions vary in ways this study cannot fully measure, including task urgency, topic familiarity, device context, and how actively the person was reading.
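For readers who want a feel for what an r-type effect size measures in a two-group comparison, the sketch below computes a rank-biserial correlation by direct pair counting. This is one common effect size paired with the Mann-Whitney U test; whether it is the specific r reported in this study is an assumption, and the sample values are invented.

```python
def rank_biserial(a, b):
    """Rank-biserial correlation by pair counting.

    (pairs where a > b) minus (pairs where a < b), divided by all pairs.
    Ties count toward the denominator only. Ranges from -1 to 1.
    """
    wins = sum(1 for x in a for y in b if x > y)
    losses = sum(1 for x in a for y in b if x < y)
    return (wins - losses) / (len(a) * len(b))


# Invented probe counts for two small groups of sessions.
aio_sessions = [12, 15, 9, 20, 14]
non_aio_sessions = [10, 11, 9, 13, 8]
r = rank_biserial(aio_sessions, non_aio_sessions)
```

Read this way, an r near 0.10 means that across all possible cross-group pairs, one group comes out ahead only about ten percentage points more often than it comes out behind. The pair-counting loop is quadratic, so it suits small illustrations rather than datasets of this study's size.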
The study is not claiming that lower-ranked results necessarily get more clicks. It is showing that, when AI Overviews are present, the search results page receives more on-page evaluation before the click. That is the methodological difference between this analysis and studies that rely only on rankings, CTR, zero-click rates, or AIO trigger frequency.
Ethics & Privacy
Data was anonymized prior to analysis with no personally identifiable information retained. Users are told that mouse movements may be captured.
