Websites in “Your Money Your Life” (YMYL) categories saw a roller-coaster ride in the SERPs over the last two years. One way of understanding the changes is that Google tries to be more cautious in ranking pages that are related to personal health. This might be especially true if the topics lack wide consensus, are controversial or have an out-sized impact on personal health choices. Google judges on-site content, in part, via their machine learning and natural language processing (NLP) algorithms.
To investigate, we compared web pages that cover health-sensitive topics vs. non-health pages, those that do not fall into the now-infamous YMYL category. ClickStream partnered with Surfer SEO for data used in this study.
Here are the most important findings from the study:
- Google does indeed dial-up their sensitivity meter for on-page factors when returning results about health-sensitive topics.
- To rank high in Google, YMYL pages need more thorough, holistic on-page writing vs. pages about topics that are not key to personal well-being.
- A high rating from Surfer’s Content Score is a better predictor of high rankings than any independent, on-page rank signal.
- Covering related topics around the parent keyword is the largest on-page ranking factor. It’s more important than Title and H1. However, our one caveat here is we focused on keywords with low-to-medium monthly search volume.
- Keywords with informational intent require better on-page optimization than those with buying intent.
The other big rank signal I was interested in is comprehensive content coverage (aka thorough or holistic writing about the primary topic or keyword) on a page. This was possible to measure because Surfer SEO is one of the two best tools on the market to score NLP-related topic use on pages. They do this by backward analyzing related words and phrases on top-ranking pages in the SERP (the other software tool is MarketMuse, primarily used for enterprise SEO, which relies on its own machine learning algorithm for natural language processing. I’ve worked extensively with their team and data in the past).
With the launch of the Hummingbird algorithm six years ago, Google ramped up its machine learning to grade how well a topic is covered on a website or page.
Data studies over the past few years were less than elegant in showing the importance of topic coverage, or relied on more limited, even silly, measures that mean little or nothing. I’m thinking in particular about discussions of keyword density, word count of a page, or LSI keywords.
Greg Mercer of JungleScout gave us access to their best-of-class database of eCommerce keywords. Their keywords are grouped and siloed with natural language processing (NLP). Our study analyzes specific product niches.
I worked with interactive visualizations for interpreting and presenting. I hope to break new ground in the SEO space in the use of Tableau, the world’s most widely used and sophisticated data visualization tool. Visualizing relationships between 1000s of data points creates a higher level of data analysis. All of the visuals here can be filtered and sorted to help find meaningful relationships.
We looked at the top 30 results on Google SERP for 8,460 keywords. That resulted in over 200,000 ranking URLs. We analyzed a randomized 190,824 final SERP results.
Most on-page rank studies require the development or re-use of a custom crawler to gather webpage data specifically for “studies.” Surfer’s crawler and algorithm, however, analyze the information it collects from webpages and SERPs hundreds of times a day. This is at the heart of their tool’s secret sauce and success. In a sense, every time Surfer SEO’s public-facing tool analyzes SERPs for a user’s keyword submission, they are doing a mini rank study on the top results for that keyword.
The keywords came from NLP-grouped search terms related to:
- board games
- musical instruments
- outdoor lighting
- patio seating
- pen scanners
- ski bags
- dangerous/prohibited/banned products from Amazon
CBD and vape keywords are banned from Google ads. Muscle building and weight loss are considered by the FDA and others as two of the riskiest health-related categories on Amazon. We chose the other categories because they are near poster-children of innocuous niches.
Most SEOs don’t work for the top 50 largest websites, and we want results to help everyone. We chose keywords with monthly searches mostly below 1000 searches. These are siloed keywords from niches. The narrow niches also let us cluster topics that are very much not YMYL vs. those that are. Lastly, niche clusters let us look for new sector analysis techniques for SEO (see the section “Adding Multiple-Variable Filters” below for more on that).
The “Dangerous/Prohibited/Banned” keywords were pulled and appended manually from Amazon’s very lengthy list on their Seller Central page.
Results and Actionable Insights
This discussion is limited to rank signals that showed meaningful impact on high ranking.
YMYL high ranking pages are better optimized for comprehensive content vs. non-health pages
The differences are not large, but we wouldn’t expect them to be. Have a look at the charts and correlations.
When we look at on-page rank signals there are many other factors that interfere with what we are trying to measure. For example, in link studies, SEOs would love to isolate how different types of anchor text perform, but unless you are Matt Diggity back when he owned over 500 websites, you don’t have enough control over what affects minor differences among anchor text variables.
The “banned, hazardous, prohibited” pages were even more sensitive to on-page optimization than the non-health-related group. Most of the “banned” keywords are very long-tail and often esoteric products. Our chart for the average ranking keyword and average estimated traffic shows the category has very small websites ranking for the keywords. They have less authority and power. Looking at them next to the other groups is not an apples-to-apples comparison, so keep that in mind. Remember, the biggest rank factor that can be isolated is domain authority. That “signal” outweighs topic coverage.
Below are the correlations we found significant at the .05 level. The metrics for external links and estimated site traffic are very approximate and provided by Surfer SEO. Since they are blunt measurements, their power in showing results will be weaker in this study.
The visualizations are interactive: hover your mouse over text and elements.
Surfer’s Content score is the best predictor of high ranking
It’s not surprising that Surfer’s proprietary “Content Score” was the best predictor of high ranking vs. any single on-page factor we looked at in our study.
Surfer describes the score as “Multiple factors weighted to represent how well a piece of content is optimized. It is based on length, structure, entity coverage, exact keywords in use, and the visual aspect – images.” So, it’s actually an amalgamation of many rank factors. Clearly, they built their scoring system in a very meaningful, useful way.
The number of domain-ranked keywords and the domain’s estimated monthly traffic affect how a URL ranks. A lot.
These are measures of domain authority, often represented as DA (Moz) or DR (Ahrefs). Most SEOs agree that Expertise/Authority/Trust, EAT, are the main factors Google uses for YMYL ranking. Both ranked keywords and estimated traffic are critical ways to gauge EAT for a domain.
They show the success of the website, but not the page. Looking at these factors on a page level would give more insights, but remember the focus of this study is on-page factors. These had the strongest relationships, with correlations of .170 and .177, respectively. So, just having a page on a larger website predicts better ranking the most. It’s one of the first things SEOs learn: don’t try to go after parent topics and competitive keywords where authority sites have total dominance of the SERPs.
Five years ago, when most SEOs weren’t paying attention to topic coverage, the best way to create keyword maps or plans was using the “if they can rank, we can rank” technique. It’s still a super important strategy when used alongside topic modeling. It relies very heavily on being sure the competitor sites analyzed have similar authority and trust:
- Find the 4-8ish “competitive” or “overlapping by content” domains that best match your domain for subject coverage, current traffic amount coming from search engines, and link strength.
- Create keyword maps for pages to rank for similar or same parent keywords and related long-tails.
Thorough use of common words & common phrases has the biggest effect on ranking for on-page signals
“Common words and…” is Surfer’s way of saying “comprehensive topic coverage,” and as expected this had the largest single on-page ranking factor correlation, 0.149 (excluding the Content Score, since it is actually several factors combined). Surfer determines important words commonly used by competitors in top SERP positions for a keyword. This factor checks how many of these common terms a piece of content is missing.
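Surfer’s exact scoring algorithm is proprietary, but the underlying idea can be sketched: collect terms that appear across most of the top-ranking pages, then measure how many of those a given page is missing. Everything below (the function names, the whitespace tokenizer, the `min_docs` threshold) is an illustrative assumption, not Surfer’s implementation.

```python
from collections import Counter

def common_terms(competitor_texts, min_docs=2):
    """Terms appearing in at least `min_docs` of the top-ranking pages."""
    doc_counts = Counter()
    for text in competitor_texts:
        doc_counts.update(set(text.lower().split()))  # count once per document
    return {term for term, n in doc_counts.items() if n >= min_docs}

def coverage(page_text, terms):
    """Fraction of the common terms that a page actually uses."""
    present = set(page_text.lower().split())
    return len(terms & present) / len(terms) if terms else 0.0

# Toy stand-ins for the text of top-ranking pages
top_pages = [
    "best ski bag padded travel case",
    "padded ski bag with wheels for travel",
    "ski travel bag padded straps",
]
shared = common_terms(top_pages)          # {"ski", "bag", "padded", "travel"}
score = coverage("our padded ski bag", shared)  # 0.75 — "travel" is missing
```

A real tool would lemmatize, weight by position and frequency, and filter stop words; this only shows the presence-or-absence core of the idea.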
If you’re not paying attention to natural language processing (NLP) aka topic modeling, aka semantic SEO, you’re six years late to the Hummingbird game! And, you are two years late to the sub-algorithm of Hummingbird: BERT.
The BERT algorithm (Bidirectional Encoder Representations from Transformers) is a transformer-based language model developed by Google that is pre-trained to learn deeply bidirectional, contextual representations of words. It’s particularly important in helping Google understand the meaning of users’ queries.
To rank high for informational searches, content must be better-optimized vs. content ranking for buying intent
I created two groups for user intent query types. This is another test we’ve not seen done at scale.
For buyer intent, “for sale” was appended to the end of search terms, and “buy” was prepended to the front of others. That was done randomly to half of all keywords in the study. The other half had “how to use” added to the beginning.
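That assignment step can be sketched in a few lines. The keyword list and the fixed seed below are stand-ins (the real seed keywords came from JungleScout), so treat this as a hedged illustration of the split, not the study’s actual script:

```python
import random

# Hypothetical seed keywords; the study's real lists came from JungleScout.
keywords = ["ski bag", "pen scanner", "patio bench", "board game table"]

random.seed(42)  # fixed seed just to make this sketch reproducible
shuffled = random.sample(keywords, len(keywords))
half = len(shuffled) // 2

buyer, informational = [], []
for i, kw in enumerate(shuffled):
    if i < half:
        # Buyer intent: randomly append "for sale" or prepend "buy"
        buyer.append(kw + " for sale" if random.random() < 0.5 else "buy " + kw)
    else:
        # Informational intent: prepend "how to use"
        informational.append("how to use " + kw)
```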
There was a clear difference between the two groups. To rank higher on informational terms, you need better optimized on-page writing!
There is, however, a fairly simple explanation for that. Google knows users don’t want to see too much text on an eCommerce page. If they are ready to buy, they’ve typically done some due diligence on what to buy. They completed most of their customer journey. eCommerce sites use more complex frameworks, and Google can tell a lot about buyer user experience by technical SEO page factors that are not as important on informational pages.
Also, for sites with more than a handful of products, category pages tend to have the more thorough content that both users and Google look for before diving deeper.
Website speed and high ranking pages
Google created a lot of hoopla when it announced last November:
“Page experience signals [will] be included in Google Search ranking. These signals measure how users perceive the experience of interacting with a web page and contribute to our ongoing work to ensure people get the most helpful and enjoyable experiences from the web…the page experience signals in ranking will roll out in May 2021.”
Surfer measures four site speed factors, and they do play a part in “page experience”:
- HTML size (in bytes)
- Page speed time to first byte
- Load time in milliseconds
- Page size in kilobytes
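These four metrics map to simple measurements. As a hedged illustration (this is not Surfer’s crawler, and the function name is my own), here is how time to first byte, load time, and size could be captured for a single URL with the standard library:

```python
import time
import urllib.request

def speed_metrics(url):
    """Rough TTFB, load time, and size for one URL (illustrative only)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        first = resp.read(1)                       # first byte arrives
        ttfb_ms = (time.perf_counter() - start) * 1000
        body = first + resp.read()                 # rest of the document
        load_ms = (time.perf_counter() - start) * 1000
    return {
        "ttfb_ms": ttfb_ms,                        # time to first byte
        "load_ms": load_ms,                        # load time in milliseconds
        "html_bytes": len(body),                   # HTML size in bytes
        "page_kb": len(body) / 1024,               # page size in kilobytes
    }
```

Note this measures only the raw HTML document; a real crawl would also render the page and fetch its assets, which is where most load time usually goes.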
I found an overall correlation of 0.10 for HTML size, but the other factors affecting page speed did not correlate with high ranking. The HTML size result seems counter-intuitive at first: LARGER sizes tend to rank higher, and the correlation was fairly strong. I also found a very large disparity between informational and buyer intent: sites that serve product pages tend to have larger HTML sizes. The correlation was 0.045 for informational and 0.146 for purchase intent.
This may show the complexity of a content management system is connected with a higher SERP ranking. eCommerce platforms are much more code-heavy and complex compared to platforms like WordPress. This is yet one more time there is an “indirect connection” with what we think we are measuring in SEO but are not.
Four years ago, I oversaw the first study to show Google measures page speed factors other than time to first byte (I did the study for everyone’s favorite digital marketer, Neil Patel). Since then, others have found even bigger ranking effects for sites that are fast on other metrics like First Contentful Paint or Time to Interactive.
Michael Suski, Surfer SEO’s Co-Founder (who was a huge help in getting me data and understanding their measurements!), has seen this kind of result recently, and other research also shows little to no connection between better-ranking sites and site speed. He thinks it’s because, over the past several years, websites have “caught up” on many of the technical issues that plagued the laggards. This actually makes complete sense to me, especially when I think of the technical SEO I did on version 1.x Magento eCommerce sites 4+ years ago. Some were like train wrecks before complex audits and fixes. Today that’s not the case.
It’s a little odd that Google is drawing so much attention to “page experience” rank signals rolling out later in 2021. They’ve been measuring this UX for a long time, with load times being one of the easier ways for SEOs to see effects vs. a UX factor like time on page or “pogo-sticking” (when people bounce back to the SERP after clicking on a result, which would seem to indicate they weren’t fully satisfied with a result).
It seems that even with the big speed improvements, users demand even more, especially because mobile speeds are often slower. Google pre-announced this, in my mind, also because it’s not a signal that can be “gamed” easily. Fast is fast.
Adding Multiple-Variable Filters To See Results Beyond Two Dimensions
The data viz below lets you adjust sliders and click on data points to better understand how relationships in the study are multi-dimensional. Google has over 200 rank signals. Could someone build a machine to dial up or down each signal to better predict how to reach their SEO goals? Nope. However, adding at least a third dimension to our most important results makes for some very interesting, granular analysis. Further, I created interactive data viz by category as a use case.
Using Data Visualizations for Niche and Industry Sector SEO
We’ve all seen metrics in SEO tools for “competitors” but they are static measures. What if you find all the main keywords you want your website to rank for, and then see how other domains perform on, for example, comprehensive topic coverage? You can download the data with CSVs and use data visualization apps like Tableau Public, which I use here.
Note how similar websites get similar content scores. SEOs love to see metrics about competitors like ranking keywords, estimated monthly traffic, and number of linking domains. These are all super important. Seeing the individual’s keyword terms closely related to a domain is how we judge “competitors.”
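As a minimal example of that pre-Tableau workflow (the CSV columns here are assumptions about a hypothetical export, not a documented Surfer format), you can aggregate content scores per domain with the standard library before visualizing anything:

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded CSV export; column names are hypothetical.
raw = io.StringIO(
    "domain,keyword,content_score\n"
    "skigear.example,ski bag,72\n"
    "skigear.example,padded ski bag,68\n"
    "bigbox.example,ski bag,55\n"
)

scores = defaultdict(list)
for row in csv.DictReader(raw):
    scores[row["domain"]].append(float(row["content_score"]))

# Average content score per competing domain
avg_score = {domain: sum(v) / len(v) for domain, v in scores.items()}
```

The same grouped table is what you would feed Tableau to compare sectors at a glance.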
I’ve always been fascinated by taking that to the next level. How do we look at entire sectors based on the way keywords are “scored” by Google? Backward analyzing Google SERPs gives those insights. Much of this analysis moves out of the SEO realm and into the business analysis realm. It’s also super useful for evaluating niches to invest in for affiliate or eCommerce websites. Links might tell us a particular sector is overly competitive, but what about content quality? We can judge that too, if we evaluate large groups of keywords for different niches.
Hope you learned something new! We are planning out our next study and would love to get your feedback on Twitter.
Niches were chosen because I wanted domains to show up in our study with multiple URLs. This was important for capturing a lot of “specialty”-oriented eCommerce sites, which most non-mega eCommerce sites are. Most data studies do not look at how a group of URLs from one domain tells a story. The keywords they use are so randomized that the mega websites have the vast majority of URLs in results.
The narrow topics also meant fewer keywords with extremes in ranking competition. Many rank studies use a preponderance of keywords with over 40,000 monthly searches, but most SEOs don’t work for websites that are able to rank top 10 for those. There is a bias in this study toward less competitive keywords, and I didn’t look at Google keyword search volume, just the volume on Amazon. All of our keywords had at least 10 monthly searches on Amazon (via JungleScout), and given that I appended words to them, their actual search volume in Google would be less than 10 a month in many cases.
The “dangerous/prohibited/banned” group was excluded from most comparisons of health vs. non-health. Many of these were very esoteric topics or Amazon needed 6-10 words to describe them.
We used the Spearman rank-order correlation to examine the data. Spearman’s is a formula that calculates the correlation between two variables, measured from -1 to 1. A correlation coefficient of 1 or -1 would mean a perfect monotonic relationship between the two variables. Spearman’s is used instead of Pearson’s because of the nature of Google search results: they are ranked by importance, in decreasing order. Spearman’s correlation works by comparing the ranks of two datasets, which fits our goal better than Pearson’s. We used .05 as our significance level. This is the same confidence level I used when overseeing the 1 Million Search Results data study for Brian Dean in 2016.
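For readers who want to reproduce the statistic, Spearman’s coefficient is just Pearson’s correlation computed on ranks. A dependency-free sketch (in practice you would use `scipy.stats.spearmanr`):

```python
def rank(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over the tied block
        avg = (i + j) / 2 + 1           # average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    # Pearson correlation of the rank-transformed data
    return pearson(rank(x), rank(y))
```

Because only ranks matter, a perfectly monotonic relationship scores 1 (or -1) even when the raw relationship is nonlinear, which is exactly what you want for ordered SERP positions.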
When I show a correlation of .08, it suggests a ranking signal twice as powerful as one measuring .04. Correlation coefficients range from -1 to 1. We treated values above .05 as a meaningful positive correlation and values between -.05 and .05 as no correlation. A negative correlation means the factor tends to go down as ranking improves.
Many of the domains in the study are from outlier or niche topics or are small because of little time and money put into them. That is first and foremost why they don’t rank well. That is also why we must look for “controls” that might show that two domains had the same amount of time, web dev/design savvy-ness, and money put into them, BUT they are, for example, health vs. non-health topics.
Correlation is not causation. While I don’t have a graduate degree in statistics, I did want to understand how we could “control” for some large factors to better pinpoint the effect of results. This was done with the graph visualizations, but not with statistical calculations. Google says they use over 200 ranking factors, so it is indeed very, very difficult to isolate independent variables. Correlations have been used in science for hundreds of years where variables can’t be totally controlled. They are accepted science, and to say otherwise is a fool’s errand. Part of the problem with trust in SEO data studies is that many were poorly executed, so people doing SEO soured on all studies.
One study done by SEMrush after I left in 2004 claimed to use “random forest” machine learning and mysteriously found that type-in traffic is the biggest ranking factor. If type-in traffic helps rank sites, then for sure the tail is wagging the dog, and Google is liable for billions of dollars in court losses for breaking their Chrome TOS. Why would they ever look at the recognition of a site as the best way to verify good results? Does that mean type-in traffic for “Enron” showed it was the best at what it did before it was busted as one of the most scandalous corporations of the 21st century? It sure was a very popular company before it went out of business, but popularity is not what helping users answer queries is about.