Thanks to Mike King and Rand Fishkin for the invaluable insights they shared on “Google Search’s Internal Engineering Documentation.”
I’ve added what seems important to me in the article. Take some time to read through their articles yourself, understand the information they’ve shared, and determine what’s important for you.
- Mike King Article: Secrets from the Algorithm: Google Search’s Internal Engineering Documentation Has Leaked
- Rand Fishkin Article: An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them
A huge thanks to Matt Hodson for creating the website on Google Search Algorithm Leak.
- Website Link: Google Search Algorithm Leak
Mike King’s Article Summary
- DA or DR: Google has a feature called “siteAuthority,” which is essentially the same concept.
- Sandbox: Sandbox is real and new websites stay in Sandbox for a period of time.
- Authors: Add a legit author bio in the articles.
- Click Signals: Google uses clicks and post-click behavior as part of its ranking algorithms.
- Don’t focus only on getting clicks; optimize the page or article to increase dwell time as well.
- Drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank.
- Date Of The “Last Good Click” To A Given Document:
- This suggests that content decay (or traffic loss over time) is also a function of a ranking page not driving the expected amount of clicks for its SERP position.
- Indexing Tier Impacts Link Value: Google stores pages in three tiers. Get backlinks from pages that are either fresh (example: PR) or otherwise featured in the top tier.
- URL Last 20 Changes: Google stores versions of a page over time, similar to the Wayback Machine, and considers the 20 latest versions. This is one important reason to redirect a page only to a relevant target if you expect the link equity to flow.
- CrawlerChangerateUrlHistory:
- Google keeps the history of a page and compares the page against its 20 latest versions.
- This also means that changing only the date of a blog post to trick Google into treating the page as fresh will not work. 🙂
- Homepage PageRank: PageRank and siteAuthority act as proxies for new pages until their individual PageRanks are established.
- Font Size of Terms and Links Matters: Google tracks the average weighted font size of terms in documents and of the anchor text of links. Consider bolding the important terms and the one or two most important links on the page.
- Documents Get Truncated: Google tracks the words in a page body as tokens, and there is a maximum number of tokens it considers. So it’s important to place the key content early in the article or page.
- Page Titles Are Still Measured Against Queries: The documentation indicates that there is a titlematchScore and that Google values how well the page title matches the query. This means you should place your target keywords first in the title.
- Dates are Very Important: Consistency in specifying a date across structured data, page titles, and XML sitemaps is crucial for optimization.
- Site Embeddings Are Used to Measure How On-Topic A Page Is: The following module “QualityAuthorityTopicEmbeddingsVersionedItem” evaluates a website’s authority and relevance on a specific topic by analyzing its content and structure. Koray’s course on the concept is a very good option to learn how to build authority for the site in a niche.
- Anchor Mismatch: Google demotes links in its calculations when they do not match the target site they are linking to, indicating that relevance is sought on both ends of a link.
- SERP Demotion: A demotion signal based on SERP factors indicates potential user dissatisfaction with a page, often measured by its click-through rate.
- Nav Demotion: This is a demotion applied to pages exhibiting poor navigation practices or user experience issues.
Mike King’s Article: Important Sections
siteAuthority – (DA or DR)
- Google’s feature called “siteAuthority” is part of the Compressed Quality Signals and is used in the ranking system.
- We SEOs refer to ‘siteAuthority’ as ‘Domain Authority’ (DA from Moz) or ‘Domain Rating’ (DR from Ahrefs), which are essentially the same concept.
Sandbox
- In the PerDocData module, the documentation indicates an attribute called hostAge that is used specifically “to sandbox fresh spam in serving time.”
- This means the Sandbox is real and new websites stay in Sandbox for a period of time.
Google Chrome
- One of the modules related to page quality scores features a site-level measure of views from Chrome. Another module that seems to be related to the generation of site links has a Chrome-related attribute as well.
Authors
- Google explicitly stores the authors associated with a document as text.
- They also look to determine if an entity on the page is also the author of the page.
NavBoost and Glue – (Click Signals)
- One of Google’s Strongest Ranking Signals – NavBoost is a system that employs click-driven measures to boost, demote, or otherwise reinforce a ranking in Google Search.
Clicks
- NavBoost uses a rolling 13 months of data and focuses on web search results. This means click logs are used to change web results.
- NavBoost has a specific module entirely focused on click signals. The summary of that module defines it as “click and impression signals for Craps.” Craps is one of the ranking systems.
- The following are all considered as metrics under Craps.
- bad clicks
- good clicks
- last longest clicks
- unsquashed clicks
- unsquashed last longest clicks
Date
- One measure is the Date of the “last good click.” This suggests that content decay (or traffic loss over time) is also a function of a ranking page not driving the expected number of clicks for its SERP position.
- The documentation represents users as voters, and their clicks are stored as their votes. The system counts the number of bad clicks and segments the data by country and device.
- They also store which result had the longest click during the session. So, it’s not enough to just perform the search and click the result, users need to also spend a significant amount of time on the page. Long clicks are a measure of the success of a search session just like dwell time, but there is no specific feature called “dwell time” in this documentation.
- Scoring at the subdomain, root domain, and URL level indicates that they treat different levels of a site differently.
- Google uses clicks and post-click behavior as part of its ranking algorithms.
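To make the click metrics above more concrete, here is a minimal Python sketch of how click logs could be rolled up per URL and segmented by country and device. The `Click` record, the 30-second cutoff for a "good" click, and the function name are all assumptions for illustration. The leaked documentation names the metrics but does not describe thresholds or how Google actually computes them.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cutoff: the leak does not say what separates a good click from a bad one.
GOOD_CLICK_SECONDS = 30


@dataclass
class Click:
    session_id: str
    url: str
    country: str
    device: str
    dwell_seconds: float  # time spent before the user came back to the results page


def aggregate_clicks(clicks):
    """Count good, bad, and last longest clicks per (url, country, device)."""
    stats = defaultdict(lambda: {"good": 0, "bad": 0, "last_longest": 0})
    longest_by_session = {}

    for click in clicks:
        key = (click.url, click.country, click.device)
        bucket = "good" if click.dwell_seconds >= GOOD_CLICK_SECONDS else "bad"
        stats[key][bucket] += 1

        # Keep the longest click of each session; on a tie the later click wins.
        best = longest_by_session.get(click.session_id)
        if best is None or click.dwell_seconds >= best.dwell_seconds:
            longest_by_session[click.session_id] = click

    for click in longest_by_session.values():
        stats[(click.url, click.country, click.device)]["last_longest"] += 1

    return dict(stats)


clicks = [
    Click("s1", "example.com/a", "US", "mobile", 5),    # quick bounce
    Click("s1", "example.com/b", "US", "mobile", 120),  # longest click of session s1
    Click("s2", "example.com/a", "US", "mobile", 45),
]
for key, counts in aggregate_clicks(clicks).items():
    print(key, counts)
```

In this toy data, page `/a` earns one good click and one bad click, while page `/b` wins the longest click of session `s1`. That is the practical point of the section: the click only helps if the visitor stays.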
Indexing Tier Impacts Link Value
Content Hierarchy
- A metric called sourceType shows a loose relationship between where a page is indexed and how valuable it is.
- Google’s index is arranged into tiers, where the most important, regularly updated, and frequently accessed content is stored in flash memory.
- Less important content is stored on solid-state drives, and irregularly updated content is stored on standard hard drives.
- This means the higher the tier, the more valuable the link.
- Pages that are considered “fresh” are also considered high quality.
- You want your links to come from pages that are either fresh or otherwise featured in the top tier.
- This partially explains why getting links from highly ranking pages and from news pages yields better ranking performance.
- Example: Press Release (PR)
Google Only Uses The Last 20 Changes For A Given URL When Analyzing Links
- Google’s file system is capable of storing versions of pages over time, similar to the Wayback Machine.
- As per Mike King, Google keeps what it has indexed forever.
- This is one of the reasons you can’t simply redirect a page to an irrelevant target and expect the link equity to flow.
- CrawlerChangerateUrlHistory
- The docs reinforce this idea, implying that they keep all the changes they’ve ever seen for the page.
- When they do surface data for comparison by retrieving DocInfo, they only consider the 20 latest versions of the page.
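The "last 20 versions" idea can be sketched as a bounded history. The Python below is only an illustration: the `UrlHistory` class, the ISO date pattern, and the "ignore dates when comparing" rule are my assumptions, not anything described in the leaked documentation. It shows why bumping only the date on a post looks like no real change once versions are compared.

```python
import re
from collections import deque

MAX_VERSIONS = 20  # per the leak, only the 20 latest versions are considered

# Hypothetical pattern for ISO dates such as 2024-05-28.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def strip_dates(text):
    """Remove ISO dates so that date-only edits compare as equal."""
    return DATE_PATTERN.sub("", text)


class UrlHistory:
    def __init__(self):
        # A deque with maxlen drops the oldest snapshot automatically when full.
        self.versions = deque(maxlen=MAX_VERSIONS)

    def record(self, content):
        self.versions.append(content)

    def changed_meaningfully(self):
        """True if the latest snapshot differs from the previous one beyond its dates."""
        if len(self.versions) < 2:
            return True  # a first snapshot is new content by definition
        previous, latest = self.versions[-2], self.versions[-1]
        return strip_dates(previous) != strip_dates(latest)


history = UrlHistory()
history.record("Updated 2023-01-10. Ten SEO tips.")
history.record("Updated 2024-05-28. Ten SEO tips.")
print(history.changed_meaningfully())  # date-only edit

history.record("Updated 2024-05-28. Twelve SEO tips with new examples.")
print(history.changed_meaningfully())  # real content change
```

The first check reports no meaningful change and the second reports one, which matches the takeaway above: refresh the content, not just the date.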
Homepage PageRank
- Every document is associated with its homepage’s PageRank (the Nearest Seed version), likely serving as a proxy for new pages until they develop their own PageRank.
- Both this and siteAuthority probably act as proxies for new pages until their individual PageRanks are determined.
Homepage Trust
- Google evaluates the value of a link based on the trustworthiness of the homepage.
- Therefore, it’s important to focus on the quality and relevance of your links rather than their quantity.
Font Size of Terms and Links Matters
- Google is tracking the average weighted font size of terms in documents.
- They are doing the same for the anchor text of links.
Documents Get Truncated
- Google tracks the number of tokens and the ratio of total words to unique tokens in the body.
- The documentation states there’s a maximum number of tokens considered for a document in the Mustang System (primary scoring, ranking, and serving system), emphasizing the importance of placing key content early.
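Here is a small Python sketch of what truncation means in practice. The whitespace tokenizer and the limit of 8 tokens are assumptions chosen to keep the example tiny; the leak does not state the real limit or how Google tokenizes a page.

```python
MAX_TOKENS = 8  # hypothetical limit; the real value is not given in the leak


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def truncate(text, max_tokens=MAX_TOKENS):
    """Keep only the tokens that fit under the limit."""
    return tokenize(text)[:max_tokens]


def words_to_unique_ratio(text):
    """Ratio of total words to unique tokens in the body."""
    tokens = tokenize(text)
    return len(tokens) / len(set(tokens)) if tokens else 0.0


late = "welcome to our blog where we share many thoughts about technical seo audits"
early = "technical seo audits explained step by step for beginners and experts"

print("seo" in truncate(late))   # key term sits past the limit
print("seo" in truncate(early))  # key term sits near the top
print(words_to_unique_ratio("step by step"))
```

With the key term buried after a long intro, it falls outside the considered tokens; with the key term up front, it survives. The same logic is why placing key content early matters.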
Short Content is Scored for Originality
- The OriginalContentScore suggests that short content is scored for its originality. This is probably why thin content is not always a function of length.
- Additionally, there is also a keyword stuffing score.
Page Titles Are Still Measured Against Queries 🙂
- The documentation indicates that there is a titlematchScore. The description suggests that Google still actively values how well the page title matches the query.
- Placing your target keywords first is still the move.
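The leak only tells us that a titlematchScore exists, not how it is calculated. The Python below is a made-up scoring function, purely to illustrate the "keywords first" advice: the function name `title_match_score` and the position weighting are my assumptions and should not be read as Google’s formula.

```python
def title_match_score(title, query):
    """Toy score: share of query terms found in the title, weighted toward the front."""
    title_tokens = title.lower().split()
    query_tokens = query.lower().split()
    if not title_tokens or not query_tokens:
        return 0.0

    score = 0.0
    for term in query_tokens:
        if term in title_tokens:
            position = title_tokens.index(term)
            # A term at the very start counts fully; later positions count less.
            score += 1.0 - position / len(title_tokens)
    return score / len(query_tokens)


query = "seo audit"
front_loaded = "SEO Audit Checklist for Small Sites"
back_loaded = "A Checklist for Small Sites Covering SEO Audit"

print(round(title_match_score(front_loaded, query), 3))
print(round(title_match_score(back_loaded, query), 3))
```

Under this toy weighting the front-loaded title scores higher than the back-loaded one, even though both contain the same keywords.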
There Are No Character Counting Measures
- The dataset doesn’t include metrics for counting the length of page titles or snippets. The only character count measure is snippetPrefixCharCount, which helps determine what can be included in the snippet.
- This aligns with findings that long page titles are not ideal for attracting clicks but are acceptable for improving rankings.
Dates are Very Important
- Google is very focused on fresh results, and the documents illustrate its numerous attempts to associate dates with pages.
- bylineDate – This is the explicitly set date on the page.
- syntacticDate – This is an extracted date from the URL or in the title.
- semanticDate – This is the date derived from the content of the page.
- Your best bet here is specifying a date and being consistent with it across structured data, page titles, and XML sitemaps.
- Putting dates in your URL that conflict with the dates in other places on the page will likely yield lower content performance.
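A quick way to audit this on your own site is to collect every date you expose for a page and check that they agree. The Python sketch below assumes a `/YYYY/MM/DD/` URL pattern and takes the other dates as inputs you have already extracted; the function names and the URL pattern are my own choices for illustration, not something from the leak.

```python
import re
from datetime import date

# Assumed URL layout: .../2023/01/10/post-slug/
URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def syntactic_date_from_url(url):
    """Extract a date from the URL path, or None if there is no valid date."""
    match = URL_DATE.search(url)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # digits that look like a date but are not one, e.g. /2024/13/45/


def find_date_conflicts(url, byline_date, structured_data_date, sitemap_lastmod):
    """Return all known dates if they disagree, or an empty dict if they are consistent."""
    candidates = {
        "url": syntactic_date_from_url(url),
        "byline": byline_date,
        "structured_data": structured_data_date,
        "sitemap": sitemap_lastmod,
    }
    known = {name: value for name, value in candidates.items() if value is not None}
    return known if len(set(known.values())) > 1 else {}


refreshed = date(2024, 5, 28)
print(find_date_conflicts("https://example.com/2023/01/10/seo-tips/", refreshed, refreshed, refreshed))
print(find_date_conflicts("https://example.com/seo-tips/", refreshed, refreshed, refreshed))
```

The first page is flagged because its URL still carries the old publication date while everything else says it was refreshed. The second page, with no date in the URL, comes back clean.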
Site Embeddings Are Used to Measure How On-Topic A Page Is
- QualityAuthorityTopicEmbeddingsVersionedItem
- This module evaluates a website’s authority and relevance on a specific topic by analyzing its content and structure to determine how focused it is on that topic and how consistent its pages are with that topic.
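To get a feel for what "on-topic" could mean with embeddings, here is a minimal Python sketch. It averages page vectors into a site vector and scores each page by cosine similarity to it. The tiny three-dimensional vectors, the URLs, and the averaging approach are assumptions for illustration; the leak does not explain how Google builds or compares these embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def on_topic_scores(page_embeddings):
    """Score each page by how close it sits to the site's average embedding."""
    site_vector = centroid(list(page_embeddings.values()))
    return {url: cosine_similarity(vec, site_vector) for url, vec in page_embeddings.items()}


# Made-up embeddings: the first two pages are about SEO, the third is off-topic.
pages = {
    "/seo-audit": [0.9, 0.1, 0.0],
    "/link-building": [0.8, 0.2, 0.0],
    "/banana-bread": [0.0, 0.1, 0.9],
}
scores = on_topic_scores(pages)
for url, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(url, round(score, 3))
```

The off-topic page ends up with the lowest score. It also drags the site average away from the SEO pages, which is one way to picture why a focused site is easier to read as an authority on its topic.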
Demotions
- Anchor Mismatch
- When the link does not match the target site it’s linking to, the link is demoted in the calculations. This means that Google is looking for relevance on both sides of a link.
- SERP Demotion
- A demotion signal based on SERP factors suggests potential user dissatisfaction with a page, likely measured by the number of clicks it receives.
- Nav Demotion
- Presumably, this is a demotion applied to pages exhibiting poor navigation practices or user experience issues.
Read more about demotions in the Mike King article.
Drive More Clicks Using a Broader Set of Queries
- To maintain and improve your rankings, focus on generating more successful clicks from a wider range of queries and increasing link diversity.
- This approach aligns with the idea that strong content naturally attracts varied traffic and links.
- Enhancing user experience and driving qualified traffic will signal to Google that your page deserves a higher rank.