What Google's Leaked Signals Actually Do: A Function-Classification of 119 Content-Warehouse Attributes

Domain: SEO Mechanics · Method: Leak analysis · Published: June 2026

Key figures: 13.4% positive ranking levers · 402 audit checks classified · 119 distinct leaked signals

Meriin Labs · original research · 14 June 2026 · every number below is computed by published code. See Reproduce this.

The claim: Among the leaked Google attributes that SEOs actually act on, positive ranking levers are a small minority. Most leaked signals exist to suppress, quality-gate, store/select, or comprehend, not to boost. What would prove it wrong: a function-classification of the same corpus in which “promotion” signals are a plurality, or in which the non-promotion share falls below ~60%.

When the Google Content Warehouse documents leaked in May 2024, the SEO industry read them as a list of ranking factors, things you add to go up. That framing quietly assumes most signals are levers you pull to win. We tested that assumption by classifying what each leaked signal actually does.

The answer: of the 119 distinct leaked attributes that surface as actionable checks in a full technical audit, only 16 (13.4%) are positive ranking levers. The other 103 (86.6%) exist to demote, gate quality, manage storage and selection, read engagement, comprehend content, measure page experience, or grant SERP features. “Ranking factors” is the wrong mental model. The leak is mostly a machine for not losing, not for winning.

Hypothesis

Pre-registered before classification. We expected the leak to be dominated by demotion and infrastructure signals rather than positive levers, because Google’s public guidance for a decade has emphasized avoiding problems (thin content, spam, bad UX) over adding ranking tricks. Falsifier: if “promotion”-function signals turned out to be the largest single category, or if non-promotion signals were under ~60% of the total, the hypothesis fails.
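
The falsifier can be written as a mechanical check, so the pass/fail decision does not depend on who reads the results. This is a minimal sketch, assuming the classification yields one count per function. The name hypothesis_fails, the reading of “largest single category” as a strict plurality, and the treatment of “~60%” as exactly 0.60 are assumptions made for illustration; none of them comes from the published code.

```python
from typing import Mapping


def hypothesis_fails(
    counts: Mapping[str, int],
    lever: str = "promotion",
    min_non_lever_share: float = 0.60,  # ASSUMPTION: "~60%" read as exactly 0.60
) -> bool:
    """Return True if either pre-registered falsifier condition is met."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one classified signal")

    lever_count = counts.get(lever, 0)

    # Condition 1: the lever category is strictly larger than every other one.
    others = [n for name, n in counts.items() if name != lever]
    is_largest = all(lever_count > n for n in others)

    # Condition 2: everything that is not a lever falls below the threshold.
    non_lever_share = (total - lever_count) / total

    return is_largest or non_lever_share < min_non_lever_share


# Toy counts, not the study's results.
print(hypothesis_fails({"promotion": 5, "demotion": 3, "quality_gate": 2}))  # True
print(hypothesis_fails({"promotion": 2, "demotion": 5, "quality_gate": 3}))  # False
```

Fixing the threshold before classification is what makes the check a pre-registration rather than a post-hoc reading.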

Dataset

  • Source: the Meriin technical-seo-audit corpus (generate_audit_sheet.py), a 402-check, 21-category technical SEO audit whose google_signal column annotates each check with the leaked Content Warehouse attribute(s) it maps to.
  • Unit of analysis: the distinct leaked attribute (e.g. navDemotion, siteAuthority, goldmineBodyFactor). 119 distinct attributes are referenced across 240 of the 402 checks; the remaining 162 checks reference no specific leaked attribute (they cover best-practice items the leak doesn’t name).
  • Collected: 2026-06-14 · Scope: global / English · Engine state: post-2024-leak public knowledge.
  • Important scoping: this is not the full leak. The raw leak exposed thousands of attributes across Google’s document and ranking systems; this study classifies the 119 that show up as actionable audit signals, i.e. the slice an SEO would actually inspect and act on. The finding is about that actionable slice, and we say so wherever it matters.
  • Provenance: Public / self-collected data only. The attribute names come from the public 2024 Google API leak; the audit corpus is Meriin’s own. No client data appears in the dataset, ledger, or appendix.

Method

  1. Extract. Parse generate_audit_sheet.py with Python’s ast (no code execution), pulling every 8-field audit row and its google_signal cell. → data/.../raw/audit_rows.csv.
  2. Tokenize. Split each google_signal cell into individual attribute tokens; drop the placeholder and plain-English non-attributes.
  3. Classify. Assign each distinct attribute one of 8 functions via an explicit, auditable ruleset (a curated map for well-attested attributes, then keyword fallbacks):
    • promotion: positive ranking lever (PageRank, Goldmine title/body factors, onsiteProminence, review-promote)
    • demotion: suppression / penalty / safety filter (*Demotion, spam, porn, scam, interstitial, gibberish)
    • quality_gate: holistic quality / authority / E-E-A-T assessment (siteAuthority, NSR, YMYL, contentEffort, originality, authorship)
    • storage_selection: index / storage / crawl / dedup / error / tier (scaledSelectionTierRank, indexinginfo, shingleInfo, ContentChecksum96)
    • engagement: click / behavioral (GoodClicks, BadClicks, lastLongestClicks, chromeInTotal)
    • understanding: entity / embedding / semantic / date / OCR comprehension (webrefEntities, site2vecEmbeddingEncoded)
    • technical: page-experience metrics (CWV: lcp/cls/inp, TTFB, mobile, http2, SSL)
    • serp_feature: feature eligibility (richsnippet, shopping, snippet generation, image license)
  4. Tier the labels. Every classification carries a function confidence: Confirmed (the function is definitional or documented in the public leak / DOJ analysis) or Inferred (deduced from the name and audit context). 74 of 119 are Confirmed; 45 are Inferred.
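
Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the published extract_classify.py. It assumes the audit rows are 8-element literal tuples or lists, that google_signal is the last field, that cells are separated by commas, semicolons, or pipes, and that placeholders look like "-" or "n/a". The sample rows are invented for the example.

```python
import ast
import re

ROW_WIDTH = 8      # the article states each audit row has 8 fields
SIGNAL_INDEX = 7   # ASSUMPTION: google_signal is the last field
PLACEHOLDERS = {"", "-", "n/a", "none"}  # ASSUMPTION: placeholder spellings


def extract_rows(source: str) -> list[tuple]:
    """Collect 8-element literal tuples/lists from source text without running it."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Tuple, ast.List)) and len(node.elts) == ROW_WIDTH:
            try:
                rows.append(tuple(ast.literal_eval(node)))
            except (ValueError, TypeError):
                continue  # the row holds a non-literal expression; skip it
    return rows


def tokenize(cell: str) -> list[str]:
    """Split a google_signal cell into attribute tokens."""
    tokens = [t.strip() for t in re.split(r"[,;|]", cell)]
    # Drop placeholders, and drop multi-word tokens as plain-English notes.
    return [t for t in tokens
            if t.lower() not in PLACEHOLDERS and not re.search(r"\s", t)]


# Hypothetical stand-in for the contents of generate_audit_sheet.py.
SAMPLE = '''
ROWS = [
    ("T-01", "Crawl", "Robots", "High", "How", "Why", "Tool", "robotsinfolist, indexinginfo"),
    ("T-02", "Content", "Titles", "Med", "How", "Why", "Tool", "-"),
]
'''

for row in extract_rows(SAMPLE):
    print(row[0], tokenize(row[SIGNAL_INDEX]))
```

Because ast.parse only builds a syntax tree, nothing in the audit script runs during extraction. One caveat of this sketch: a container that happens to hold exactly 8 rows would itself match the width test, so a real extractor would need a stricter row check.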

The full ruleset, raw rows, per-signal classifications, and figures ledger live in data/google-leak-signals-classified/, re-runnable with one command.
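
The classification in step 3, a curated map followed by keyword fallbacks, might look roughly like this. The map entries and keywords are a small subset taken from the function list above. The structure, the names CURATED and KEYWORD_RULES as used here, and the rule strings are assumptions for illustration and may differ from the published ruleset.

```python
# Curated labels take precedence over any keyword match.
CURATED = {
    "siteAuthority": "quality_gate",
    "goldmineBodyFactor": "promotion",
    "GoodClicks": "engagement",
    "webrefEntities": "understanding",
}

# Ordered fallbacks: the first function with a matching keyword wins.
KEYWORD_RULES = [
    ("demotion", ("demotion", "spam", "porn", "scam", "interstitial", "gibberish")),
    ("storage_selection", ("index", "shingle", "checksum", "tier")),
    ("technical", ("cwv",)),
]


def classify(attribute: str) -> tuple[str, str]:
    """Return (function, rule) so every label records how it was assigned."""
    if attribute in CURATED:
        return CURATED[attribute], "curated"
    lowered = attribute.lower()
    for function, keywords in KEYWORD_RULES:
        for keyword in keywords:
            if keyword in lowered:
                return function, f"keyword:{keyword}"
    return "unclassified", "none"


for name in ("siteAuthority", "navDemotion", "ContentChecksum96", "mobileCwv"):
    print(name, classify(name))
```

Returning the rule alongside the label is what makes each classification auditable: a reader can see whether a label came from the curated map or from a substring match. Substring matching is blunt (a short keyword can hit an unrelated name), which is one reason to check the curated map first.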

Results

The function split, by distinct signal:

| Function | Distinct signals | Share | Example attributes |
|---|---|---|---|
| demotion | 29 | 24.4% | pandaDemotion, navDemotion, serpDemotion, clutterScore, vlq, anchorMismatchDemotion, spambrainData, GibberishScore |
| quality_gate | 26 | 21.8% | siteAuthority, CompressedQualitySignals, contentEffort, predictedDefaultNsr, ymylHealthScore, OriginalContentScore |
| promotion | 16 | 13.4% | goldmineBodyFactor, goldmineHeaderIsH1, onsiteProminence, homepagePagerankNs, PageRankPerDocData, numOffdomainAnchors |
| storage_selection | 14 | 11.8% | indexinginfo, scaledSelectionTierRank, ContentChecksum96, forwardingdup, robotsinfolist, isErrorPage |
| understanding | 14 | 11.8% | webrefEntities, site2vecEmbeddingEncoded, EntityAnnotations, lastSignificantUpdate, bylineDate |
| technical | 10 | 8.4% | mobileCwv, lcp, cls, inp, time-to-first-byte-per-doc, isSmartphoneOptimized |
| engagement | 6 | 5.0% | GoodClicks, BadClicks, lastLongestClicks, chromeInTotal |
| serp_feature | 4 | 3.4% | richsnippet, shoppingProductInformation, SnippetBrain, imageLicenseInfo |

Percentages are share of 119 distinct signals. 0 signals were left unclassified.
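
The shares follow directly from the counts, and the arithmetic is easy to check. The sketch below hard-codes the eight counts from the table; the variable and function names are illustrative and are not taken from the published script.

```python
# Distinct-signal counts per function, copied from the results table.
COUNTS = {
    "demotion": 29,
    "quality_gate": 26,
    "promotion": 16,
    "storage_selection": 14,
    "understanding": 14,
    "technical": 10,
    "engagement": 6,
    "serp_feature": 4,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage share of each function, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


TOTAL = sum(COUNTS.values())
NON_PROMOTION = TOTAL - COUNTS["promotion"]

print(TOTAL)                                 # 119
print(shares(COUNTS)["promotion"])           # 13.4
print(round(100 * NON_PROMOTION / TOTAL, 1)) # 86.6
```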

Figure: Function split of the 119 leaked Google signals: demotion 29, quality-gate 26, promotion 16, storage/selection 14, understanding 14, technical 10, engagement 6, SERP-feature 4. Positive ranking levers are the third-largest group at 13.4%.

The headline finding:

| Grouping | Signals | Share |
|---|---|---|
| Positive ranking levers (promotion) | 16 | 13.4% |
| Everything else (demote / gate / store / read / comprehend / measure / feature) | 103 | 86.6% |

By audit-check occurrence (how often each function is touched across the 402 checks, since one signal can recur), the same shape holds and sharpens on the infrastructure side: demotion appears in 76 checks, storage/selection in 70, quality-gate in 49, and promotion in only 36.
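
The two ways of counting differ only in what is deduplicated. This is a minimal sketch on invented toy data. It assumes “appears in N checks” means the number of checks that reference at least one signal of that function, which is one reading of the sentence above; the function name and data layout are hypothetical.

```python
from collections import Counter


def count_both_ways(checks, function_of):
    """Return (distinct signals per function, checks touching each function)."""
    # Distinct view: each attribute counts once, however often it recurs.
    all_signals = {signal for check in checks for signal in check}
    distinct = Counter(function_of[s] for s in all_signals)

    # Occurrence view: each check counts once per function it touches.
    touched = Counter()
    for check in checks:
        touched.update({function_of[s] for s in check})
    return distinct, touched


# Toy data: four checks; the last references no leaked attribute.
checks = [
    ["navDemotion"],
    ["navDemotion", "siteAuthority"],
    ["navDemotion", "pandaDemotion"],
    [],
]
function_of = {
    "navDemotion": "demotion",
    "pandaDemotion": "demotion",
    "siteAuthority": "quality_gate",
}

distinct, touched = count_both_ways(checks, function_of)
print(dict(distinct))  # demotion: 2, quality_gate: 1
print(dict(touched))   # demotion: 3, quality_gate: 1
```

In the toy data one recurring attribute (navDemotion) lifts demotion from 2 distinct signals to 3 checks, which is the same effect that separates the distinct-signal table from the per-check counts.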

Confidence

  • The counts (Tier-A). 402 checks, 21 categories, 119 distinct signals, and the 240/162 split are a census of the corpus, computed by script, not a sample. They are exact and reproducible.
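  • The function split (Tier-B). The 8-way classification is Meriin’s labeling. 74/119 labels are Confirmed (function definitional or leak/DOJ-documented, e.g. anything literally named *Demotion); 45 are Inferred. The headline (~13% vs ~87%) tolerates substantial relabeling: promotion would exceed a third of the set only if at least 24 of the 45 Inferred labels were wrong and every one of those errors favored the “promotion” side (16 + 24 = 40 of 119, 33.6%). In the theoretical worst case, where every Inferred non-promotion label flips to promotion, it would pass a third, so this is a statement about plausible error, not a hard bound.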
  • Not impact-weighted (stated, not hidden). Each distinct signal counts once. A single gate like siteAuthority may matter far more than ten obscure attributes. This study measures how many signals do each job, not how much each one moves rankings.
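
One way to put a number on the relabeling robustness is to ask how many labels would have to flip toward promotion before a threshold is crossed. The sketch below uses only the counts stated in this article (16 promotion signals out of 119, 45 Inferred labels). The function name is hypothetical, and reading the falsifier’s “~60% non-promotion” as “promotion above 40%” is an assumption.

```python
import math
from fractions import Fraction


def flips_to_exceed(promotion: int, total: int, share: Fraction) -> int:
    """Fewest non-promotion labels that must flip to promotion
    for the promotion share to strictly exceed `share`."""
    needed = math.floor(share * total) + 1  # smallest count above share * total
    return max(0, needed - promotion)


PROMOTION, TOTAL, INFERRED = 16, 119, 45

one_third = flips_to_exceed(PROMOTION, TOTAL, Fraction(1, 3))
falsifier = flips_to_exceed(PROMOTION, TOTAL, Fraction(2, 5))

print(one_third, "of", INFERRED)  # 24 of 45
print(falsifier, "of", INFERRED)  # 32 of 45
```

On these numbers, more than half of the Inferred labels would have to be wrong in the same direction for promotion to pass a third of the set, and roughly seven in ten for the hypothesis to be falsified.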

Limitations

  • Actionable slice, not the whole leak. 119 attributes is the audit-relevant subset, not Google’s full attribute universe. The claim is scoped to “signals SEOs act on,” not “all of Google.”
  • Function labels are interpretive. Where the leak doesn’t define a behavior, we inferred it from the attribute name and its audit context, and tiered it Inferred. A full cross-reference of each attribute to its raw leak definition is the v2 hardening step.
  • No causation. This is a taxonomy of what signals are for, not evidence that any one of them changes rankings by a given amount.
  • Corpus-dependent. Counts reflect the attributes Meriin chose to surface as audit checks. A different audit would reference a different (overlapping) slice; the script makes any such corpus re-classifiable.

How this compares to prior work

The leak was broken by Rand Fishkin (SparkToro) and given its canonical technical reading by Mike King (iPullRank), both in May 2024; NavBoost and click signals were corroborated under oath in the US v. Google antitrust trial. Most of that coverage, and the hundreds of posts that followed, enumerated interesting attributes (siteAuthority exists! Chrome data! author signals!) and read them as a ranking-factor checklist.

This study doesn’t dispute any individual attribute; it adds the missing denominator. When you classify the whole actionable set by function, the “ranking-factor checklist” framing inverts: the two largest groups are demotions (24.4%) and quality gates (21.8%), and levers come third (13.4%). That reframes day-to-day SEO from “what do I add to rank” toward “what am I being penalized, gated, or excluded for”. It is why our own technical SEO audits lead with demotion and indexation exposure before touching positive levers, and it sits closer to what Google’s own guidance has said for years. It also sharpens the GEO/AI-search corollary: comprehension signals (entities, embeddings) account for 14 of the 119 signals (11.8%), nearly as large a share of the machine as positive levers (16, 13.4%).

We deliberately do not restate any prior author’s headline numbers; this is an independent classification with its own published ruleset.

Reproduce this

Everything needed to replicate or challenge the classification lives in data/google-leak-signals-classified/:

  • extract_classify.py: the full extractor + classifier (curated map + keyword ruleset).
  • raw/audit_rows.csv: all 402 extracted checks (immutable evidence).
  • signals_classified.csv: every distinct signal with its function, confidence, rule, and occurrence count.
  • FIGURES-LEDGER.csv: every number in this article, traced to its source.

Run python3 extract_classify.py. Disagree with a label? Edit the CURATED map and re-run, and the figures update themselves. That is the point: the classification is an argument you can audit line by line, not a number you have to trust.

The number

Of the 119 distinct leaked Google signals that surface as actionable audit checks, only 16 (13.4%) are positive ranking levers. The other 103 (86.6%) exist to suppress, quality-gate, store and select, read engagement, comprehend content, measure page experience, or grant SERP features. “Ranking factors” is the wrong frame for the leak.

Changelog & validity

  • Valid as of 2026-06-14. Reflects the public post-2024-leak understanding and the current technical-seo-audit corpus (402 checks).
  • Will be re-run if the corpus changes, or once raw-leak definitions are cross-referenced, which should move Inferred labels toward Confirmed.
  • v1.0, 14 June 2026: published under the Meriin Labs team byline. Em-dash + humanize pass applied; 50-item QA (packaging fixes: title tag, meta length, byline, in-body figure, internal links); figures-ledger trace and no-client-data grep both clean. Optional later: a named author and a GEO self-citability score.
