Meriin Labs · original research · 14 June 2026 · every number below is computed by published code. See Reproduce this.
The claim: Among the leaked Google attributes that SEOs actually act on, positive ranking levers are a small minority. Most leaked signals exist to suppress, quality-gate, store/select, or comprehend, not to boost. What would prove it wrong: a function-classification of the same corpus in which “promotion” signals are a plurality, or in which the non-promotion share falls below ~60%.
When the Google Content Warehouse documents leaked in May 2024, the SEO industry read them as a list of ranking factors, things you add to go up. That framing quietly assumes most signals are levers you pull to win. We tested that assumption by classifying what each leaked signal actually does.
The answer: of the 119 distinct leaked attributes that surface as actionable checks in a full technical audit, only 16 (13.4%) are positive ranking levers. The other 103 (86.6%) exist to demote, gate quality, manage storage and selection, read engagement, comprehend content, measure page experience, or grant SERP features. “Ranking factors” is the wrong mental model. The leak is mostly a machine for not losing, not for winning.
Hypothesis
Pre-registered before classification. We expected the leak to be dominated by demotion and infrastructure signals rather than positive levers, because Google’s public guidance for a decade has emphasized avoiding problems (thin content, spam, bad UX) over adding ranking tricks. Falsifier: if “promotion”-function signals turned out to be the largest single category, or if non-promotion signals were under ~60% of the total, the hypothesis fails.
Dataset
- Source: the Meriin technical-seo-audit corpus (generate_audit_sheet.py), a 402-check, 21-category technical SEO audit whose google_signal column annotates each check with the leaked Content Warehouse attribute(s) it maps to.
- Unit of analysis: the distinct leaked attribute (e.g. navDemotion, siteAuthority, goldmineBodyFactor). 119 distinct attributes are referenced across 240 of the 402 checks; the remaining 162 checks reference no specific leaked attribute (they cover best-practice items the leak doesn’t name).
- Collected: 2026-06-14 · Scope: global / English · Engine state: post-2024-leak public knowledge.
- Important scoping: this is not the full leak. The raw leak exposed thousands of attributes across Google’s document and ranking systems; this study classifies the 119 that show up as actionable audit signals, i.e. the slice an SEO would actually inspect and act on. The finding is about that actionable slice, and we say so wherever it matters.
- Provenance: Public / self-collected data only. The attribute names come from the public 2024 Google API leak; the audit corpus is Meriin’s own. No client data appears in the dataset, ledger, or appendix.
Method
- Extract. Parse generate_audit_sheet.py with Python’s ast (no code execution), pulling every 8-field audit row and its google_signal cell. → data/.../raw/audit_rows.csv.
- Tokenize. Split each google_signal cell into individual attribute tokens; drop the — placeholder and plain-English non-attributes.
- Classify. Assign each distinct attribute one of 8 functions via an explicit, auditable ruleset (a curated map for well-attested attributes, then keyword fallbacks):
  - promotion: positive ranking lever (PageRank, Goldmine title/body factors, onsiteProminence, review-promote)
  - demotion: suppression / penalty / safety filter (*Demotion, spam, porn, scam, interstitial, gibberish)
  - quality_gate: holistic quality / authority / E-E-A-T assessment (siteAuthority, NSR, YMYL, contentEffort, originality, authorship)
  - storage_selection: index / storage / crawl / dedup / error / tier (scaledSelectionTierRank, indexinginfo, shingleInfo, ContentChecksum96)
  - engagement: click / behavioral (GoodClicks, BadClicks, lastLongestClicks, chromeInTotal)
  - understanding: entity / embedding / semantic / date / OCR comprehension (webrefEntities, site2vecEmbeddingEncoded)
  - technical: page-experience metrics (CWV: lcp/cls/inp, TTFB, mobile, http2, SSL)
  - serp_feature: feature eligibility (richsnippet, shopping, snippet generation, image license)
- Tier the labels. Every classification carries a function confidence: Confirmed (the function is definitional or documented in the public leak / DOJ analysis) or Inferred (deduced from the name and audit context). 74 of 119 are Confirmed; 45 are Inferred.
The full ruleset, raw rows, per-signal classifications, and figures ledger live in data/google-leak-signals-classified/, re-runnable with one command.
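The extract, tokenize, and classify steps can be sketched in a few lines. This is an illustration under stated assumptions, not the published extract_classify.py: the layout of generate_audit_sheet.py is not shown here, so the sketch assumes a module-level list of 8-field tuples with google_signal in the seventh position, and it assumes commas, slashes, and semicolons as separators. The sample rows, the CURATED entries, and the keyword rules are hypothetical stand-ins.

```python
import ast
import re

# Hypothetical stand-in for generate_audit_sheet.py (real layout not shown in the article).
SOURCE = '''
ROWS = [
    ("Indexation", "IDX-01", "Error pages return a proper status", "High",
     "crawl", "fix", "isErrorPage, indexinginfo", "n/a"),
    ("Content", "CNT-07", "Navigation is not over-templated", "Medium",
     "review", "fix", "navDemotion / siteAuthority", "n/a"),
    ("Basics", "BSC-02", "Sitemap is referenced in robots.txt", "Low",
     "crawl", "fix", "\u2014", "n/a"),
]
'''

PLACEHOLDER = "\u2014"  # the dash used when a check maps to no leaked attribute
ATTRIBUTE = re.compile(r"^[A-Za-z][A-Za-z0-9_\-]*$")  # heuristic: one identifier-like token

# Illustrative subsets only; the published ruleset is larger.
CURATED = {
    "siteAuthority": "quality_gate",
    "isErrorPage": "storage_selection",
    "indexinginfo": "storage_selection",
}
KEYWORD_RULES = [("demotion", "demotion"), ("spam", "demotion"), ("click", "engagement")]


def extract_rows(source: str) -> list[tuple]:
    """Collect every 8-element tuple literal without executing the module."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Tuple) and len(node.elts) == 8:
            try:
                rows.append(ast.literal_eval(node))
            except ValueError:
                continue  # tuple holds non-literal expressions; skip it
    return rows


def tokenize(cell: str) -> list[str]:
    """Split a google_signal cell and keep only attribute-like tokens."""
    tokens = [t.strip() for t in re.split(r"[,/;]", cell)]
    return [t for t in tokens if t and t != PLACEHOLDER and ATTRIBUTE.match(t)]


def classify(attr: str) -> tuple[str, str]:
    """Return (function, rule): curated map first, then keyword fallbacks."""
    if attr in CURATED:
        return CURATED[attr], "curated"
    lowered = attr.lower()
    for keyword, function in KEYWORD_RULES:
        if keyword in lowered:
            return function, f"keyword:{keyword}"
    return "unclassified", "none"


rows = extract_rows(SOURCE)
signals = sorted({tok for row in rows for tok in tokenize(row[6])})
labels = {s: classify(s) for s in signals}
```

Returning the rule alongside the function is what makes each label traceable: a reader who disagrees can see whether a label came from the curated map or from a keyword match.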
Results
The function split, by distinct signal:
| Function | Distinct signals | Share | Example attributes |
|---|---|---|---|
| demotion | 29 | 24.4% | pandaDemotion, navDemotion, serpDemotion, clutterScore, vlq, anchorMismatchDemotion, spambrainData, GibberishScore |
| quality_gate | 26 | 21.8% | siteAuthority, CompressedQualitySignals, contentEffort, predictedDefaultNsr, ymylHealthScore, OriginalContentScore |
| promotion | 16 | 13.4% | goldmineBodyFactor, goldmineHeaderIsH1, onsiteProminence, homepagePagerankNs, PageRankPerDocData, numOffdomainAnchors |
| storage_selection | 14 | 11.8% | indexinginfo, scaledSelectionTierRank, ContentChecksum96, forwardingdup, robotsinfolist, isErrorPage |
| understanding | 14 | 11.8% | webrefEntities, site2vecEmbeddingEncoded, EntityAnnotations, lastSignificantUpdate, bylineDate |
| technical | 10 | 8.4% | mobileCwv, lcp, cls, inp, time-to-first-byte-per-doc, isSmartphoneOptimized |
| engagement | 6 | 5.0% | GoodClicks, BadClicks, lastLongestClicks, chromeInTotal |
| serp_feature | 4 | 3.4% | richsnippet, shoppingProductInformation, SnippetBrain, imageLicenseInfo |
Percentages are share of 119 distinct signals. 0 signals were left unclassified.

The headline finding:
| Grouping | Signals | Share |
|---|---|---|
| Positive ranking levers (promotion) | 16 | 13.4% |
| Everything else (demote / gate / store / read / comprehend / measure / feature) | 103 | 86.6% |
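Both tables follow arithmetically from the eight per-function counts. The sketch below recomputes the shares and applies the pre-registered falsifier to them; the counts are copied from the table above, while the function name hypothesis_fails and its floor parameter are our own illustrative choices.

```python
# Distinct-signal counts per function, copied from the results table.
COUNTS = {
    "demotion": 29,
    "quality_gate": 26,
    "promotion": 16,
    "storage_selection": 14,
    "understanding": 14,
    "technical": 10,
    "engagement": 6,
    "serp_feature": 4,
}

total = sum(COUNTS.values())
shares = {fn: round(100 * n / total, 1) for fn, n in COUNTS.items()}

promotion = COUNTS["promotion"]
non_promotion = total - promotion
headline = (round(100 * promotion / total, 1), round(100 * non_promotion / total, 1))


def hypothesis_fails(counts: dict[str, int], floor: float = 0.60) -> bool:
    """Falsifier: promotion is the largest category, or non-promotion share is under the floor."""
    n = sum(counts.values())
    largest = max(counts, key=counts.get)
    non_promo_share = (n - counts["promotion"]) / n
    return largest == "promotion" or non_promo_share < floor


print(total, shares["promotion"], headline, hypothesis_fails(COUNTS))
```

With these counts the total is 119, the headline pair is 13.4% and 86.6%, and the falsifier does not trigger.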
By audit-check occurrence (how often each function is touched across the 402 checks, since one signal can recur), the same shape holds and sharpens on the infrastructure side: demotion appears in 76 checks, storage/selection in 70, quality-gate in 49, and promotion in only 36.
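The difference between the two counting modes is easiest to see on a toy example. The checks below are hypothetical, not rows from the corpus, and the sketch assumes a check counts once per function even when it cites several attributes with that function; the published script may count differently.

```python
from collections import Counter

# Hypothetical (check_id, attributes) pairs for illustration only.
CHECKS = [
    ("C1", ["navDemotion", "siteAuthority"]),
    ("C2", ["navDemotion"]),
    ("C3", ["indexinginfo", "navDemotion"]),
    ("C4", []),  # a best-practice check that cites no leaked attribute
]
FUNCTION = {
    "navDemotion": "demotion",
    "siteAuthority": "quality_gate",
    "indexinginfo": "storage_selection",
}

# Distinct-signal view: each attribute counts once, however often it recurs.
distinct_attrs = {a for _, attrs in CHECKS for a in attrs}
by_signal = Counter(FUNCTION[a] for a in distinct_attrs)

# Occurrence view: each check counts once for every function it touches.
by_check = Counter()
for _, attrs in CHECKS:
    for fn in {FUNCTION[a] for a in attrs}:
        by_check[fn] += 1

checks_with_signal = sum(1 for _, attrs in CHECKS if attrs)
```

Here navDemotion is one distinct signal but touches three checks, which is the same effect that lifts demotion and storage/selection in the occurrence figures.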
Confidence
- The counts (Tier-A). 402 checks, 21 categories, 119 distinct signals, and the 240/162 split are a census of the corpus, computed by script, not a sample. They are exact and reproducible.
- The function split (Tier-B). The 8-way classification is Meriin’s labeling. 74/119 labels are Confirmed (function definitional or leak/DOJ-documented, e.g. anything literally named *Demotion); 45 are Inferred. The headline (~13% vs ~87%) is robust to substantial relabeling: promotion would exceed a third of the set only if at least 24 of the 45 Inferred labels turned out to be promotion signals (16 + 24 = 40 of 119, or 33.6%).
- Not impact-weighted (stated, not hidden). Each distinct signal counts once. A single gate like siteAuthority may matter far more than ten obscure attributes. This study measures how many signals do each job, not how much each one moves rankings.
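One way to probe the robustness of the split is to count how many labels would have to move into promotion before a given threshold is crossed. The sketch below uses only the published totals (119 signals, 16 promotion, 45 Inferred) and assumes that no current promotion label moves out and that only Inferred labels are open to relabeling. The helper name flips_needed is hypothetical.

```python
import math
from fractions import Fraction

TOTAL, PROMOTION, INFERRED = 119, 16, 45


def flips_needed(threshold: Fraction) -> int:
    """Fewest labels that must flip into promotion for its share to exceed the threshold."""
    target = math.floor(threshold * TOTAL) + 1  # smallest count strictly above the threshold
    return max(0, target - PROMOTION)


# Promotion share above one third of the set.
to_exceed_third = flips_needed(Fraction(1, 3))

# Falsifier: non-promotion under 60%, i.e. promotion above 40%.
to_trip_falsifier = flips_needed(Fraction(2, 5))

share_of_inferred = {
    "third": round(100 * to_exceed_third / INFERRED, 1),
    "falsifier": round(100 * to_trip_falsifier / INFERRED, 1),
}
```

Under these assumptions, 24 flips (53.3% of the Inferred labels) would push promotion past a third, and 32 flips (71.1%) would trip the 60% falsifier. Exact fractions avoid floating-point edge cases at the threshold.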
Limitations
- Actionable slice, not the whole leak. 119 attributes is the audit-relevant subset, not Google’s full attribute universe. The claim is scoped to “signals SEOs act on,” not “all of Google.”
- Function labels are interpretive. Where the leak doesn’t define a behavior, we inferred it from the attribute name and its audit context, and tiered it Inferred. A full cross-reference of each attribute to its raw leak definition is the v2 hardening step.
- No causation. This is a taxonomy of what signals are for, not evidence that any one of them changes rankings by a given amount.
- Corpus-dependent. Counts reflect the attributes Meriin chose to surface as audit checks. A different audit would reference a different (overlapping) slice; the script makes any such corpus re-classifiable.
How this compares to prior work
The leak was broken by Rand Fishkin (SparkToro) and given its canonical technical reading by Mike King (iPullRank), both in May 2024; NavBoost and click signals were corroborated under oath in the US v. Google antitrust trial. Most of that coverage, and the hundreds of posts that followed, enumerated interesting attributes (siteAuthority exists! Chrome data! author signals!) and read them as a ranking-factor checklist.
This study doesn’t dispute any individual attribute; it adds the missing denominator. When you classify the whole actionable set by function, the “ranking-factor checklist” framing inverts: the most common leaked signal is a demotion (24.4%), followed by quality gates (21.8%), with levers a distant third (13.4%). That reframes day-to-day SEO from “what do I add to rank” toward “what am I being penalized, gated, or excluded for”. It is why our own technical SEO audits lead with demotion and indexation exposure before touching positive levers, and it sits closer to what Google’s own guidance has said for years. It also sharpens the GEO/AI-search corollary: comprehension signals (entities, embeddings), at 11.8% of the set, are nearly as numerous as positive levers (13.4%).
We deliberately do not restate any prior author’s headline numbers; this is an independent classification with its own published ruleset.
Reproduce this
Everything needed to replicate or challenge the classification lives in data/google-leak-signals-classified/:
- extract_classify.py: the full extractor + classifier (curated map + keyword ruleset).
- raw/audit_rows.csv: all 402 extracted checks (immutable evidence).
- signals_classified.csv: every distinct signal with its function, confidence, rule, and occurrence count.
- FIGURES-LEDGER.csv: every number in this article, traced to its source.
Run python3 extract_classify.py. Disagree with a label? Edit the CURATED map and re-run, and the figures update themselves. That is the point: the classification is an argument you can audit line by line, not a number you have to trust.
The number
Of the 119 distinct leaked Google signals that surface as actionable audit checks, only 16 (13.4%) are positive ranking levers. The other 103 (86.6%) exist to suppress, quality-gate, store and select, read engagement, comprehend content, measure page experience, or grant SERP features. “Ranking factors” is the wrong frame for the leak.
Changelog & validity
- Valid as of 2026-06-14. Reflects the public post-2024-leak understanding and the current technical-seo-audit corpus (402 checks).
- Will be re-run if the corpus changes or as raw-leak definitions are cross-referenced (will move Inferred labels toward Confirmed).
- v1.0, 14 June 2026: published under the Meriin Labs team byline. Em-dash + humanize pass applied; 50-item QA (packaging fixes: title tag, meta length, byline, in-body figure, internal links); figures-ledger trace and no-client-data grep both clean. Optional later: a named author and a GEO self-citability score.