The Complete Technical SEO Audit: A 2026 Checklist

I've run this audit on roughly 500 sites — small business homepages, agency portfolios, e-commerce checkouts, B2B SaaS dashboards, and whatever else came through the Seoxpert scanner. The order below is the order I run things in real audits, not alphabetical or grouped by "importance to Google" in the abstract. Every step fixes things that downstream steps depend on.

The most common audit mistake I see — including in my own first attempts back in 2018 — is starting at Core Web Vitals because Lighthouse gave a 67. Optimizing the LCP on a page that has noindex from a forgotten staging deploy is wasted work. Fix the stack from the bottom up.

Why audit order matters

Technical SEO is a stack. The order Google does things in is: discovery (find the URL via links or sitemap) → fetch (download the HTML, respecting robots.txt) → render (run the JavaScript) → index (decide whether to keep it, and which canonical URL to attribute it to) → rank (compete for queries against everything else).

One audit I did in 2024 involved a Shopify storefront whose owner had paid an SEO agency $4,000 to "optimize Core Web Vitals." The agency's report had 30 pages of Lighthouse screenshots. Meanwhile, the store's organic traffic was tanking because a Shopify app installed two months earlier was injecting noindex,nofollow into every product detail page. The fix took 90 seconds (uninstall the app). The Lighthouse work would have been useful in three months, once the indexing came back, but not a day sooner. Bottom of the stack first.

1. Crawl access

Three checks here cover almost every real crawl failure I see. The pattern is almost always the same: a change made on staging ships to production by accident, and nobody notices because the homepage still loads in a browser. Search Console will eventually surface it — sometimes after weeks of traffic damage.

robots.txt is reachable and permissive

Fetch /robots.txt on every subdomain. A 404 is fine (it means no rules; full crawl allowed). A 500 or timeout is bad — Googlebot pauses crawling entirely until the file responds. The worst case: a Disallow: / from staging deployed to production. I've seen this exactly twice in my own audits — once on a Webflow site that exported its staging environment for "backup," once on a Next.js app where someone forgot to remove a developer-only middleware. Both times the site dropped out of Google over about 4 days. Use the robots.txt tester immediately after any infra deploy.
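
All three failure modes are detectable with a dozen lines of Python. A minimal sketch, assuming the requests library; the origin list is a placeholder for your own subdomains:

    import requests
    from urllib.robotparser import RobotFileParser

    # Hypothetical origins: replace with every subdomain you actually serve.
    ORIGINS = ["https://example.com", "https://www.example.com", "https://blog.example.com"]

    for origin in ORIGINS:
        url = f"{origin}/robots.txt"
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException as exc:
            print(f"FAIL {url}: unreachable ({exc}); crawling pauses on timeouts")
            continue
        if resp.status_code == 404:
            print(f"ok   {url}: 404 (no rules, full crawl allowed)")
        elif resp.status_code >= 500:
            print(f"FAIL {url}: {resp.status_code}; Googlebot stops crawling until this recovers")
        elif resp.ok:
            parser = RobotFileParser()
            parser.parse(resp.text.splitlines())
            if parser.can_fetch("Googlebot", f"{origin}/"):
                print(f"ok   {url}: 200, homepage crawlable")
            else:
                print(f"FAIL {url}: a Disallow rule blocks Googlebot from the homepage")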

XML sitemap is valid and complete

I check three things specifically: (1) the sitemap URL returns 200 and parses as valid XML, (2) every URL in it returns 200 (not 3xx, not 4xx — Google warns on either), (3) indexable pages on your site that AREN'T in the sitemap are flagged. The third one is the silent killer. A WordPress site I audited last year had 4,200 indexable URLs but the sitemap had only the 600 oldest posts because the SEO plugin's sitemap generator had silently failed in 2022. Three years of new content was crawled late or not at all. Vercel's default Next.js sitemap also gets this wrong if you don't include dynamic routes explicitly — easy to overlook.
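
Checks (1) and (2) script easily; check (3) needs the link crawl from the next step, so it lands there. A sketch for the first two, assuming the requests library:

    import requests
    import xml.etree.ElementTree as ET

    SITEMAP_URL = "https://example.com/sitemap.xml"   # placeholder
    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    resp = requests.get(SITEMAP_URL, timeout=10)
    resp.raise_for_status()                    # check 1a: sitemap itself returns 200
    root = ET.fromstring(resp.content)         # check 1b: raises ParseError on invalid XML
    urls = [loc.text.strip() for loc in root.iter(NS + "loc")]
    # Handles a single sitemap file; a sitemap index needs one more loop.

    for url in urls:                           # check 2: every entry returns a clean 200
        r = requests.head(url, timeout=10, allow_redirects=False)
        if r.status_code != 200:               # some servers reject HEAD; retry with GET on 405
            print(f"WARN {url}: {r.status_code}; Google flags 3xx and 4xx entries alike")
    print(f"{len(urls)} sitemap URLs checked")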

Internal links reach every important page

A page with zero inbound internal links is an "orphan." Google can find it via the sitemap, but it has no PageRank flowing in, so it ranks badly even when indexed. Every important page should be reachable from the homepage in ≤ 3 clicks. The most common cause: a category page that was once linked from the main nav but got moved to a footer-only or sitemap-only link during a redesign. See crawl & links issue library for the full list of crawl-discovery patterns.
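
Click depth and orphan detection fall out of the same breadth-first crawl, and this is where sitemap check (3) from the previous step lands: anything in the sitemap that the link crawl never reaches is an orphan. A sketch assuming requests and beautifulsoup4, reusing the urls list from the sitemap sketch above:

    from collections import deque
    from urllib.parse import urljoin, urlparse
    import requests
    from bs4 import BeautifulSoup

    START = "https://example.com/"        # placeholder homepage
    HOST = urlparse(START).netloc
    MAX_DEPTH = 4                         # crawl one level past the 3-click budget

    depth = {START: 0}
    queue = deque([START])
    while queue:
        page = queue.popleft()
        if depth[page] >= MAX_DEPTH:
            continue
        try:
            html = requests.get(page, timeout=10).text
        except requests.RequestException:
            continue
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(page, a["href"]).split("#")[0]
            if urlparse(link).netloc == HOST and link not in depth:
                depth[link] = depth[page] + 1
                queue.append(link)

    # `urls` comes from the sitemap sketch above: pages that should be reachable.
    for url in set(urls) - set(depth):
        print(f"ORPHAN {url}: in the sitemap but never reached by internal links")
    for url, d in depth.items():
        if d > 3:
            print(f"DEEP   {url}: {d}+ clicks from the homepage")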

Related reading: robots.txt glossary · robots.txt tester

2. Indexing signals

Once crawled, each page tells Google whether to index it and which URL to prefer. Indexing conflicts are silent — pages vanish from search without warnings in Search Console unless you look for them.

Canonical tag resolves correctly

Every indexable page needs a canonical tag pointing to the preferred URL — often itself. Five failure modes to check (a checker sketch follows the list):

  • Intended self-canonicals that actually point to a slightly different URL (protocol or trailing-slash mismatch)
  • Canonicals pointing to pages that return 3xx, 4xx, or 5xx
  • Canonicals pointing to a noindex page (signals conflict, both may be ignored)
  • Multiple canonical tags on the same page — only one is honoured
  • Cross-domain canonicals that may not be intentional
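
A per-page checker covering these failure modes (cross-domain intent it can only flag for human review). Assumes requests and beautifulsoup4:

    from urllib.parse import urljoin, urlparse
    import requests
    from bs4 import BeautifulSoup

    def check_canonical(page_url: str) -> None:
        soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
        tags = soup.find_all("link", rel="canonical")
        if len(tags) != 1:
            print(f"{page_url}: {len(tags)} canonical tags; only one is honoured")
            return
        target = urljoin(page_url, tags[0].get("href", "").strip())
        if target != page_url:
            # Only a bug if the page was meant to be self-canonical; protocol
            # and trailing-slash drift are the usual culprits.
            print(f"{page_url}: canonical points elsewhere -> {target}")
        if urlparse(target).netloc != urlparse(page_url).netloc:
            print(f"{page_url}: cross-domain canonical; confirm it is intentional")
        resp = requests.get(target, timeout=10, allow_redirects=False)
        if resp.status_code != 200:
            print(f"{page_url}: canonical target returns {resp.status_code}")
            return
        robots = BeautifulSoup(resp.text, "html.parser").find("meta", attrs={"name": "robots"})
        if robots and "noindex" in (robots.get("content") or "").lower():
            print(f"{page_url}: canonical target is noindex; conflicting signals")

    check_canonical("https://example.com/widgets/")   # placeholder URL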

noindex is only where you mean it

A template-level noindex applied to production is as catastrophic as the robots.txt scenario. The Shopify story from earlier was this exact pattern. The fix: grep your codebase for noindex, audit every occurrence, and confirm only intended pages have it (internal search results, filtered product views, login pages, thank-you pages, draft preview routes). For WordPress sites, check Yoast and RankMath settings; for headless CMS deployments, check the metadata you're pulling from the CMS API.
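
The grep covers your templates; this sketch covers what actually ships, including the X-Robots-Tag response header that a source grep misses. Assumes requests and beautifulsoup4:

    import requests
    from bs4 import BeautifulSoup

    def has_noindex(url: str) -> bool:
        """True if the live page carries a noindex directive anywhere."""
        resp = requests.get(url, timeout=10)
        # The directive can arrive as an HTTP header...
        if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
            return True
        # ...or as a robots/googlebot meta tag in the served HTML.
        soup = BeautifulSoup(resp.text, "html.parser")
        for meta in soup.find_all("meta", attrs={"name": ["robots", "googlebot"]}):
            if "noindex" in (meta.get("content") or "").lower():
                return True
        return False

    # Run across the sitemap; any True on a page you want ranked is an emergency.
    for url in ["https://example.com/", "https://example.com/products/widget/"]:  # placeholders
        print(url, "NOINDEX" if has_noindex(url) else "ok")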

Redirect chains are short

The classic accumulated redirect chain looks like this: http://example.com/Page → https://example.com/Page → https://www.example.com/Page → https://www.example.com/page → https://www.example.com/page/. Each step is a redirect rule someone added one at a time over the years. Each hop costs ~80ms of latency for users and burns a hop of crawl budget for Google. Two hops max is the target — three hops triggers Search Console warnings. Run the redirect chain checker on both apex and www variants of your origin to find these.
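
The core of that check is small enough to sketch here, assuming requests:

    from urllib.parse import urljoin
    import requests

    def follow_chain(url: str, max_hops: int = 10) -> None:
        hops = 0
        while hops < max_hops:
            resp = requests.get(url, timeout=10, allow_redirects=False)
            if resp.status_code not in (301, 302, 303, 307, 308):
                break
            url = urljoin(url, resp.headers["Location"])   # Location may be relative
            hops += 1
            print(f"  hop {hops}: {resp.status_code} -> {url}")
        if hops > 2:
            print(f"  {hops} hops: collapse these into a single redirect")

    # Run on every protocol/host variant of your origin:
    for variant in ["http://example.com/", "http://www.example.com/",
                    "https://example.com/", "https://www.example.com/"]:
        print(variant)
        follow_chain(variant)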

hreflang is reciprocal and valid

For multilingual or multi-regional sites, hreflang annotations must be reciprocal (if the English page points to the Danish page, the Danish page must point back to the English one) and use exact ISO 639-1 language codes plus optional ISO 3166 country codes. The most common bug I find here is the BCP 47 region tag for the UK: it's en-GB, not en-UK. If hreflang is malformed or non-reciprocal, Google silently ignores it and falls back to its own language detection — which is wrong about 30% of the time on edge cases.
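
Reciprocity is mechanically checkable: collect each page's hreflang map, then confirm every target lists the source in return. A sketch assuming requests and beautifulsoup4 (for a real run, cache the fetches):

    import requests
    from bs4 import BeautifulSoup

    def hreflang_map(url: str) -> dict:
        """Map of hreflang code -> target URL from a page's annotations."""
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        return {link["hreflang"]: link["href"]
                for link in soup.find_all("link", rel="alternate", hreflang=True)}

    def check_reciprocity(url: str) -> None:
        for lang, target in hreflang_map(url).items():
            if url not in hreflang_map(target).values():
                print(f"{url}: {lang} -> {target} does not point back")

    check_reciprocity("https://example.com/en/")   # placeholder URL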

Full issue catalog: technical SEO issues · most common technical SEO mistakes

3. On-page signals

Once the page is crawlable and indexable, the content needs to tell Google what it is about.

Unique, descriptive title tags

Each indexable page has a unique <title>, 55–60 characters, that describes the specific topic. Duplicate titles across many pages are a template-leak signal; missing titles force Google to synthesise one from H1 or anchor text.

One H1 per page, aligned with the title

See H1 tag. Common failure modes: H1 wrapped around the logo (same on every page), missing H1 entirely, or an H1 that contradicts the title and target keyword.

Meta descriptions under the truncation point

120–160 characters. See the meta description glossary and check individual descriptions with the length checker.
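
All three on-page checks above parse out of the same document, so they batch into one pass. A sketch assuming requests and beautifulsoup4, with the length bands quoted in this section:

    import requests
    from bs4 import BeautifulSoup

    seen_titles: dict[str, str] = {}   # url -> title, for duplicate detection

    def check_onpage(url: str) -> None:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

        title = soup.title.string.strip() if soup.title and soup.title.string else ""
        if not title:
            print(f"{url}: missing <title>")
        elif len(title) > 60:
            print(f"{url}: title is {len(title)} chars, likely truncated in results")
        if title and title in seen_titles.values():
            print(f"{url}: duplicate title (template-leak signal)")
        seen_titles[url] = title

        h1s = soup.find_all("h1")
        if len(h1s) != 1:
            print(f"{url}: {len(h1s)} H1 tags, expected exactly one")

        meta = soup.find("meta", attrs={"name": "description"})
        desc = (meta.get("content") or "") if meta else ""
        if not 120 <= len(desc) <= 160:
            print(f"{url}: meta description is {len(desc)} chars, target 120-160")

    check_onpage("https://example.com/widgets/")   # placeholder URL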

Structured data is valid and matches content

See structured data. Use JSON-LD, include all required properties for each schema type, and ensure every claim in the markup is backed by visible page content. Fabricated or hidden-content markup can trigger manual actions.
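
For orientation, here is what minimal Article markup looks like when generated server-side. The property set is illustrative, not canonical: required properties vary by schema type, so confirm against Google's documentation for yours.

    import json

    # Illustrative only: headline/datePublished/author are placeholder values,
    # and every claim here must match content visible on the page.
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "The Complete Technical SEO Audit",
        "datePublished": "2026-01-15",
        "author": {"@type": "Person", "name": "Jane Doe"},
    }
    print(f'<script type="application/ld+json">{json.dumps(article)}</script>')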

Full list: on-page SEO issues · most common SEO issues

4. Core Web Vitals and performance

See Core Web Vitals for the full primer. Three metrics matter, each with a 75th-percentile threshold from real user data (not lab tests):

  • LCP < 2.5s — the largest above-the-fold element renders quickly. The fix 90% of the time is the hero image: switch from JPEG to WebP, set explicit width/height, and add fetchpriority="high". The other 10% is render-blocking CSS in the head — split it with critical CSS or move non-critical sheets behind media queries that don't match on first paint.
  • INP < 200ms — every interaction (click, tap, key press) gets a visible response within 200ms. INP replaced FID in March 2024 and is genuinely harder to pass. Long tasks from third-party scripts (Hotjar, Intercom, FullStory) are the usual cause. Defer them to requestIdleCallback or load behind user interaction.
  • CLS < 0.1 — layout doesn't jump as content loads. Set explicit dimensions on every image, video, and ad slot. Reserve space for late-loaded embeds (YouTube, Twitter, custom widgets) with aspect-ratio or min-height. The worst CLS regressions I've found come from web fonts swapping in via font-display: swap with a metric-mismatched fallback — preload the font or use size-adjust in the fallback declaration.

One thing worth saying clearly: Google ranks on field data (CrUX), not lab data (Lighthouse). I've seen sites with Lighthouse scores in the 90s that fail CrUX because the lab uses a 4G connection on a Moto G4 simulation — real users on 3G in rural areas hit 6+ second LCPs. Open Chrome DevTools → Performance and enable the field-data view (or read the "what your real users are experiencing" section at https://pagespeed.web.dev/) to see what Google actually scores.
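
The same field data is queryable programmatically through the CrUX API. A sketch assuming an API key from the Google Cloud console; the metric names are the API's own, and the thresholds are the three above:

    import requests

    API_KEY = "YOUR_KEY"   # placeholder: a CrUX API key from the Google Cloud console

    resp = requests.post(
        "https://chromeuserexperience.googleapis.com/v1/records:queryRecord",
        params={"key": API_KEY},
        json={"origin": "https://example.com", "formFactor": "PHONE"},
        timeout=10,
    )
    metrics = resp.json()["record"]["metrics"]
    for name, threshold in [("largest_contentful_paint", 2500),   # ms
                            ("interaction_to_next_paint", 200),   # ms
                            ("cumulative_layout_shift", 0.1)]:    # unitless
        p75 = float(metrics[name]["percentiles"]["p75"])
        print(f"{name}: p75 = {p75} ({'pass' if p75 <= threshold else 'FAIL'})")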

Full issue list: performance issues · most common performance issues

5. Security

HTTPS is a minor direct ranking signal. More importantly, security issues erode trust signals: browsers warn users, search engines show security interstitials, and compromised sites can be de-indexed entirely. Most of the checks below read straight off response headers; a sketch follows the list.

  • HTTPS everywhere. Every URL redirects to HTTPS. Certificate is valid and covers apex + www + any subdomains in use.
  • HSTS header on every HTTPS response. Start with a short max-age, graduate to 1 year plus preload eligibility.
  • Content-Security-Policy that blocks inline scripts where possible and restricts external origins.
  • X-Frame-Options or frame-ancestors to prevent clickjacking.
  • No mixed content. No HTTP resources loaded on HTTPS pages.
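
Most of these read off a single response; mixed content needs the HTML body as well. A sketch assuming requests:

    import requests

    resp = requests.get("https://example.com/", timeout=10)   # placeholder origin

    if "Strict-Transport-Security" not in resp.headers:
        print("Missing HSTS header")
    if "Content-Security-Policy" not in resp.headers:
        print("Missing Content-Security-Policy header")

    # Clickjacking protection: either mechanism counts.
    csp = resp.headers.get("Content-Security-Policy", "")
    if "frame-ancestors" not in csp and "X-Frame-Options" not in resp.headers:
        print("No frame-ancestors directive and no X-Frame-Options header")

    # HTTP must redirect to HTTPS.
    plain = requests.get("http://example.com/", timeout=10, allow_redirects=False)
    if not plain.headers.get("Location", "").startswith("https://"):
        print("http:// does not redirect to https://")

    # Crude mixed-content scan: any hardcoded http:// subresource is a finding.
    if 'src="http://' in resp.text or 'href="http://' in resp.text:
        print("Possible mixed content: http:// resource references in the HTML")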

Full list: security issues · most common security issues

6. Mobile and accessibility

Google indexes the mobile version of your site by default. Issues that only appear on mobile — cut-off content, missing viewport, touch targets too small — directly affect rankings even if the desktop experience is perfect. Three of the checks below are verifiable from raw HTML; a sketch follows the list.

  • Viewport meta tag present: <meta name="viewport" content="width=device-width, initial-scale=1">
  • Tap targets at least 48px, with sufficient spacing to avoid mis-taps
  • Text readable without zooming (minimum 16px)
  • Images have alt text for both accessibility and image search
  • Heading hierarchy is correct (H1 → H2 → H3, no level skipping)
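
Viewport, alt text, and heading hierarchy are all visible in raw HTML; tap-target size and effective font size need a rendered layout, so a static check can only approximate them. A sketch for the three static checks, assuming requests and beautifulsoup4:

    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(requests.get("https://example.com/", timeout=10).text, "html.parser")

    if not soup.find("meta", attrs={"name": "viewport"}):
        print("Missing viewport meta tag; the page renders at desktop width on phones")

    for img in soup.find_all("img"):
        if not img.get("alt"):
            print(f"Image missing alt text: {img.get('src', '?')}")

    # Heading hierarchy: each heading may go at most one level deeper than the last.
    last = 0
    for h in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
        level = int(h.name[1])
        if last and level > last + 1:
            print(f"Heading skip: h{last} -> h{level} ('{h.get_text(strip=True)[:40]}')")
        last = level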

7. Monitoring and regression detection

A technical SEO audit is a snapshot. Every deploy can reintroduce any issue you just fixed. Most regressions happen in three places:

  • Template changes — a refactor that accidentally removes canonical tags from every product page
  • Infrastructure changes — CDN config, host rename, staging robots.txt in production
  • Third-party additions — a new analytics script that pushes LCP from 1.8s to 4.2s

The fix: run audits on a schedule, not just on demand. Compare each new scan against the previous one. Seoxpert has native per-domain scheduling and regression diffing that flags newly-introduced issues as soon as they appear.
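
The diffing idea itself is small: serialize each scan's findings, then set-difference against the previous run. A minimal sketch (the finding shape and file names are placeholders, not the Seoxpert format):

    import json

    def load_findings(path: str) -> set[tuple[str, str]]:
        """Each finding is a (url, issue_id) pair; a placeholder shape for illustration."""
        with open(path) as f:
            return {(row["url"], row["issue"]) for row in json.load(f)}

    previous = load_findings("scan_2026-01-01.json")   # placeholder file names
    current = load_findings("scan_2026-01-08.json")

    for url, issue in sorted(current - previous):
        print(f"NEW   {issue} on {url}")               # regressions: triage today
    for url, issue in sorted(previous - current):
        print(f"fixed {issue} on {url}")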

What to do next

Run a full audit now, fix the highest-severity findings first, and set up scheduled scans so regressions get caught the day they happen — not the week they affect rankings.

Run the full audit on your own site — free, in under 2 minutes.