I've scanned well over 500 sites with Seoxpert at this point — small business homepages, agency portfolios, e-commerce checkouts, a few B2B SaaS dashboards. The pattern I see most often: a site that looks absolutely fine in a browser but is silently excluded from Google's index because of one of these 12 errors. Not low-quality content, not weak backlinks — just a forgotten noindex from staging, a canonical pointing at the wrong origin, or a sitemap entry blocked by robots.txt. The thing about these errors is that they don't produce error pages or browser warnings. The site looks correct; traffic just slowly drops.
I ranked these by how often I find them and how much damage they cause when present. The top three (noindex on content, noindex+canonical conflict, canonical pointing at non-indexable targets) account for roughly 60% of severe indexing problems I see. Each issue below links to a longer fix recipe with code examples, and you can run a free scan to check if any apply to your site.
Looking for the basics? See the 10 most common SEO issues first. The page below covers advanced indexability failures.
Check if your site has these issues — free, no install required.
A noindex directive on a content page that should rank is the most directly damaging technical SEO error. The page is invisible to organic search regardless of its quality, backlinks, or internal link equity. It will never appear in Google results until the directive is removed and the page is re-crawled.
Noindex directives are frequently left in place from staging environments, added by CMS bulk operations, or introduced through theme updates that modify the HTML head template. Every noindex page should be audited — some will be intentional (search filters, thank-you pages), but content pages with noindex are always a problem.
Why this hurts: every noindex on a content page eliminates all organic search traffic potential for that URL — permanently, until fixed.
How to detect it: the scanner reads <meta name="robots"> and X-Robots-Tag headers on every page.
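The detection above can be sketched in a few lines of Python. This is a minimal illustration, not the scanner's actual code: the function name is hypothetical, and a production crawler would use a real HTML parser rather than regular expressions.

```python
import re

def is_noindexed(html: str, headers: dict) -> bool:
    """Return True if the page carries a noindex directive in either
    the <meta name="robots"> tag or the X-Robots-Tag response header."""
    # Check the X-Robots-Tag header (header names are case-insensitive)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Check meta robots tags in the HTML
    for tag in re.findall(r"<meta[^>]+>", html, flags=re.IGNORECASE):
        if re.search(r'name\s*=\s*["\']robots["\']', tag, re.IGNORECASE) \
           and re.search(r'content\s*=\s*["\'][^"\']*noindex', tag, re.IGNORECASE):
            return True
    return False
```

Both sources matter: a page whose HTML says nothing can still be noindexed by a header injected at the server or CDN layer.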
Using noindex and a cross-page canonical simultaneously creates contradictory signals. The noindex says "do not index this page." The canonical says "the preferred URL for this content is [other URL]." Google's response to this conflict is unpredictable — it may follow the canonical and index the target, follow noindex and deindex both, or treat the page as ambiguous.
The fix is to choose one signal. If the page should consolidate to the canonical target: remove noindex and let the canonical do the work. If the page should be excluded from the index entirely: remove the canonical (it has no purpose on a noindexed page). Pagination pages are the most common place where both signals appear together in error.
Why this hurts: conflicting signals can delay link equity consolidation and produce unpredictable indexing outcomes for both the source page and the canonical target.
How to detect it: the scanner identifies pages with both a noindex directive and a canonical pointing to a different URL.
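The conflict check reduces to two conditions on data you already have from a crawl. A rough sketch, assuming the robots directives and canonical href have been extracted per page (the function name is illustrative):

```python
def has_noindex_canonical_conflict(page_url: str, canonical_url, robots_content) -> bool:
    """Flag pages that declare both noindex and a canonical to a DIFFERENT URL.
    A self-referencing canonical on a noindexed page is harmless and not flagged."""
    noindex = "noindex" in (robots_content or "").lower()
    cross_canonical = (canonical_url is not None
                       and canonical_url.rstrip("/") != page_url.rstrip("/"))
    return noindex and cross_canonical
```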
View full fix guide →

A canonical tag that points to a noindex page, a 404, or a redirect is counterproductive. The source page directs all ranking signals toward a URL that cannot be indexed — leaving neither page in Google's index. This pattern is common when a canonical is set up before the target page exists, or when the target page is later noindexed without updating all incoming canonicals.
Fixing this requires auditing canonical targets for their indexability status. Every canonical href should resolve to a 200-status, indexable page. When a target page is noindexed or deleted, all pages canonicalising to it must be updated to point to a valid replacement.
Why this hurts: canonicals to non-indexable targets can effectively remove both the source page and the target from the index simultaneously.
How to detect it: the scanner resolves canonical targets and checks their status codes and robots directives.
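The audit logic can be expressed independently of how pages are fetched. In this sketch, `fetch` is a caller-supplied callable returning the target's status code and robots directives — a hypothetical interface, so the same check works against live HTTP or cached crawl data:

```python
def audit_canonical_target(canonical_url: str, fetch):
    """Check that a canonical target is a 200-status, indexable page.
    Returns None when the target is valid, otherwise a problem description."""
    status, robots = fetch(canonical_url)
    if status != 200:
        return f"canonical target returns {status}, expected 200"
    if "noindex" in (robots or "").lower():
        return "canonical target is noindexed"
    return None  # target is a valid consolidation destination
```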
View full fix guide →

An unintentional cross-canonical occurs when a page declares a canonical pointing to a different URL than its own — not as a deduplication strategy, but as a misconfiguration. The most common cause is a CMS copy-paste error: a new page is created from a template that retains the original page's canonical href rather than updating it to the new URL.
The result is a page that directs all its ranking signals to a different URL, effectively suppressing itself from the index. The fix is to ensure every page's canonical href matches its own URL — either by dynamic generation or by auditing manually created pages. See the canonical tag guide for implementation patterns.
Why this hurts: unintentional cross-canonicals can invisibly prevent important pages from appearing in search results at their own URL.
How to detect it: the scanner resolves canonical tag hrefs and compares each to the page's own URL.
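Comparing a page's URL to its canonical href needs light normalisation first, or trailing-slash and case differences produce false positives. A minimal sketch of that comparison (the helper name is hypothetical):

```python
from urllib.parse import urlsplit

def canonical_mismatch(page_url: str, canonical_href: str) -> bool:
    """Compare a page's URL with its canonical href, ignoring trivial
    differences (scheme/host case, trailing slash) that are not real mismatches."""
    def norm(u: str):
        parts = urlsplit(u)
        return (parts.scheme.lower(), parts.netloc.lower(),
                parts.path.rstrip("/") or "/", parts.query)
    return norm(page_url) != norm(canonical_href)
```

Anything this flags still needs a human decision: some cross-canonicals are deliberate deduplication, so the scanner surfaces candidates rather than auto-fixing them.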
View full fix guide →

Pages can receive indexing instructions from two sources: the HTML <meta name="robots"> tag and the HTTP X-Robots-Tag response header. When these two sources contradict each other — for example, the meta tag says "index" while the header says "noindex" — search engine behaviour is unpredictable. Google typically honours the most restrictive signal.
This conflict typically arises when a server or CDN layer injects X-Robots-Tag headers independently of the CMS that generates the meta robots tags. The fix is to align the two mechanisms or eliminate one entirely — ideally managing all robot directives at the CMS level and leaving the server layer neutral.
Why this hurts: conflicting directives create an uncontrolled indexability state — pages may appear or disappear from the index unpredictably.
How to detect it: the scanner compares noindex/nofollow states from both meta robots and X-Robots-Tag headers.
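Resolving the two sources follows the "most restrictive wins" rule described above, with a conflict flagged when only one source carries the directive. A sketch under that assumption (function name illustrative):

```python
def resolve_robots(meta: str, header: str) -> dict:
    """Resolve <meta name="robots"> vs X-Robots-Tag. Any noindex wins
    (Google honours the most restrictive signal); a conflict is flagged
    when the two sources disagree and both are actually present."""
    meta_noindex = "noindex" in (meta or "").lower()
    header_noindex = "noindex" in (header or "").lower()
    return {
        "indexable": not (meta_noindex or header_noindex),
        "conflict": meta_noindex != header_noindex and bool(meta and header),
    }
```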
View full fix guide →

Pages within two clicks of the homepage are typically the most important in the site hierarchy — main category pages, key landing pages, and primary conversion paths. When these pages return 404 or other 4xx errors, they waste crawl budget and may block Googlebot from discovering the deeper pages they link to.
Shallow 4xx pages are most common after deletions without redirects: a product is removed, a category is renamed, or an internal link is not updated. Every non-200 response at depth ≤ 2 requires either a 301 redirect to the most relevant replacement page or a rebuild of the deleted content.
Why this hurts: errors on shallow pages waste crawl budget and may block indexing of entire page hierarchies that depend on those pages as crawl entry points.
How to detect it: the scanner records crawl depth of each page and flags those at depth ≤ 2 that return non-200 status codes.
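Crawl depth is just breadth-first distance from the homepage over the internal link graph. This sketch assumes the graph and status codes were collected in a prior crawl pass; the data shapes and names are illustrative:

```python
from collections import deque

def shallow_errors(home: str, links: dict, status: dict, max_depth: int = 2):
    """BFS from the homepage over an in-memory link graph, recording each
    page's crawl depth and flagging non-200 pages at depth <= max_depth.
    `links` maps URL -> list of outlinks; `status` maps URL -> HTTP status."""
    depth = {home: 0}
    queue = deque([home])
    flagged = []
    while queue:
        url = queue.popleft()
        if depth[url] <= max_depth and status.get(url, 200) != 200:
            flagged.append((url, depth[url], status[url]))
        for out in links.get(url, []):
            if out not in depth:          # first visit fixes the shortest depth
                depth[out] = depth[url] + 1
                queue.append(out)
    return flagged
```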
View full fix guide →

Redirect chains accumulate across site migrations. HTTP to HTTPS, non-www to www, old domain to new domain — each migration adds a hop, and old redirects are rarely cleaned up. A three-hop redirect chain (HTTP → HTTPS → non-www → www) adds 200–400ms of latency and partially dilutes link equity at each hop.
Google can follow redirect chains, but will not pass full PageRank through them. The fix is to collapse every chain to a single 301 hop — update the redirect rule at the origin to point directly to the final destination. Simultaneously update all internal links pointing to intermediate redirects to point to the final URL directly.
Why this hurts: each redirect hop reduces link equity passed through the chain and adds measurable latency that affects user experience and crawl efficiency.
How to detect it: the scanner tracks redirect chains across all crawled URLs and flags chains with more than one intermediate hop.
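Collapsing chains to a single hop is mechanical once the redirect map is known. A sketch over an in-memory source → target mapping (names illustrative; a real rule set lives in your server or CDN config):

```python
def collapse_redirects(redirects: dict) -> dict:
    """Given a mapping of source URL -> redirect target, resolve each
    source directly to its final destination so every chain is one hop."""
    def final(url, seen=()):
        if url not in redirects or url in seen:  # end of chain, or loop guard
            return url
        return final(redirects[url], seen + (url,))
    return {src: final(dst) for src, dst in redirects.items()}
```

Feeding the collapsed map back into your redirect rules removes the intermediate hops; updating internal links to the final URLs then makes the redirects themselves mostly unnecessary.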
View full fix guide →

JSON-LD blocks with syntax errors — unclosed quotes, trailing commas, unescaped characters — cannot be parsed by search engines. An invalid schema block is equivalent to having no schema at all: the rich-result eligibility for that page is eliminated entirely. The error produces no visible warning in the browser, making it easy to miss.
The most common causes are template interpolation that injects special characters without JSON encoding (titles with double quotes or apostrophes), and manual JSON-LD that was edited without proper syntax checking. Use the Google Rich Results Test to validate structured data and fix all syntax errors before publishing.
Why this hurts: invalid schema eliminates rich-result eligibility — star ratings, FAQs, HowTo steps, and breadcrumb enhancements — which typically achieve 20–30% higher CTR.
How to detect it: the scanner parses all <script type="application/ld+json"> blocks. Any block that throws a parse error is flagged.
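The core of that check is extracting each JSON-LD block and attempting a strict parse. A minimal sketch (regex extraction is illustrative; a production tool would walk a parsed DOM):

```python
import json
import re

def invalid_jsonld_blocks(html: str) -> list:
    """Extract every <script type="application/ld+json"> block and try to
    parse it; return the parse error message for each invalid block."""
    pattern = re.compile(
        r'<script[^>]+type\s*=\s*["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.IGNORECASE | re.DOTALL,
    )
    errors = []
    for block in pattern.findall(html):
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            errors.append(str(exc))  # e.g. trailing comma, unescaped quote
    return errors
```

Note that `json.loads` is stricter than some browsers' tolerance, which is the right bias here: search engine parsers are strict too.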
Hreflang annotations tell Google which page to serve to users in which language and region. When the locale values are malformed — en_US instead of en-us, or a typo in the country code — Google ignores the entire hreflang set. The result is that international users are served the wrong language version of the site.
Valid BCP 47 locale codes use hyphens, not underscores: en, en-us, de-at, pt-br. The special value x-default marks the fallback URL for users not covered by other language tags. Audit your hreflang implementation whenever adding new regional pages.
Why this hurts: invalid hreflang causes the entire annotation set to be discarded, sending international users to the wrong regional version of your site.
How to detect it: the scanner validates each hreflang value against the BCP 47 locale format specification.
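A simplified validation covers the locale shapes hreflang actually uses in practice — language, language-script, language-region, and x-default. This deliberately does not implement the full BCP 47 grammar, which is broader:

```python
import re

# language ("en"), optional 4-letter script ("zh-hant"),
# optional 2-letter region ("en-us"), or the special "x-default".
HREFLANG_RE = re.compile(r"^(x-default|[a-z]{2,3}(-[a-z]{4})?(-[a-z]{2})?)$")

def valid_hreflang(value: str) -> bool:
    """Validate an hreflang value against a simplified BCP 47 pattern.
    Input is lowercased first: hreflang matching is case-insensitive."""
    return bool(HREFLANG_RE.match(value.strip().lower()))
```

The classic failure from the text above — `en_US` with an underscore — fails this check, while `en-US` passes after lowercasing.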
View full fix guide →

Google requires every page in a hreflang set to include a self-referencing annotation — a hreflang link pointing to the page's own URL. Without it, the hreflang set is considered incomplete and may be ignored. The entire benefit of the annotation — ensuring the correct regional page appears for each audience — is lost.
The most common cause is a template that generates hreflang alternates for other language versions but omits the current page's own entry. The fix is straightforward: add the current URL as a hreflang entry with the current page's locale code as the value.
Why this hurts: incomplete hreflang sets cause international users to be served the wrong language version of pages, reducing engagement and conversions in non-primary markets.
How to detect it: the scanner checks that at least one hreflang link on each page points back to the page's own URL.
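The self-reference check is a set-membership test after light URL normalisation. A sketch assuming the page's hreflang links were already extracted into a locale → href mapping (names hypothetical):

```python
def missing_self_reference(page_url: str, hreflang_links: dict) -> bool:
    """hreflang_links maps locale code -> href. Return True when no entry
    in the set points back at the page's own URL (trailing slash ignored)."""
    norm = lambda u: u.rstrip("/")
    return norm(page_url) not in {norm(h) for h in hreflang_links.values()}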
View full fix guide →

Google displays approximately 50–60 characters of a title tag in search results before truncating with an ellipsis. Titles over 70 characters will almost certainly be cut off, hiding the brand name, primary keyword, or call to action from the search snippet. This is a common consequence of using the full page name as the title without length consideration.
The fix requires revising the title to lead with the most important keyword within the first 50–60 characters. A common pattern is "[Primary Keyword] — [Brand Name]" — this ensures the keyword appears in full even if the brand name is truncated. For product pages with long names, abbreviate the product variant in the title rather than including the full specification string.
Why this hurts: truncated titles reduce click-through rates by hiding important keywords and CTAs from the SERP snippet.
How to detect it: the scanner reads <title> text. Titles longer than 70 characters are flagged.
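A character-count check is only an approximation — Google truncates by pixel width, not character count — but it catches the clear offenders. A sketch that also previews what would survive a cut at roughly the 60-character display boundary (the 60/70 thresholds mirror the figures above):

```python
def audit_title(title: str, limit: int = 70) -> dict:
    """Flag titles likely to be truncated in the SERP and preview what
    would remain visible after an approximate 60-character cut."""
    title = title.strip()
    return {
        "length": len(title),
        "flagged": len(title) > limit,
        "visible_preview": title if len(title) <= 60 else title[:60].rstrip() + "…",
    }
```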
A missing self-referencing canonical leaves the page vulnerable to canonical drift — the risk that search engines select a different URL variant (with query parameters, trailing slashes, or protocol differences) as the preferred version. Once Google chooses a different canonical, link equity consolidation and index entry may shift to the unintended URL.
Adding a self-referencing canonical is a defensive measure: it explicitly declares the preferred URL before Google makes its own selection. Every indexable page should carry one. This is the lowest-effort canonical fix — a single template change adds it sitewide. See the full canonical tag guide for implementation details.
Why this hurts: without a self-referencing canonical, URL variants from query strings, session IDs, or CDN transforms can silently become the indexed version.
How to detect it: the scanner reads <link rel="canonical"> from each page head. Pages without this tag are flagged.
| # | Issue | Severity | Fix Effort |
|---|---|---|---|
| 01 | Noindex Directive on Content Pages | High | Low — remove directive |
| 02 | Noindex + Canonical Conflict | High | Low — pick one signal |
| 03 | Canonical to Non-Indexable Target | High | Medium — fix canonical targets |
| 04 | Unintentional Cross-Canonical | Medium | Low — update canonical href |
| 05 | Conflicting Robots Directives | Medium | Low — align meta robots + headers |
| 06 | Shallow 4xx Pages | High | Medium — fix or redirect |
| 07 | Redirect Chains (Multi-hop) | Medium | Medium — collapse to one hop |
| 08 | Invalid JSON-LD Schema | Medium | Low — fix syntax errors |
| 09 | Invalid Hreflang Locale Values | Medium | Low — fix BCP 47 codes |
| 10 | Hreflang Missing Self-Reference | Medium | Low — add self entry |
| 11 | Title Tags Over 70 Characters | Low | Medium — rewrite titles |
| 12 | Missing Self-Referencing Canonical | Low | Low — template change |
Browse the full technical SEO issues library. For basic SEO issues (titles, descriptions, schema), see the 10 most common SEO issues. Or check everything at once with the free website SEO audit tool.
The Technical SEO scanner detects all 12 of these mistakes in a single crawl. Enter your URL to get a complete technical audit with severity labels and fix recommendations in under 2 minutes.
Or sign up to use your free scan credit. View plans for ongoing monitoring.
Noindex on content pages (#1) and canonical tags pointing to non-indexable targets (#3) are the most damaging — both can silently remove pages from Google's index without any visible error in the browser.
Run a free Seoxpert scan. The scanner checks all 12 in a single pass, returning severity labels and source page URLs for each finding.
Yes. The most dangerous combination is noindex + cross-canonical (#2): the page is both excluded from the index and directing its signals to a different URL. Another damaging pattern is canonical to noindex target (#3), where both pages become effectively invisible.
See also: What outdated SEO is costing your business.