Source: http://www.seo-theory.com/2015/07/11/why-some-internal-pages-are-not-getting-indexed-or-cached/
Internal pages that are not getting indexed or cached are one of the most common problems website owners face. To help you diagnose indexing problems, here is a list of possible causes.
- Typo in domain name or URL
- If you use your browser history to check indexing for individual pages, make sure the entry you click on does not contain a typo.
- Blocked by robots.txt
- Make sure there are no typos, misplaced or omitted user-agent names, or “Disallow” directives blocking the affected URLs in your “robots.txt” file (see the robots.txt sketch after this list).
- Directories are password protected
- Crawlers won’t be able to get past password-protected directories, so either remove the protection or move the content to a crawlable directory (a sample protection block appears after this list).
- Canonicalization
- A rel=“canonical” tag suggests that search engines treat every page carrying it as the same content, so only the canonical URL is likely to be indexed (see the example after this list).
- User-agent blocking
- You may be denying access to crawlers by user-agent in “.htaccess” or your IIS configuration (a sketch follows this list).
- Server timeouts
- Slow or unresponsive servers are a common issue and can also impede crawling.
- Ghostly redirects
- There may be a redirect implemented on the unindexed page, usually applied via “.htaccess,” IIS configuration, or a meta refresh directive (see the redirect examples after this list).
- Blocking via “nofollow” directives
- Many SEO plugins add a “nofollow” attribute to internal links they deem unimportant; either disable that behavior or remove the plugin (an example of such a link appears after this list).
- Poor discovery pathways or broken navigation
- This usually happens on large websites, where it is hard to ensure that every good URL has sufficient internal linkage pointing to it. Crawlers may assign very low priority to pages that cannot be reached from multiple points on the site.
- Few external links
- Search engines index only a fraction of the URLs they discover and choose what to index based on several criteria, external links among them.
- Page-level penalty or downgrade
- Search engines may treat pages that have been hacked, or that use keyword stuffing and spammy links, as junk and assign them low crawling or indexing priority.
- HTML coding error
- Page hijacking
- Crawler sees an error
- This may be a misconfigured “.htaccess” file or broken Perl or PHP code that returns an error instead of the page (a sample misconfiguration is sketched after this list).
- Domain under penalty
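A few of the causes above are easier to recognize with a concrete example, so here are some quick sketches. Everything in them is illustrative: the domains, paths, and patterns are made up, not taken from any real site.

For the robots.txt point, a typo in a user-agent name or an overly broad “Disallow” line is the usual culprit:

```
# Hypothetical robots.txt
User-agent: *
Disallow: /admin/
Disallow: /blog/archive/   # unintended: this also hides real content from crawlers

# The misspelled user-agent name below never matches Googlebot,
# so Googlebot falls back to the "*" rules above.
User-agent: Googelbot
Disallow:
```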
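For password-protected directories, a typical Apache Basic Auth block in “.htaccess” looks like the sketch below; a crawler requesting anything in that directory receives a 401 response and gives up. The paths are placeholders.

```apache
# Hypothetical .htaccess in a protected directory: every request,
# including a crawler's, is challenged for credentials (HTTP 401).
AuthType Basic
AuthName "Members Only"
AuthUserFile /home/example/.htpasswd
Require valid-user
```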
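For canonicalization, the example below shows a made-up parameterized URL whose head declares a different canonical URL; search engines are thereby told to index the canonical target rather than the page itself.

```html
<!-- On http://www.example.com/widgets?color=blue -->
<head>
  <!-- Declares the page a duplicate of /widgets, so the ?color=blue
       URL itself is unlikely to be indexed. -->
  <link rel="canonical" href="http://www.example.com/widgets">
</head>
```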
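For user-agent blocking, here is a sketch of an Apache mod_rewrite rule that was probably meant to stop scrapers but is broad enough to catch legitimate crawlers (the pattern is hypothetical):

```apache
RewriteEngine On
# "bot" also matches Googlebot and Bingbot, so real crawlers get a 403.
RewriteCond %{HTTP_USER_AGENT} bot [NC]
RewriteRule .* - [F,L]
```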
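For ghostly redirects, the two common forms are a server-side rule and a page-level meta refresh; both sketches below assume Apache and use placeholder URLs.

```apache
# Server-side: a leftover rule in .htaccess silently sends the page elsewhere.
Redirect 301 /old-article/ http://www.example.com/new-article/
```

```html
<!-- Page-level: a meta refresh in the <head> acts as a redirect directive. -->
<meta http-equiv="refresh" content="0; url=http://www.example.com/new-article/">
```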
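For “nofollow” blocking, this is roughly what a plugin-modified internal link looks like (anchor text and URL invented):

```html
<!-- rel="nofollow" tells crawlers not to follow this internal link,
     weakening discovery of the target page. -->
<a href="/support/returns-policy/" rel="nofollow">Returns policy</a>
```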
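Finally, for the “crawler sees an error” case, a single bad directive in “.htaccess” can make Apache answer every request it covers, including a crawler’s, with a 500 error; the directive below is deliberately misspelled to show the failure.

```apache
# "RewritEngine" is not a valid directive, so Apache returns
# "500 Internal Server Error" for every URL this file applies to.
RewritEngine On
RewriteRule ^blog/(.*)$ /index.php?page=$1 [L]
```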
You cannot force search engines to index every page on your site; the best you can do is improve the usefulness of your content and reduce as much clutter as possible on your website to earn more recognition from others.