Source: http://www.free-seo-news.com/newsletter697.htm#facts
You generally want search engines such as Google to index your webpages. That is why it is important to identify potential indexing issues on your website so that you can fix them before it's too late.
1. Errors in the robots.txt file
– Errors in your website's robots.txt file can keep Google away. Double-check the file to make sure that you don't exclude directories that you want to appear in the Google SERP. Keep in mind, however, that robots.txt only affects crawlers: your site visitors can still view the pages you exclude there.
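One way to double-check the file is to run your URLs through a robots.txt parser before deploying it. A minimal sketch using Python's standard-library `urllib.robotparser` (the robots.txt rules and URLs below are hypothetical examples, not from the article):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the disallowed directories are examples only.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check whether Googlebot is allowed to fetch specific URLs.
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))     # True
```

If a URL you expect to rank comes back `False`, the robots.txt file is excluding it and should be corrected before crawlers pick up the rules.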
2. Meta robots noindex tag
– This allows you to tell the Google bot that a particular page on your site should not be indexed. To exclude a page on your site from the search results, add the code below in the <head> section of the page:
<meta name="robots" content="noindex, nofollow">
On the other hand, if you want the Google bot to follow the links on the page while still keeping the page itself out of the index, use the following tag instead:
<meta name="robots" content="noindex, follow">
With this, the page won't appear in the Google SERP, but the links on it will still be followed. Remove the tag entirely if you want to make sure that Google indexes all of your pages.
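A quick way to audit pages for this tag is to parse the HTML and collect any robots meta directives. A minimal sketch using Python's standard-library `html.parser` (the class name and sample page are illustrative, not part of any real tool):

```python
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Collects the content of every <meta name="robots"> tag found in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

# Hypothetical page that blocks indexing but allows link-following.
html = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
checker = RobotsMetaChecker()
checker.feed(html)
print(checker.directives)  # ['noindex, follow']
```

Any page whose collected directives contain "noindex" will be excluded from the search results.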
3. Wrong HTTP status code
– The HTTP status code tells site visitors as well as search engine bots how the server handled a request, and it can send them to different places on your website. A normally served web page returns a "200 OK" status code. Other common status codes include:
- 301 moved permanently – request and all future requests should be sent to a new URL.
- 403 forbidden – server refuses to respond to the request.
For SEO purposes, use a 301 redirect if you want to ensure that visitors to your site's old pages are directed to the new ones.
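The redirect behavior above can be sketched as a tiny WSGI application using only the Python standard library. The URL mapping and page bodies are hypothetical; a real site would configure this in the web server rather than in application code:

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical mapping of old URLs to their new permanent locations.
REDIRECTS = {"/old-page.html": "https://example.com/new-page.html"}

def app(environ, start_response):
    """Minimal WSGI app: 301-redirect moved pages, serve everything else with 200 OK."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>Hello</h1>"]

# Simulate a request for an old page and capture the response status.
captured = {}
def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/old-page.html"
body = app(environ, start_response)
print(captured["status"])  # 301 Moved Permanently
```

Browsers and search engine bots that receive the 301 response follow the `Location` header to the new URL, and search engines transfer the old page's ranking signals to it.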
4. Pages are password protected
– If your pages are password protected, only site visitors who know the password will be able to view your content. That also means search engine bots cannot access those pages. Password protection can likewise hurt your site's user experience, so review your protected pages thoroughly.
5. Cookies or JavaScript
– These can also keep the Google bot away. For instance, if content is only accessible to user agents that accept cookies, crawlers that do not accept cookies will never see it. Similarly, if your webpages rely on very complex JavaScript to render your content, most search engine bots will not execute it, which means they will not be able to read those pages.
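The cookie problem can be sketched as a server-side handler that gates content on a cookie header. The function and page text below are hypothetical, purely to illustrate what a cookie-less crawler would receive:

```python
def render_page(request_headers):
    """Return the full page only when the client sent a cookie.
    Crawlers that don't accept cookies get the fallback, so the real
    content is effectively hidden from them."""
    if "Cookie" in request_headers:
        return "<p>Full article content</p>"
    return "<p>Please enable cookies to view this page.</p>"

# A browser with a session cookie sees the content...
print(render_page({"Cookie": "session=abc123"}))
# ...but a cookie-less crawler sees only the fallback message.
print(render_page({}))
```

If the fallback branch is all a bot ever receives, none of the real content can be indexed, so make sure important pages are readable without cookies.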