Introduction

When Google crawls a page for the first time, it maps the page's URL to its content. When the page is revisited, the stored content is updated if it has been modified. If the page is marked as NOINDEX, it is not indexed.

This means that Google may know about pages on the Internet (through stored URLs) without having indexed their content. These pages do accumulate PageRank.
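The crawl-and-index cycle described above can be pictured with a toy model. The Python sketch below is a hypothetical illustration, not how Google actually stores its index: the `ToyIndex` class and its `discover`, `crawl` and `is_indexed` methods are invented names. Its one point is that known URLs and indexed content are kept separately, so a URL can be known without its content being indexed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical, highly simplified model of the crawl/index cycle.

    Not Google's implementation: it only separates "known URLs" from
    "indexed content" to illustrate the text above.
    """

    known_urls: set[str] = field(default_factory=set)
    content: dict[str, str] = field(default_factory=dict)

    def discover(self, url: str) -> None:
        # A URL can be learned (e.g. from a link) without being crawled or indexed.
        self.known_urls.add(url)

    def crawl(self, url: str, html: str, noindex: bool = False) -> None:
        self.known_urls.add(url)
        if noindex:
            # NOINDEX only takes effect when the page is (re)crawled:
            # any previously stored copy is dropped at that point.
            self.content.pop(url, None)
            return
        if self.content.get(url) != html:
            # First visit, or revisit with modified content: store the new version.
            self.content[url] = html

    def is_indexed(self, url: str) -> bool:
        return url in self.content
```

For example, after `idx.discover("https://example.com/a")` the URL is in `idx.known_urls` but `idx.is_indexed(...)` is still `False`; it only becomes `True` after a `crawl` call, and returns to `False` once the page is recrawled with `noindex=True`.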

SEO Mistakes & Best Practices

- A redirected URL in a sitemap.xml is not an issue.
- A URL pointing to a NOINDEX page in a sitemap.xml is not an issue.
- A 410 (Gone) status code deindexes a page faster than a 404 (Not Found).
- A page marked as NOINDEX must be recrawled before the directive is taken into account.
- If, say, 100,000 new pages are posted on a website, it may take a long time to index them. This is not a capacity issue: Google wants to know whether they are worth serving to users.
- A page's content can be updated in the index while its archived copy is not.
- There may be a delay between the indexing of a page and Google taking its canonical link into account, so temporary duplicate entries in search results are possible.
- If you can copy and paste content from a displayed HTML page, then it can be parsed for indexing too.
- In general, Ajax responses are not indexed.
- To process a canonical link, the source and target pages must be indexed first.
- A canonical link directive can be ignored by Google when it does not trust it or when it detects some obvious errors.
- A DMCA takedown does not remove pages from the index, only from the search results.
- Putting a page behind a login is an effective way to prevent it from being indexed.
- Using hreflang annotations helps Google properly index and serve the language or regional variants of a page.
- Using the site: command is a proper way to check whether a page is indexed.
- If no technical issue explains it, a page that is not indexed may be a signal of low quality.
- HTML is processed faster than JavaScript-generated content.
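Several of the points above (NOINDEX, canonical links, hreflang) come down to tags in a page's HTML. As a rough sketch, assuming these directives are written as `<meta name="robots">` and `<link>` tags in the served HTML, Python's standard `html.parser` module can extract them. `IndexingSignals` is a hypothetical helper, not a Google tool; it does not handle the `X-Robots-Tag` HTTP header or tags inserted by JavaScript.

```python
from html.parser import HTMLParser
from typing import Optional


class IndexingSignals(HTMLParser):
    """Hypothetical helper collecting indexing-related tags from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False
        self.canonical: Optional[str] = None
        self.hreflang: dict[str, str] = {}  # language code -> alternate URL

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names;
        # attribute values can be None, so normalize them to "".
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            directives = {d.strip().lower() for d in a.get("content", "").split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True
        elif tag == "link":
            rels = a.get("rel", "").lower().split()
            if "canonical" in rels:
                self.canonical = a.get("href")
            elif "alternate" in rels and "hreflang" in a:
                self.hreflang[a["hreflang"]] = a.get("href", "")
```

Feeding it `<meta name="robots" content="noindex, follow">` sets `noindex` to `True`, and a `<link rel="canonical" href="...">` tag fills `canonical`. Parsing the raw HTML rather than the rendered DOM is a deliberate simplification.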
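To return a 410 rather than a 404 for pages removed on purpose, the server has to tell the two cases apart. A minimal WSGI sketch, assuming a hypothetical `REMOVED_PATHS` set maintained by the site owner (the paths and messages are illustrative only):

```python
# Hypothetical list of pages that were deleted for good.
REMOVED_PATHS = {"/old-product"}


def app(environ, start_response):
    """Minimal WSGI app: 410 for deliberately removed pages, 404 for unknown ones."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path in REMOVED_PATHS:
        # 410 Gone: the page existed and was removed intentionally.
        start_response("410 Gone", headers)
        return [b"This page has been permanently removed."]
    if path == "/":
        start_response("200 OK", headers)
        return [b"Home"]
    # 404 Not Found: nothing is known about this path.
    start_response("404 Not Found", headers)
    return [b"Not found."]
```

For a local check, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it; in production the same distinction usually lives in the web server configuration or the framework's routing.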

REM: Bing says it can remove whole websites from its index to make space for better ones.