What is Site Indexing Anyway?

When asked about search engines, many people think of Bing, Yahoo, AOL, or Google, but there are many more in existence. Google handles the majority of Internet searches, so most of what follows pertains to its particular practices; other search engines, however, function in a similar manner. What some people might not realize is that an Internet search does not literally search the Internet. It searches the index that engine has built from what it has found on the Internet and deemed worth sharing. But before we delve into site indexing and related concepts, there are some terms with which you should familiarize yourself.

Definitions

Algorithm: a formula or set of rules that determine what content shows up in Internet search results.

Crawl: when a spider (see below) visits a web page or site to read and record its content.

Meta tag: bits of text placed within a page’s code that help define the contents of the page. The four types that matter most for search are keywords, title, description, and robots (which indicates to bots what they should do with a particular page).
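
For example, the head section of a page might contain tags like these (all of the values here are hypothetical):

    <title>Kim's Kitchen | Easy Weeknight Recipes</title>
    <meta name="description" content="Simple, family-friendly recipes and kitchen tips.">
    <meta name="keywords" content="recipes, weeknight dinners, kitchen tips">
    <meta name="robots" content="index, follow">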

Robots.txt file: a plain-text file, following the robots exclusion protocol, that tells bots which parts of a site should not be crawled or analyzed.
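
A simple robots.txt file, placed at the root of a site, might look like this (the paths are hypothetical):

    User-agent: *
    Disallow: /admin/
    Disallow: /staff/

This asks every bot to stay out of the /admin/ and /staff/ sections while leaving the rest of the site open to crawling.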

Sitemap: a model of a website’s content that helps search engines (and users) navigate the site. It can be an XML document that tells bots which pages a site contains, an organizational chart, or a hierarchical list of linked pages organized by topic.
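
As a minimal sketch, an XML sitemap in the standard sitemaps.org format might look like this (the URL and date are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.kimskitchen.com/recipes</loc>
        <lastmod>2024-01-15</lastmod>
      </url>
    </urlset>

Each <url> entry lists a page’s address and, optionally, when it was last modified.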

Spiders: also called “bots,” these are special programs that crawl through the pages and posts of websites, collecting information to send back to Google. Together, they are creating a map of the visible Internet.

Web index: a massive database with information about all of the web pages and sites that have been analyzed by a search engine.

Web search engine: a software system designed to search for information on the Internet and present results to the user, ordered by that search engine’s ranking algorithm.

Indexing

When we say a webpage has been indexed, we mean it has been added to the list of possible Internet search results. Google’s index works much like the card catalogs libraries once used to index their books: if you couldn’t find the card, you weren’t going to find the book. Web indexing provides those cards, allowing your site and pages to be found during Internet searches.

A Google crawler bot “crawls,” or scans, a site for its content and sends the information back to Google for processing. What the bots report is cataloged and added to an index of every site on the web that has been scanned. An algorithm then measures the relevant data about each website and ranks similar sites against one another. This ranking determines whether each individual page may appear in relevant search results and in what order pages are presented. Google’s algorithms change constantly and are not shared with the public; in fact, Google is reported to use more than 200 factors to rank websites. While some factors have proven to be constant influences, others are the subject of debate and conjecture.

What are the bots looking for?

Domain age is a factor: once a site is more than six months old, its pages tend to be indexed and ranked more quickly.

Spiders are looking for fresh, high-quality content. They take into consideration things such as post length, timeliness, and originality, which means they scan for duplicated content and are always on the lookout for plagiarism. The frequency with which new content is added is also noted, as are the usual search engine criteria: well-chosen keywords, titles, image text, and headings.

In addition to the content quality of the page and site, Google looks at user experience: page loading time, mobile responsiveness, and how easy the site is to navigate. Sharing also matters, in the form of links, web traffic, guest blogs, and so on. Quality links, both internal links between your own pages and external links pointing in from other sites, definitely help the indexing and ranking process.

How do they know where to crawl?

Bots are guided by past crawls, links on sites that lead to other pages and sites, and sitemaps submitted by site owners. It is worth noting that while a sitemap file can help bots crawl your site productively, it doesn’t guarantee the site will be indexed more quickly or ranked any differently.

You only want the vital parts of your website indexed, so you are not penalized for pages that might be deemed “not worthy” mainly because they were never intended for public viewing, such as employees-only or administrative pages. As you saw in the definitions above, a robots.txt file can keep spiders from crawling designated parts of a site, and meta tags with a “noindex” value keep low-value pages out of the index.
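
For example, placing this tag in the head of an employees-only page asks search engines to leave that page out of their indexes:

    <meta name="robots" content="noindex">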

To see whether a site has been indexed by a search engine, enter the domain’s URL with “site:” before it (for example, site:kimskitchen.com). You’ll get back a list of all pages on the site that have been indexed, along with the meta information currently saved in the search engine’s index. If a page is not indexed, it could be because the page has not yet been crawled, because it was crawled and found to be inappropriate or not worthy of indexing, or because a robots.txt file indicated that it should not be crawled. Only a few pages should intentionally remain non-indexed.
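
The operator also accepts a path if you want to check one section rather than the whole domain, for example site:kimskitchen.com/recipes (a hypothetical page on that example site).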

To be indexed more quickly

Site owners can, and should, submit a sitemap directly to Google, and they can also submit individual pages directly. Direct submission asks Google to start the indexing process. Activating Google services such as Google Analytics or Google Search Console also signals that your site is ready to be crawled.
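
One easy way to point crawlers at your sitemap, in addition to submitting it through Google Search Console, is a Sitemap line in your robots.txt file (the URL is a placeholder):

    Sitemap: https://www.kimskitchen.com/sitemap.xml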

Avoid having isolated web pages unless you are isolating them purposefully. Link your new pages to older, indexed pages and make changes, per Google’s ranking factors, that will help your pages be better received. Bot-friendly pages provide content of value; their text-to-code ratio favors text (since text is what bots read); they contain a navigation bar that links to all internal pages; and all non-text content is tagged with URLs and alt text so the bots can “see” things like images that they otherwise can’t read.
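
As a small illustration of that last point, an image with descriptive alt text and an internal link to an indexed page might look like this (the file names and paths are hypothetical):

    <img src="/images/weeknight-pasta.jpg" alt="Weeknight pasta dinner plated on a white dish">
    <a href="/recipes/weeknight-pasta">See the full weeknight pasta recipe</a>

The alt text gives bots a readable description of the image, and the link ties the new page into pages Google already knows about.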

There is no guaranteed timeline for when Google will index a new page; site owners have found it can take anywhere from four days to six months for a site to be crawled and indexed. The best course of action is to keep building quality content on your website and regularly submit new pages and updated sitemaps to Google. Contact us today to learn how Strategy Driven Marketing experts can help you get your web pages indexed and ranked.
