When getting serious about Search Engine Optimization (SEO), there are a few things your website should have.  One of them is a robots.txt file, properly written and placed on the web server where your domain’s files are located.  Why should you have a robots.txt file to begin with?  The robots.txt file tells Googlebot which parts of your domain’s URL structure it may crawl and which it should skip.  It’s important to create one, and to do it properly, so that the URLs on your domain get indexed the way you intend.

Getting Started With Your Robots.txt File

The process of creating the file is fairly technical and perhaps slightly overwhelming for the non-tech crowd, although it’s nothing a quick Google search can’t fix.  Your robots.txt file consists primarily of directives that tell Googlebot how to behave when it attempts to crawl your domain.  For the most part there are actually only two that you will be using, as outlined below.  These are the most common directives found on your average website.

Creating Your Robots.txt File

It’s quite simple to do and should take you only a few minutes.  You will need a text editor such as Notepad on your Windows PC, or an equivalent for Mac enthusiasts.  Launch a blank file and let’s get started.  Below are the two directives I mentioned above that should be included in your robots.txt file.  Here’s how to create a simple file for your domain.

  1. Notepad/Text Editor – open a blank document and save it as ‘robots.txt’ on your desktop.  Inside it, include the following directives to guide Googlebot when it crawls your domain.
  2. User-Agent – every browser on the web today, such as Internet Explorer, Mozilla Firefox, Safari, etc., has its own unique identification commonly referred to as a ‘user agent’, and so does every crawler, including Googlebot.  This directive identifies which user agents the rules that follow apply to.  You can perform a Google search to find the IDs of the different user agents if you want to place restrictions on specific ones.  Typically the wildcard * is used here so the rules apply to all user agents.  A sample line of syntax would look something like this – User-Agent: *  When Googlebot crawls your domain it will look through the robots.txt file to check for permissions.  If you wanted the rules to apply only to Googlebot, for example to block it from crawling parts of your domain, you would put its user-agent name here instead of the wildcard.
  3. Disallow – a commonly used directive in robots.txt files to exclude certain URLs from being crawled and showing up in the Search Engine Results Pages (SERPs).  It comes in quite handy for content management systems in particular, which have an administrator login URL.  Those are the types of links that you don’t want floating around publicly in Google’s index.  A line of syntax for the Disallow directive would look something like this – Disallow: /link-to-my-page.html  When reviewing your robots.txt file, Googlebot will see that you’ve chosen to exclude that particular URL of your website.  All links that you choose to exclude should be in the form of a relative path (see the complete sample file after this list).
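
Putting those two directives together, a minimal robots.txt file might look like the sketch below.  The /wp-admin/ and /login.html paths are purely hypothetical examples standing in for whatever administrator or private URLs your own site actually uses.

  User-Agent: *
  Disallow: /wp-admin/
  Disallow: /login.html

Because of the * wildcard, both Disallow rules apply to every user agent that honors robots.txt; to target Googlebot alone, you would replace the wildcard with Googlebot on the User-Agent line.  A Disallow line with nothing after the colon means nothing is blocked for that user agent.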

The robots.txt file is recognized by all major search engines, such as Google, Yahoo, and Bing, just to name a few.  Therefore, once you’ve placed it in the root directory of your domain on your web server it will become accessible to all user agents.  I’ve used the Googlebot user agent as an example above just to paint a picture for those interested in creating their own robots.txt file.  You can browse through more detailed information on robots.txt as well in case you want to get a deeper understanding of its purpose and benefits.
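
As a quick sanity check, assuming your site lives at www.example.com (a placeholder domain used here purely for illustration), the finished file should be reachable in any web browser at:

  http://www.example.com/robots.txt

If that URL returns your file, crawlers such as Googlebot will find it at the same address before they crawl the rest of your domain.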