WordPress is the world’s leading content management system (CMS), but one thing it doesn’t ship with is a robots.txt file. This is one of the few things you still have to create manually, and if you’ve never done it before it can seem confusing at first.
In this tutorial, we are going to show you how to prepare your robots.txt file for WordPress.
What is it?
The robots.txt file tells search engine crawlers which parts of your site they may crawl and which they should stay away from. Note that it controls crawling rather than indexing: a page blocked in robots.txt can still appear in search results if other sites link to it, so it is a way of telling crawlers what to fetch, not a guarantee of exclusion.
Generally, reputable search engines will honour your requests.
At its core, it is just a plain text file created in an editor such as Notepad. You upload it to the public_html directory on your host’s server, then test it using Google’s tester in Webmaster Tools; if it works, you’re good to go.
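As a sketch, a minimal robots.txt for a WordPress site might look like the following (the /wp-admin/ path is just an illustrative example of something you might block):

```
User-agent: *
Disallow: /wp-admin/
```

This says that all crawlers may visit the site, except for anything under /wp-admin/.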
The Terminology of the File
The terminology revolves around two key directives. Bear in mind that search engines such as Google stop reading a robots.txt file once it exceeds 500 KiB in size, which could have some unintended consequences, although this shouldn’t be a problem for most people. The two key directives are:
User-agent names the crawler that the rules which follow apply to. Each search engine’s bot has its own name, but for most people it isn’t necessary to know these, as an asterisk (*) can be used instead to match the bots of all search engines. Pairing that with a Disallow rule consisting of a single forward slash (/) tells all bots not to crawl any part of your site.
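To illustrate the two forms side by side (Googlebot is Google’s crawler name; the rules themselves are just examples):

```
# Applies to every crawler: block the whole site
User-agent: *
Disallow: /

# Applies only to Googlebot: a blank Disallow allows everything
User-agent: Googlebot
Disallow:
```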
The Disallow directive lists the paths on your site that you don’t want search engines to crawl. If you don’t want to disallow anything, you can leave the value blank; otherwise, add a path to keep crawlers away from that page or section.
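For example, a group can carry several Disallow lines, one path per line (the paths below are hypothetical):

```
User-agent: *
Disallow: /private/
Disallow: /drafts/
```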
Bear in mind that Disallow rules are prefix matches: if you type ‘/123’ under Disallow, search engines will refuse to crawl both ‘/123’ and ‘/1234’. To prevent this, you should add a dollar sign after the page address. This tells an engine such as Google that you only want this precise address to be excluded.
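A short sketch of the difference (note that the $ end-of-URL anchor is an extension honoured by Google and some other engines, not part of the original robots.txt standard):

```
User-agent: *
# Prefix match: blocks /123, /1234, /123/page, and so on
Disallow: /123

# Exact match: blocks only the URL /123
Disallow: /123$
```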
You should also remember that you only need to spell out any CHANGES you want to make to the default behaviour. It’s pointless to make a robots.txt file that tells search engines they may crawl every page on your site with no exceptions; they are going to do that anyway.
Uploading it to WordPress
You have to upload a robots.txt file to the root directory of every WordPress site you run. Crawlers only read the file at the root of a host, so a robots.txt placed in a subdirectory will be ignored, and a site on its own subdomain needs its own file.
If you only have one WordPress site, you can add the file to your public_html directory and go from there. At that point, all that’s left is the testing stage.
The testing stage can be done through Google: its Webmaster Tools includes a robots.txt tester that checks the file on your site. You can also pick up a third-party browser-based tester; these show you which robots.txt rules are in place.
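If you’d rather check your rules locally, Python’s standard library ships a robots.txt parser you can point at your own file. A minimal sketch (the rules and URLs below are examples; note that this parser follows the original standard and does not implement the $ extension):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules to test, as they would appear in the file
rules = """
User-agent: *
Disallow: /wp-admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a given crawler may fetch a given path
print(parser.can_fetch("*", "/wp-admin/settings.php"))  # False
print(parser.can_fetch("*", "/blog/hello-world"))       # True
```

The same parser can load the live file from your site with `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()`.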
If you see your rules displayed, you are good to go!