
In WordPress, is there a way to exclude a single directory from being indexed with the meta robots tag instead of single pages or posts?

I know that this applies the meta robots tag to specific pages or posts:

<?php if ($post->ID == X || $post->ID == Y) { echo '<meta name="robots" content="noindex,follow">'; } ?>

But I'd like to apply the same meta tag to a specific directory with 70 files in it. Is there any way to accomplish this in PHP? I'd very much appreciate an answer. Thanks a million.

Meta robots tags are used on web pages only, and in WordPress those web pages are posts or pages. To hide a whole directory from indexing, you need a different method.

First of all, make sure that you have NOT disallowed the directory in your robots.txt file; if crawlers are blocked from the directory, they will never see the header below. Then create an .htaccess file in that directory and add the following line to prevent indexing:

Header set X-Robots-Tag "noindex, nofollow"

If Google has already indexed the files inside the directory, you should also go to Webmaster Tools and remove those URLs from Google's index there. The header applies to all files in the directory, but Google only notices it the next time it crawls them, which is why already-indexed URLs are best removed manually. One thing I'd like you to clarify: what type of files are inside your directory? If you can tell me the extensions, I can provide a custom solution. For example, say you have 60 PDF files and 10 HTML files and you are interested in hiding only the PDF files; there is a solution for that too (see the sketch below).
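A minimal sketch of that PDF-only case, assuming the .htaccess file sits inside the directory in question:

<IfModule mod_headers.c>
# Send the noindex header only for PDF files in this folder;
# the HTML files stay indexable.
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
</IfModule>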

Otherwise, if your query is resolved, then great.

In robots.txt, add the following line:

Disallow: /your/path/to/the/folder/that/should/not/be/indexed/
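For context, a complete minimal robots.txt would look like this (the path is a placeholder):

User-agent: *
Disallow: /your/path/to/the/folder/that/should/not/be/indexed/

Note that Disallow only stops crawling; URLs that are already indexed, or that other sites link to, can still appear in search results, which is why the X-Robots-Tag method above can be the better choice.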

I'm not sure what kind of pages you have, WordPress or custom; the PHP method is possible as well. Create a file with an array() that contains all pages that shouldn't be indexed, and use it in an if/else statement: if the page is in the array, write the meta tag; if it isn't, do something else or nothing. The file should be included in your pages; if you already use one file to load the meta tags on all of your pages, you could include it there.

<?php
// Pages that should not be indexed; replace with your real page IDs.
$noindex = array(12, 34, 56, 78);
// In WordPress, get_the_ID() returns the current post/page ID;
// on a custom site, substitute however you identify the current page.
$curr_ID = get_the_ID();
if (in_array($curr_ID, $noindex))
  {
  echo '<meta name="robots" content="noindex,follow">'; // the page ID is in the array
  }
else
  {
  echo '<!--// page is indexable //-->'; // the page ID is not in the array; do nothing or something else
  }
?>
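Since the original question is about a whole directory rather than a handful of pages, a variant of the same idea can key on the URL path instead of the ID. A minimal sketch, assuming WordPress and a hypothetical directory /my-directory/ (put it in the theme's functions.php):

<?php
// Emit a noindex tag for every WordPress-served URL under /my-directory/.
// The directory name is a placeholder; adjust it to your setup.
add_action('wp_head', function () {
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    if (strpos($path, '/my-directory/') === 0) {
        echo '<meta name="robots" content="noindex,follow">' . "\n";
    }
});

Keep in mind this only covers URLs that actually run through WordPress; static files in a folder never execute PHP, which is why the X-Robots-Tag header method is the right tool for those.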

Is this what you've been looking for?

EDIT

500 indexed pages is a lot of pages and unwanted results, but you can have them removed from Google's index through Webmaster Tools, and you can also request a takedown if the documents contain personal information; search Google for the European "right to be forgotten" law.

About noindex, nofollow, noarchive:

noindex: tells search engines not to include the page in their index.

nofollow: tells search engines not to follow the links on the page.

noarchive: tells search engines not to keep a cached copy of the page.
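Combined into one meta tag, that looks like this:

<meta name="robots" content="noindex, nofollow, noarchive">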

The .htaccess answer below is possible as well, but as Aman Kapoor said in his answer, don't use both robots.txt and .htaccess for the same thing; use one method only. If robots.txt blocks the folder, crawlers never request the files and so never see the header.

.htaccess example code 1:

<IfModule mod_headers.c>
# Place this .htaccess file inside the folder you want to exclude.
# The pattern matches every file ending in .html in that folder; change
# "html" to another extension if you need to exclude a different file type.
# (FilesMatch only sees file names, not folder paths, so the file has to
# live in the folder it should affect.)
<FilesMatch "\.html$">
Header set X-Robots-Tag "noindex, nofollow, noarchive"
</FilesMatch>
</IfModule>

The .htaccess example below excludes every file whose extension matches the list, example 2:

<IfModule mod_headers.c>
# Add more extensions if needed, or remove any you do not want to exclude.
# (jpe?g matches both jpeg and jpg; html? matches both htm and html.)
<FilesMatch "\.(php|html?|doc|pdf|png|jpe?g|gif)$">
     Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
</IfModule>
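After adding one of these examples, you can verify that the header is actually being sent; a quick check with curl (the URL is a placeholder):

curl -I https://example.com/your-folder/some-file.pdf

The response headers should include the X-Robots-Tag line you configured.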

I updated my answer with working .htaccess code; it does the job, but you can also use the server configuration files or the robots.txt method to stop a folder from being indexed.

Just try whichever you like most, and after adding your choice (robots.txt, .htaccess, or server configuration), don't forget to go to Google Webmaster Tools, remove the documents/files from the index there, and then request a full crawl of your website in Webmaster Tools.

What is best to do?

  1. Server configuration is the best option if possible; most website owners can't use it because they don't have access to the server configuration files.
  2. The second .htaccess example is best if multiple extensions should be excluded from the search index and you don't have access to the configuration files.
  3. robots.txt in the document root. You always need a robots.txt file, since it is the first thing a spider downloads to check what it may crawl; however, don't add the Disallow line for the folder if you already use server configuration or .htaccess.

I believe the best you can do is add the second .htaccess example (edit the extensions to match your needs), go to Webmaster Tools, remove the 500 documents from the index, and then request a full site crawl / resubmit the site for indexing in Webmaster Tools.
