
How to remove a subdomain from the Google index when it points to the main domain

Can anyone tell me how I can remove a subdomain from the Google index when it points to the same site as the main domain?

Let's say my domain is www.myweb.com and my subdomain is cdn.myweb.com. The document root of the subdomain is the same as that of the main domain, so I cannot use a static robots.txt to stop Google from indexing the subdomain: the same file would be served on the main domain and would block indexing of its links too.

I searched Google, Bing, and Stack Overflow, but could not find a good answer to this question. Does anyone have a solution?

You can use a dynamic robots.txt for this purpose. Something like this:

httpd.conf (.htaccess):

RewriteEngine On
RewriteRule ^/?robots\.txt$ /var/www/myweb/robots.php [L]

robots.php:

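<?php
// Serve the robots.txt rules as plain text.
header('Content-Type: text/plain');

if ($_SERVER['HTTP_HOST'] == 'cdn.myweb.com') {
    // Block all crawlers on the CDN subdomain.
    echo "User-agent: *\n";
    echo "Disallow: /\n";
} else {
    // Output the static file as-is; include() would also run any PHP tags in it.
    readfile(__DIR__ . '/robots.txt');
}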
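
To sanity-check the branching logic outside the web server, the same decision can be sketched in Python. This is only an illustration, not part of the setup above: `robots_body`, `normalize_host`, `BLOCKED_HOSTS`, and the `default_rules` argument are hypothetical names. The lower-casing and port stripping are an added assumption, made because host names are case-insensitive and the Host header may carry a port.

```python
# Hypothetical sketch: choose the robots.txt content from the Host header.
BLOCKED_HOSTS = {"cdn.myweb.com"}  # hosts that should not be crawled

BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def normalize_host(host_header: str) -> str:
    """Lower-case the Host header and drop an optional :port suffix."""
    host = host_header.strip().lower()
    # rpartition leaves the host untouched when there is no colon at all.
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return name
    return host


def robots_body(host_header: str, default_rules: str) -> str:
    """Return block-all rules for blocked hosts, else the default rules."""
    if normalize_host(host_header) in BLOCKED_HOSTS:
        return BLOCK_ALL
    return default_rules
```

For example, `robots_body("CDN.myweb.com:8080", rules)` returns the block-all rules, while `robots_body("www.myweb.com", rules)` returns `rules` unchanged.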

I'm using nginx and have multiple subdomains in the same server block. I'd like the www subdomain to be included in Google's index and the rest of the subdomains to be excluded.

First, in the server block of my nginx config, I added the following to serve two different files at /robots.txt depending on the domain.

location = /robots.txt {
    if ($host = 'www.example.com') {
        rewrite ^/robots\.txt$ /robots.www.txt last;
    }
}

Then, in my site's root directory, I have the following two files:

  • robots.txt which blocks crawling and is the default for all subdomains
# Do not crawl subdomain
User-Agent: *
Disallow: /
  • robots.www.txt which allows crawling of all the site and is only served at www.example.com/robots.txt
User-agent: *
Disallow:
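
As a quick check of what the two files mean to a crawler, the rules can be fed to Python's standard `urllib.robotparser`. This is a minimal sketch: the `allowed` helper and the sample URLs are made up for illustration, and the parser reflects Python's reading of the rules, which is assumed here to agree with Google's for directives this simple.

```python
from urllib.robotparser import RobotFileParser

# Contents of robots.txt (default for all subdomains).
SUBDOMAIN_RULES = """\
# Do not crawl subdomain
User-Agent: *
Disallow: /
"""

# Contents of robots.www.txt (served only on www.example.com).
WWW_RULES = """\
User-agent: *
Disallow:
"""


def allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(SUBDOMAIN_RULES, "https://cdn.example.com/page.html"))  # False
print(allowed(WWW_RULES, "https://www.example.com/page.html"))        # True
```

An empty `Disallow:` value blocks nothing, which is why the second file permits crawling of the whole site.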

The first thing to do is add the robots.txt, but in my case my pages were already indexed under the CDN subdomain, so it was too late for robots.txt alone. The best way I found was to go to Google Webmaster Tools and add my CDN domain (cdn.mysite.com), then go to Google index -> Remove URLs and remove the / URL. It took a few days to take effect.
