
How to remove subdomain from google index, which links to the main domain

Can anyone tell me how I can remove a subdomain from the Google index, when that subdomain links to the main domain?

Let's say my domain is www.myweb.com and my subdomain is cdn.myweb.com. The document root of the subdomain is the same as that of the main domain, so I cannot use robots.txt to stop Google from indexing the subdomain, as that would stop the main domain's links from being indexed too.

I have searched on Google, Bing, and Stack Overflow, but I could not find a good answer to this question. Does anyone have a solution?

You can use a dynamic robots.txt for this purpose. Something like this:

httpd.conf:

RewriteRule ^/robots\.txt$ /var/www/myweb/robots.php [L]
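Note that the rule above is written for the server-config (httpd.conf) context. If it lives in a per-directory .htaccess file instead, the matched path has no leading slash and the target should be a URL path; a sketch of that variant (assuming robots.php sits in the site root) would be:

```apache
RewriteEngine On
# In .htaccess the pattern is matched without the leading slash,
# and the target is rewritten as a URL path, not a filesystem path.
RewriteRule ^robots\.txt$ /robots.php [L]
```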

robots.php:

<?php
// Serve a blocking robots.txt on the CDN host and the normal
// robots.txt everywhere else.
header('Content-Type: text/plain');

if ($_SERVER['HTTP_HOST'] == 'cdn.myweb.com') {
    echo "User-agent: *\n";
    echo "Disallow: /\n";
} else {
    include __DIR__ . '/robots.txt';
}
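The host check above can also be isolated into a small pure function, which makes the dispatch easy to test on its own. This is only a sketch: the CDN host name comes from the question, and the default string stands in for the contents of the real robots.txt file.

```php
<?php
// Return the robots.txt body to serve for a given Host header.
// 'cdn.myweb.com' is the subdomain from the question; $default is a
// stand-in for the contents of the real robots.txt file.
function robots_for_host(string $host, string $default = "User-agent: *\nDisallow:\n"): string {
    if ($host === 'cdn.myweb.com') {
        // Block all crawlers on the CDN host.
        return "User-agent: *\nDisallow: /\n";
    }
    // Any other host gets the normal robots.txt.
    return $default;
}
```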

I'm using nginx and have multiple subdomains in the same server block. I'd like the www subdomain to be included in Google's index and the rest of the subdomains to be excluded.

First, in the server block of my nginx config, I added the following to serve two different files at /robots.txt depending on the domain.

location = /robots.txt {
    if ($host = 'www.example.com') {
        rewrite ^/robots\.txt$ /robots.www.txt last;
    }
}
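For context, here is a sketch of how that location might sit inside the full server block; the listen port, server names, and root path are assumptions, not part of the original config:

```nginx
server {
    listen 80;
    # All subdomains share this one server block.
    server_name www.example.com cdn.example.com static.example.com;
    root /var/www/example;

    # Serve robots.www.txt only on the www host; every other host
    # falls through to the default, blocking robots.txt.
    location = /robots.txt {
        if ($host = 'www.example.com') {
            rewrite ^/robots\.txt$ /robots.www.txt last;
        }
    }
}
```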

Then, in my site's root directory, I have the following two files:

  • robots.txt, which blocks crawling and is the default for all subdomains:

# Do not crawl subdomain
User-agent: *
Disallow: /

  • robots.www.txt, which allows crawling of the whole site and is only served at www.example.com/robots.txt:

User-agent: *
Disallow:

The first thing is to add the robots.txt, but in my case, since my pages were already indexed under the CDN subdomain, it was too late for robots.txt alone. The best way I found was to go to Google Webmaster Tools and add my CDN domain (cdn.mysite.com), then go to Google Index -> Remove URLs and remove the / URL. It took a few days to take effect.

