
Exclude Crawler from Subdomain with .htaccess

I want to stop crawlers from crawling the subdomain tools.subdomain.com. I found a snippet on the Internet which shows the following rewrite rule:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]

How can I block those crawlers on this subdomain, or only allow current, up-to-date browsers to visit the subdomain? I want to manage this through .htaccess, because not every crawler respects robots.txt. For robots.txt I have the following rewrite condition:

RewriteCond %{HTTP_HOST} =testing.subdomain.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]
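
If robots_testing.txt does not exist yet, a minimal version that asks all well-behaved crawlers to stay away from the entire testing subdomain could look like this (the file name matches the rewrite target above; the rules themselves are an assumption about what you want to block):

```
User-agent: *
Disallow: /
```

Keep in mind that robots.txt is purely advisory, which is exactly why an .htaccess-level block is still needed for crawlers that ignore it.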

Cheers

Sven

It depends on your server layout.

Segregated subdomain

If the subdomain has its own document root, it's enough to place an .htaccess file in that document root containing the directives you specified:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
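
As an aside, mod_rewrite also offers the [F] flag, which sends a 403 Forbidden directly and implies [L]; since Apache treats R with any status code of 400 or above the same way, the following sketch should be equivalent:

```apache
# Return 403 Forbidden to the listed bots; [F] implies [L]
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [F]
```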

Shared subdomain

If the subdomain is using the same document root as the top-level domain, it's enough to add a RewriteCond to the above:

RewriteCond %{HTTP_HOST} ^tools\.subdomain\.com$
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]

Please note (1): the pattern ^tools\.subdomain\.com$ is needed to match exactly the entire host name; besides, since it's a regular expression, the dots must be escaped with a backslash.

Please note (2): the pattern in the last RewriteCond may vary according to the bots you want to exclude.
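
For example, to also exclude Yandex's and DuckDuckGo's crawlers (their published user-agent tokens are YandexBot and DuckDuckBot; verify against the user agents that actually appear in your access log), the condition could be extended like this:

```apache
# Match the subdomain exactly, then the extended list of bot tokens
RewriteCond %{HTTP_HOST} ^tools\.subdomain\.com$
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|YandexBot|DuckDuckBot) [NC]
RewriteRule .* - [R=403,L]
```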
