
Using .htaccess to block Googlebot from URLs ending with 4-6 digits, REGEX?

How can we write an .htaccess rule that blocks the Googlebot UA from accessing URLs ending in a forward slash followed by 4-6 digits?

We're wasting a lot of our Googlebot crawl budget because it's crawling "noindex" pages.

The plan is to use .htaccess to block the UA from URLs ending with a forward slash, followed by 4-6 digits.

Ex:

https://example.com/folder/folder/12563
https://example.com/folder/folder/125637
https://example.com/folder/folder/1563

I think the REGEX looks something like this:

\/\d{4,6}$
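A quick way to sanity-check that regex before touching the server is to run it against the example URLs. The Python sketch below does that. The counter-example URLs are made up for illustration, and Python's `re` module is only a stand-in for Apache's PCRE engine (the two should agree on a pattern this simple).

```python
import re

# The regex from the question. "\/" is just an escaped "/", so r"/\d{4,6}$" is equivalent.
# re.ASCII restricts \d to 0-9, like PCRE's default behaviour.
PATTERN = re.compile(r"\/\d{4,6}$", re.ASCII)


def ends_with_4_to_6_digits(url: str) -> bool:
    """True if the URL ends with a slash followed by exactly 4 to 6 digits."""
    return PATTERN.search(url) is not None


samples = {
    # The three example URLs from the question:
    "https://example.com/folder/folder/12563": True,
    "https://example.com/folder/folder/125637": True,
    "https://example.com/folder/folder/1563": True,
    # Hypothetical counter-examples:
    "https://example.com/folder/folder/123": False,      # only 3 digits
    "https://example.com/folder/folder/1234567": False,  # 7 digits
    "https://example.com/folder/folder/12563/": False,   # trailing slash
    "https://example.com/folder/page-12563": False,      # digits not right after "/"
}

for url, expected in samples.items():
    assert ends_with_4_to_6_digits(url) == expected, url
    print(ends_with_4_to_6_digits(url), url)
```

The `$` anchor is what stops a 7-digit segment from matching: after the slash, the engine can take at most 6 digits and must then be at the end of the string.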

But how do I configure .htaccess so that it applies only to a specific UA (Googlebot)?

Thanks!

You can use this:

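RewriteEngine on

# Condition: the User-Agent contains "googlebot" (NC = case-insensitive)
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# Rule: path ends in "/" plus 4-6 digits -> no rewrite ("-"), respond 403 Forbidden
RewriteRule /\d{4,6}$ - [F,L]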

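This returns an HTTP 403 Forbidden response to Googlebot whenever it requests one of the matching URLs; requests from other user agents are unaffected. The [L] flag is redundant, because [F] already implies it, but it does no harm.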
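To reason about which requests the two directives catch, here is a rough Python model of the decision. It is a hypothetical sketch, not Apache's own logic: it assumes the .htaccess sits in the document root (so Apache strips that prefix and the rule sees a path with no leading slash), and it uses 200 as a stand-in for "served normally". The Googlebot UA string is Google's published desktop crawler UA, used here only as sample input.

```python
import re

# Hypothetical model of the two directives (not Apache's own code):
#   RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
#   RewriteRule /\d{4,6}$ - [F,L]
UA_PATTERN = re.compile(r"googlebot", re.IGNORECASE)  # [NC]: case-insensitive
RULE_PATTERN = re.compile(r"/\d{4,6}$", re.ASCII)     # \d limited to 0-9

FORBIDDEN = 403
PASS_THROUGH = 200  # assumption: stands in for "served normally"


def status_for(url_path: str, user_agent: str) -> int:
    """Return the status this rule set would produce for a request."""
    # Assumes a document-root .htaccess: the rule sees e.g. "folder/folder/12563".
    per_dir_path = url_path.lstrip("/")
    if UA_PATTERN.search(user_agent) and RULE_PATTERN.search(per_dir_path):
        return FORBIDDEN  # [F]: respond 403 and stop processing rules
    return PASS_THROUGH


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(status_for("/folder/folder/12563", GOOGLEBOT_UA))  # 403
print(status_for("/folder/folder/12563", BROWSER_UA))    # 200
print(status_for("/12563", GOOGLEBOT_UA))                # 200: no slash left to match
```

The last case shows an edge worth checking: a numeric URL directly under the root reaches the rule as just `12563`, so the leading `/` in the pattern never matches. If such URLs exist, a pattern like `(^|/)\d{4,6}$` should cover both cases, though that variant is untested here.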
