
How to block “bot*” bot via .htaccess

I have the following entry in my AWStats report:

Unknown robot (identified by 'bot*')

How can I block this bot?
I tried each of the following conditions separately, but none of them seems to catch it:

RewriteCond %{HTTP_USER_AGENT} ^bot* 

RewriteCond %{HTTP_USER_AGENT} bot\* 

RewriteCond %{HTTP_USER_AGENT} bot[*]

Here is the full .htaccess code I am using:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^bot*
RewriteRule .? - [F,L]

I tested three regex values ( ^bot* , bot\* , bot[*] ) in the second line, and none of them stopped the bot.

The asterisk ( * ) is not literal. AWStats is simply stating the rule it used to decide that the request came from a bot. In your case, bot* means that the user agent string started with bot , and it found a match. As regular expressions, your attempts mean something different: ^bot* matches "bo" followed by zero or more "t" characters at the start of the string, while bot\* and bot[*] only match a literal asterisk after "bot".

As the asterisk is not literal, you can use the following instead:

# matches bot* (the same as ^bot.*$)
RewriteCond %{HTTP_USER_AGENT} ^bot [OR]
# matches *bot (the same as ^.*bot$)
RewriteCond %{HTTP_USER_AGENT} bot$

Note: It is better to check your access logs to see exactly what these user agents are and block them specifically. You don't want to end up blocking bots that you might actually want.
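As a rough sketch of what blocking specific agents could look like, the condition below matches two named crawlers instead of the generic "bot" pattern. The names AhrefsBot and MJ12bot are only placeholders here; substitute whatever actually appears in your own access logs.

```apache
RewriteEngine On
# Placeholder names: replace with the user agents found in your access logs.
# [NC] makes the match case-insensitive; the pattern may occur anywhere in the header.
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|MJ12bot) [NC]
# Answer matching requests with 403 Forbidden
RewriteRule ^ - [F]
```

Listing names explicitly keeps legitimate crawlers whose user agent merely contains "bot" from being caught by accident.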


Recommendation: Change your rule from RewriteRule .? - [F,L] to RewriteRule ^ - [F,L]
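Put together, the conditions and the recommended rule might look like the following. This is a minimal sketch, not a tested drop-in: the [NC] flag is an addition of mine (an assumption that you want case-insensitive matching), and comments are kept on their own lines because Apache does not accept trailing comments on a directive line.

```apache
RewriteEngine On
# User-Agent starts with "bot" (NC = case-insensitive, an assumption)
RewriteCond %{HTTP_USER_AGENT} ^bot [NC,OR]
# User-Agent ends with "bot"
RewriteCond %{HTTP_USER_AGENT} bot$ [NC]
# Return 403 Forbidden when either condition matches
RewriteRule ^ - [F,L]
```

The [OR] flag on the first condition is what makes the two conditions alternatives; without it, both would have to match the same request.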

You can also block bots by their exact names in the .htaccess file. The example below is the setup I currently use, and it saves server resources.

SetEnvIfNoCase User-Agent "Yandex" bad_bot    
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot    
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot

<IfModule mod_authz_core.c>
 <Limit GET POST>
  <RequireAll>
   Require all granted
   Require not env bad_bot
  </RequireAll>
 </Limit>
</IfModule>
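The block above only takes effect where mod_authz_core is loaded, which means Apache 2.4. If you are still on Apache 2.2, a fallback along these lines may work; it is a sketch that assumes the same bad_bot variable set by the SetEnvIfNoCase lines and the older Order/Allow/Deny directives.

```apache
# Assumed fallback for Apache 2.2, where mod_authz_core does not exist
<IfModule !mod_authz_core.c>
 <Limit GET POST>
  # Allow rules are evaluated first, then Deny rules override them
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
 </Limit>
</IfModule>
```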

Let me know if you have any questions.
