
Use .htaccess to block PhantomJS bot

I want to block traffic that appears to come from a bot or some other malicious source (I haven't quite figured out what it is, but I don't think I want it). So far I have been blocking it by IP, but the traffic keeps coming from different locations.

What this traffic has in common is that the user agent is PhantomJS, and all the requests show the following local address (perhaps with some variations):

file:///home/poo_master/price_parse/resource_cache/140578757867264tmp2.html

Is it possible to use .htaccess to block either PhantomJS or anything containing "poo_master/price_parse/resource_cache/"?

To block bots/scrapers by user agent OR by requested URL, add these rewrite rules to .htaccess:

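RewriteEngine On
# Match the listed user agents case-insensitively (NC), so "Wget" and "Python-urllib" are caught too
RewriteCond %{HTTP_USER_AGENT} (PhantomJS|wget|HTTrack|python) [NC,OR]
# ... or any requested URL containing the scraper's path
RewriteCond %{REQUEST_URI} poo_master/price_parse
# Return 403 Forbidden; [F] implies [L], and ^ also matches the empty path of the site root
RewriteRule ^ - [F]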
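
One caveat: a file:/// address like the one in the question is unlikely to be the URL requested from your server. If it shows up in the referrer column of your logs or analytics, the check probably belongs on the Referer header instead. A minimal sketch under that assumption (untested against this particular bot, and the Referer source is a guess, not something the question confirms):

```apache
RewriteEngine On
# Block by user agent, case-insensitively ...
RewriteCond %{HTTP_USER_AGENT} PhantomJS [NC,OR]
# ... or by Referer. Assumption: the file:/// address arrives in the Referer header.
RewriteCond %{HTTP_REFERER} poo_master/price_parse [NC]
# Return 403 Forbidden for every matching request, including the site root
RewriteRule ^ - [F]
```

Both headers are supplied by the client and can be changed by whoever runs the scraper, so this only stops the bot for as long as it keeps identifying itself this way.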

Update your .htaccess file with the code below. Hopefully it works.

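RewriteEngine on
# Block first: return 403 for the listed user agents or the scraper's path
RewriteCond %{HTTP_USER_AGENT} (PhantomJS|wget|HTTrack|python) [NC,OR]
RewriteCond %{REQUEST_URI} poo_master/price_parse
RewriteRule ^ - [F]
# Then the front-controller rewrite, kept as a separate rule so it still applies to normal visitors
RewriteCond $1 !^(index\.php|resources|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L,QSA]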
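
If mod_rewrite is not available, a similar block can be sketched with mod_setenvif and mod_authz_core. This assumes Apache 2.4 and that AllowOverride permits FileInfo and AuthConfig for the directory. The variable name bad_bot is made up for illustration, and matching the Referer header is a guess about where the file:/// address shows up:

```apache
# Flag requests whose User-Agent or Referer matches (case-insensitive).
# bad_bot is a hypothetical variable name chosen for this example.
SetEnvIfNoCase User-Agent "PhantomJS" bad_bot
SetEnvIfNoCase Referer "poo_master/price_parse" bad_bot

# Allow everyone except flagged requests; Apache answers those with 403 Forbidden.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

The negated Require has to sit inside RequireAll next to a positive rule, because a negated directive on its own cannot grant access.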
