
How can I block mp3 crawlers from my website under Apache?

Is there some way to block access based on the referrer using a .htaccess file or similar? My bandwidth is being eaten up by people referred from http://www.dizzler.com, which is a Flash-based site that lets you browse a library of crawled, publicly available mp3s.

Edit: Dizzler was still getting in (probably it wasn't sending a referrer in every case), so instead I moved all my mp3s to a new folder, disabled directory browsing, and created a robots.txt file to (hopefully) keep it from being indexed again. Accepted answer changed to reflect the futility of my previous attempt :P
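If you go the robots.txt route, you can check that your rules actually cover the new folder before uploading them. This is a minimal sketch using Python's standard `urllib.robotparser`; the folder name `/music/` and the `build_parser` helper are hypothetical stand-ins, and it only tells you what a *well-behaved* crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/music/" stands in for whatever the new mp3 folder is called.
ROBOTS_TXT = """\
User-agent: *
Disallow: /music/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser
```

For example, `build_parser(ROBOTS_TXT).can_fetch("AnyCrawler", "http://example.com/music/song.mp3")` should be `False`, while the same call for `/index.html` should be `True`.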

That's like saying you want to stop spambots from harvesting email addresses on your publicly visible page: it's very hard to tell users from bots without forcing your visitors to log in to confirm their identity.

You could use robots.txt to disallow the spiders that actually follow its rules, but compliance is up to them, not your server. There's a page that explains how to catch the ones that break the rules and explicitly ban them: Using Apache to stop bad robots [evolt.org]
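One common way to catch rule-breakers (not necessarily exactly what the linked article does) is a trap URL: list it under `Disallow` in robots.txt, link to it invisibly, and ban any client that requests it anyway. A minimal Python sketch of the log-scanning half, assuming Common Log Format access logs; the trap path `/bot-trap/` and both helper names are hypothetical. The output lines would go below an `Order Allow,Deny` / `Allow from all` pair.

```python
import re

# Hypothetical trap URL: disallowed in robots.txt, so only crawlers that
# ignore robots.txt should ever request it.
TRAP_PATH = "/bot-trap/"

# Captures the client IP and the request path from a Common Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

def trapped_ips(log_lines):
    """Return the IPs (in first-seen order) that requested the trap path."""
    seen = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group(2).startswith(TRAP_PATH) and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

def deny_rules(ips):
    """Render Apache 2.2-style 'Deny from' lines for a .htaccess file."""
    return "\n".join(f"Deny from {ip}" for ip in ips)
```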

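If you want an easy way to stop dizzler in particular, you can deny its IP address. Note that <Directory> sections are only allowed in the main server configuration (httpd.conf or a virtual host), not in .htaccess; in a .htaccess file, leave out the <Directory> and </Directory> lines and put the other three lines in the .htaccess of the directory you want to protect: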

<Directory /directoryName/subDirectory>
Order Allow,Deny
Allow from all
Deny from 66.232.150.219
</Directory>
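To see why this ordering works, here is a small Python model of how Apache 2.2's `Order Allow,Deny` evaluates a client address: at least one `Allow` must match, any matching `Deny` then wins, and anything matching neither is denied. It is a simplified sketch (it only understands `all`, single IPs, and CIDR ranges, not hostnames or partial IPs), and the function names are hypothetical. Keep in mind it only blocks requests that actually come *from* that address.

```python
import ipaddress

def matches(client_ip: str, rule: str) -> bool:
    """True if an Allow/Deny argument matches: 'all', a single IP, or a CIDR range."""
    if rule == "all":
        return True
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule, strict=False)

def allow_deny(client_ip: str, allow: list[str], deny: list[str]) -> bool:
    """Simplified model of 'Order Allow,Deny' in Apache 2.2 mod_authz_host."""
    allowed = any(matches(client_ip, rule) for rule in allow)
    denied = any(matches(client_ip, rule) for rule in deny)
    # Must match an Allow and no Deny; matching neither means denied by default.
    return allowed and not denied
```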

From this site (put this in your .htaccess file):

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://(www\.)?dizzler\.com [NC]
RewriteRule .* - [F]
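A quick way to sanity-check a referer pattern such as `^http://(www\.)?dizzler\.com` before deploying it is Python's `re` module, which handles this simple expression the same way Apache's PCRE does; `re.IGNORECASE` plays the role of `[NC]`. The `is_forbidden` helper is hypothetical. Note that a request with no Referer header never matches, so this rule cannot stop clients that omit it.

```python
import re

# Balanced referer pattern; re.IGNORECASE stands in for mod_rewrite's [NC] flag.
DIZZLER_REFERER = re.compile(r"^http://(www\.)?dizzler\.com", re.IGNORECASE)

def is_forbidden(referer: str) -> bool:
    """True if the RewriteCond would match, so 'RewriteRule .* - [F]' answers 403."""
    return DIZZLER_REFERER.search(referer) is not None
```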

You could use something like

SetEnvIfNoCase Referer dizzler.com spammer=yes

Order allow,deny
allow from all
deny from env=spammer
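A rough Python model of what these lines do: `SetEnvIfNoCase` runs a case-insensitive, *unanchored* regex search over the Referer header and sets `spammer` if it matches, and `deny from env=spammer` rejects any request where that variable exists. The sketch escapes the dot (`dizzler\.com`) so it only matches a literal period; the helper names are hypothetical.

```python
import re

# Case-insensitive and unanchored, like SetEnvIfNoCase: it can match anywhere in the header.
SPAMMER_REFERER = re.compile(r"dizzler\.com", re.IGNORECASE)

def request_env(referer: str) -> dict[str, str]:
    """Environment variables the SetEnvIfNoCase line would set for this Referer."""
    return {"spammer": "yes"} if SPAMMER_REFERER.search(referer) else {}

def is_allowed(referer: str) -> bool:
    """'allow from all' then 'deny from env=spammer': denied if the variable exists."""
    return "spammer" not in request_env(referer)
```

Because the search is unanchored, a referer like `http://example.com/?ref=dizzler.com` is also denied, which is broader than an anchored `^http://...` pattern.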

Source: http://codex.wordpress.org/Combating_Comment_Spam/Denying_Access

This isn't a very elegant solution, but you could block that site's crawler bots and then rename your mp3 files to break the links already on the site.
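A minimal Python sketch of the renaming step, assuming all the mp3s live in one folder; the `<name>-<random hex>.mp3` scheme and the `rename_mp3s` helper are hypothetical. The returned old-to-new mapping is what you would use to update links on your own pages.

```python
import secrets
from pathlib import Path

def rename_mp3s(folder: Path) -> dict[str, str]:
    """Rename every .mp3 in `folder` to <stem>-<random hex>.mp3; return old -> new names."""
    mapping = {}
    # sorted() materializes the listing first, so renaming doesn't disturb the iteration.
    for path in sorted(folder.glob("*.mp3")):
        new_name = f"{path.stem}-{secrets.token_hex(4)}{path.suffix}"
        path.rename(path.with_name(new_name))
        mapping[path.name] = new_name
    return mapping
```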
