简体   繁体   中英

PHP File existence check vs MySQL SELECT server load

i'm implementing in a project of mine an ip banning functionality. First of all, i'd like to avoid .htaccess for this purpose 'cause the CMS probably would reset it upon modifications, so i have to use a PHP-send-header-and-die solution. Obviously every HTTP request will be checked.

Considering an highly trafficated site, i have two solutions to store ip ban infos :

1 - In a directory, let's say /bans/, i can create N files where N = number of banned ips, so :

/bans/23.23.23.23.ban

would ban 23.23.23.23, in this case all i have to do from my script is to check with file_exists , for instance :

<?php
    if( file_exists("bans/".$_SERVER['REMOTE_ADDR'].".ban"){
        header("HTTP/1.0 403 Forbidden");
        die();
    }
    else{
        // Continue surfing ....
    }
?>

2 - Use a MySQL table, let's say cms_bans, and execute a SELECT for every HTTP request to check if the ip is in the ban list.

Considering those 2 solutions, which one has less overload impact (filesystem vs mysql :D), assuming MySQL query caching is disabled ?

Please only motivated answers, not just personal preferences.

Thanks

Use a MySQL table with the MEMORY engine. Dump it every once in a while to another, permanent table to keep the IPs stored and persistent after a server restart.

I would tend to think the file_exists() has less overhead since there's no remote connection to make and it can be cached by PHP. But if you've got a huge number of bans, and the db connection is already being made for some other part of the app, then the MySQL solution starts to look better and is certainly easier to manage. That said, I'm not a fan of either approach and would suggest moving the bans to the network layer, via your firewall/proxy/load balancer.

Alternatively, if you don't have a large number of bans and they don't change very often, you'll be better off storing the list directly as a PHP array, include()'ing that in your code, and then using in_array() to scan for the bans:

$banned = array(
    '1.2.3.4',
    '2.3.4.5',
    '3.4.5.6'
);
if (in_array($ip, $banned)) { baninate(); }

I would prefer the mysql database. You can store additional data in the table like banned_by , banned_on , bann_reason and access_count .

In your code you only do a

SELECT COUNT(1) FROM banns WHERE ip = '23.23.23.23' 

assuming you have an index on the ip column this should be pretty fast. If someone hit's a ban you do an

 UPDATE banns SET access_count = access_count + 1 WHERE ip = '23.23.23.23'

Why stick with one over the other?

I would set up a MySQL-based ban table that I can expand easily. MySQL is a fast system, and is a lot more flexible keeping in mind the future.

But, you can also cache the results yourself in a file and read from that. Now, anyone who's familiar with MySQL can add bans directly, they don't have to know your special format in order to work directly with it (in the case of extensions to your ban system, working with it, etc). The question here, then, is file permissions. So, you'd just have to throw in a little extra code to compensate for that.

If they need the cache updated, give them the ability.

您没有说您正在使用什么CMS,但是如果是开源或您自己的CMS,为什么不修改它以使用.htaccess禁止?

I don't think this question is really performance related.
Such a simple key-value lookup will never be a bottleneck compared to the other parts of the application.
That's mistake which being made very often: people tend to optimize less resource-consuming parts of the site, without any particular reason but because it just came to their mind.

That's why question should be motivated, not going out of just personal preferences.

As far as your original question, using file_exists() is quicker for small numbers of bans (about <1000 bans) whereas using MySQL is quicker for larger numbers. The connection is made to the database only one time and the answer is sent back only one time, so the "bottleneck" of MySQL only adds a set, constant amount of time to the required time to do the query. MySQL (and other database) software then scales extremely well because there is a constant byte-width for each row, so it need only check bytes nRX to nRX+Y for integer multiples of n.

In older file systems the Operating System can not make such an assumption since files can be of variable length. Thus it would scan for the end_of_file bit. Newer operating systems create a database of every file (The "File Allocation Table") at the beginning of the partition and it need merely search this. The problem is- the more files on the computer, the longer it takes to search this table. Also, fragmentation of the drive can make it harder to find if a file still exists. These little slow downs don't equal the time it takes to connect to an SQL database... for small numbers of bans.

What would be an even better solution would be to have a text file containing one ban per line.

bans.txt:
23.23.23.23
192.168.1.42
200.200.200.200

Then you just use strpos($file_contents, $_SERVER["REMOTE_ADDR"] . Note that the fewer lines of PHP you have, the quicker it will ultimately run since the C back-end of PHP is about 100x quicker than the interpretation. Therefore in two lines we can get_file_contents() (dump the contents to RAM) and strpos() (search for a string within the RAM) and it's entirely handled by the C back-end, which iterates through quite quickly.

There are even quicker ways to do it, too, if you're willing to write your own database that keeps the bans listed numerically (allowing binary search).

Although as several people have already stated, this is not where any main bottleneck will occur in your server. Optimizing the "check for ban" portion of your website will net a 0.01% speed increase to your whole site. What you want to be really careful to optimize are loops that run >100 times, calls to remote servers, and queries which return several lines of a database to be parsed.

Also, don't write a function to perform something which already has a built-in PHP function. I spent a year manually parsing string with hundreds of substr(strpos()) lines before I learned how to use preg_replace()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM