
Generating a unique ID for a given string using PHP

I'm using Zend_Cache_Core with Zend_Cache_Backend_File to cache results of queries executed for a model class that accesses the database.

Basically, the queries themselves should form the id under which the obtained results are cached. The only problem is that they are too long: Zend_Cache_Backend_File doesn't throw an exception and PHP doesn't complain, but the cache file isn't created.

I've come up with a solution that is not efficient at all: storing every executed query along with an auto-incrementing id in a separate file, like so:

0->>SELECT * FROM table
1->>SELECT * FROM table1,table2
2->>SELECT * FROM table WHERE foo = bar

You get the idea; this way I have a unique id for every query. I clear the cache whenever an insert, delete, or update is done.
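The "clear the cache on every write" rule could be wired in along these lines. This is only a sketch: `isWriteQuery()` is a hypothetical helper, and it assumes writes can be recognised by their leading SQL keyword (INSERT, UPDATE or DELETE), which misses things like REPLACE or multi-statement strings.

```php
<?php
// Hypothetical helper: true when the statement starts with INSERT, UPDATE or
// DELETE (case-insensitive, leading whitespace allowed).
function isWriteQuery($sql)
{
    return preg_match('~^\s*(INSERT|UPDATE|DELETE)\b~i', $sql) === 1;
}

// Intended use with Zend_Db and Zend_Cache (not executed here):
//   $db->query($sql);
//   if (isWriteQuery($sql)) {
//       $cache->clean(Zend_Cache::CLEANING_MODE_ALL); // drop every cached result
//   }

var_dump(isWriteQuery("  delete FROM table WHERE foo = bar")); // bool(true)
var_dump(isWriteQuery("SELECT * FROM table1,table2"));         // bool(false)
```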

Now, I'm sure you see the potential bottleneck here: every test, save, or fetch from the cache makes two requests to the file system (three when a new id has to be added). This may even defeat the purpose of caching altogether. So is there a way to generate a unique id, i.e. a much shorter representation of each query, in PHP without having to store the mapping on the file system or in a database?

Strings can be arbitrarily long, so it's impossible to create a fixed-size identifier that represents every possible input string without duplication. For caching purposes, however, you can usually get away with a simple "good enough" solution that reduces collisions to an acceptable level.

For example, you can simply use MD5: for any two different queries, the chance of an accidental collision is about 1 in 2^128. If you're still worried about collisions (and you probably should be, just to be safe), you can store the query together with the result in the cache "value", and check when you get the value back that it's actually the query you were looking for.
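To see what "1 in 2^128" means for a whole cache rather than a single pair of queries, the standard birthday-bound estimate can be applied. It assumes MD5 behaves like a random function on ordinary, non-adversarial input (MD5 is broken against deliberately crafted collisions, which shouldn't matter for queries your own application generates). With n distinct queries in the cache:

```latex
P(\text{at least one collision}) \;\lesssim\; \binom{n}{2}\,\frac{1}{2^{128}} \;=\; \frac{n(n-1)}{2^{129}}
% e.g. n = 10^6:  10^{12} / 2^{129} \approx 1.5 \times 10^{-27}
```

So even with a million cached queries, storing and comparing the query is cheap insurance rather than something you expect to trigger.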

As a quick example (my PHP is kind of rusty, but hopefully you get the idea):

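$query = "SELECT * FROM ...";

// Zend_Cache ids may only contain [a-zA-Z0-9_], so use "_" and join strings with "."
$key = "hash_" . hash("md5", $query);
$result = $cache->load($key); // load() returns false on a cache miss
if ($result === false || $result[0] !== $query) {
    // object wasn't in cache (or another query has the same hash): do the real fetch and store it
    $rows = $db->fetchAll($query); // assuming $db is a Zend_Db adapter; etc

    // saving an array needs the automatic_serialization frontend option
    $result = array($query, $rows);
    $cache->save($result, $key);
}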

// the result is now in $result[1] (the original query is in $result[0])
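If you want to try the load/compare/save pattern without a Zend setup, here is a self-contained sketch. `ArrayCache` and `cachedQuery()` are hypothetical names invented for this example; `ArrayCache` only mimics the two Zend_Cache_Core calls used above (`load()` returning false on a miss, and `save($data, $id)`).

```php
<?php
// Hypothetical stand-in for Zend_Cache_Core so the sketch runs on its own:
// load() returns false on a miss, save($data, $id) stores any PHP value.
class ArrayCache
{
    private $store = array();

    public function load($id)
    {
        return array_key_exists($id, $this->store) ? $this->store[$id] : false;
    }

    public function save($data, $id)
    {
        $this->store[$id] = $data;
        return true;
    }
}

// Returns the rows for $query, calling $fetch (a hypothetical callable that
// really runs the query) only on a miss or when the stored query differs.
function cachedQuery($cache, $query, callable $fetch)
{
    $key = "hash_" . md5($query);
    $entry = $cache->load($key);

    if ($entry === false || $entry[0] !== $query) {
        // Cache miss, or a different query stored under the same hash.
        $entry = array($query, $fetch($query));
        $cache->save($entry, $key);
    }

    return $entry[1];
}

// Usage: the second call is served from the cache, so $fetch runs once.
$calls = 0;
$fetch = function ($query) use (&$calls) {
    $calls++;
    return array("rows for " . $query);
};
$cache = new ArrayCache();
cachedQuery($cache, "SELECT * FROM table", $fetch);
cachedQuery($cache, "SELECT * FROM table", $fetch);
echo $calls, "\n"; // 1
```

Keeping the query next to the rows costs a little extra space per entry, but it turns a silent hash collision into an ordinary cache miss.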

MD5!!

MD5 generates a 32-character string, and that seems to work fine: the cache files are created (with filenames about 47 characters long), so the operating system apparently doesn't reject them.

//returns id for a given query
function getCacheId($query) {
    return md5($query);
}
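Besides being short, an MD5 digest is also safe as an id: as far as I recall, Zend_Cache_Core only accepts ids made of `[a-zA-Z0-9_]` characters (worth checking against your Zend Framework version), and a digest is always 32 lowercase hex characters. A small check, where `isValidZendCacheId()` is a hypothetical helper mirroring that assumed rule:

```php
<?php
// returns id for a given query
function getCacheId($query)
{
    return md5($query);
}

// Hypothetical helper mirroring the assumed Zend_Cache id rule [a-zA-Z0-9_]+.
function isValidZendCacheId($id)
{
    return preg_match('~^[a-zA-Z0-9_]+$~D', $id) === 1;
}

$query = "SELECT * FROM table WHERE foo = bar";
var_dump(isValidZendCacheId($query));             // bool(false): spaces, * and =
var_dump(isValidZendCacheId(getCacheId($query))); // bool(true)
echo strlen(getCacheId($query)), "\n";            // 32
```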

And that's it! But there's still the issue of collisions, and I think salting the MD5 hash (maybe with the name of the table) should make it more robust.

//returns id for a given query
function getCacheId($query, $table) {
    return md5($table . $query);
}
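Since the table name is already part of the query text, hashing `$table . $query` gives about the same odds of an accidental collision as hashing the query alone; storing and comparing the query, as in the earlier answer, is what actually guards against collisions. If the table name is wanted in the id anyway, one alternative (a sketch; the function name is made up) is to keep it as a readable prefix, so a cache filename shows which table it came from:

```php
<?php
// Hypothetical variant: the table name stays readable as a prefix instead of
// being folded into the hash input. Assumes table names only use [a-zA-Z0-9_],
// so the id stays within the character set Zend_Cache is assumed to accept.
function getCacheIdWithTablePrefix($query, $table)
{
    return $table . "_" . md5($query);
}

$id = getCacheIdWithTablePrefix("SELECT * FROM users WHERE id = 1", "users");
echo $id, "\n"; // "users_" followed by 32 hex characters
```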

If anyone wants the full code for how I've implemented the result caching, just leave a comment and I'll be happy to post it.


 