Generating a unique id for a given string using PHP

I'm using Zend_Cache_Core with Zend_Cache_Backend_File to cache results of queries executed for a model class that accesses the database.

Basically the queries themselves should form the id by which to cache the obtained results; the only problem is that they are too long. Zend_Cache_Backend_File doesn't throw an exception and PHP doesn't complain, but the cache file isn't created.

I've come up with a solution that is not efficient at all: storing every executed query along with an auto-incrementing id in a separate file, like so:

0->>SELECT * FROM table
1->>SELECT * FROM table1,table2
2->>SELECT * FROM table WHERE foo = bar

You get the idea; this way I have a unique id for every query. I clean out the cache whenever an insert, delete, or update is done.

Now I'm sure you see the potential bottleneck here: for any test, save, or fetch from the cache, two (or three, where we need to add a new id) requests are made to the file system. This may even defeat the purpose of caching altogether. So is there a way I can generate a unique id, i.e. a much shorter representation, of the queries in PHP without having to store them on the file system or in a database?

Strings are arbitrarily long, so obviously it's impossible to create a fixed-size identifier that can represent any arbitrary input string without duplication. However, for the purposes of caching, you can usually get away with a solution that's simply "good enough" and reduces collisions to an acceptable level.

For example, you can simply use MD5: for two given distinct strings, the chance of an accidental collision is about 1 in 2^128. If you're still worried about collisions (and you probably should be, just to be safe), you can store both the query and the result in the "value" of the cache, and check when you get the value back that it actually belongs to the query you were looking for.
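To put that figure in context: 1 in 2^128 is the chance that two particular strings collide. Across n cached queries, the chance that any pair collides is roughly n(n-1)/2 divided by 2^128 (the birthday bound), assuming MD5 output behaves like a uniformly random 128-bit value for ordinary, non-adversarial input. A minimal sketch of that arithmetic follows; `collisionProbability` is a hypothetical helper, not something from the original answer.

```php
<?php
// Rough birthday-bound estimate (an approximation that is only meaningful
// while the result is far below 1): the probability that at least two of
// $n distinct strings share a value under an ideal $bits-bit hash.
function collisionProbability(float $n, int $bits): float
{
    // number of pairs n(n-1)/2, divided by the 2^bits possible hash values
    return ($n * ($n - 1)) / (2.0 * pow(2.0, $bits));
}

// Estimate for one million distinct cached queries and a 128-bit hash.
printf("%.3e\n", collisionProbability(1e6, 128));
```

For a million distinct queries the estimate is on the order of 10^-27. This covers accidental collisions only; MD5 collisions can be constructed deliberately, which is one more reason to keep the stored-query check described above.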

As a quick example (my PHP is kind of rusty, but hopefully you get the idea):

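$query = "SELECT * FROM ...";

// PHP concatenates strings with ".", not "+".
// Zend_Cache ids may only contain [a-zA-Z0-9_], so the prefix uses "_" rather than "-".
$key = "hash_" . hash("md5", $query);
$result = $cache->load($key); // returns false on a cache miss
if ($result === false || $result[0] !== $query) {
    // object wasn't in cache (or the entry belongs to a colliding query):
    // do the real fetch and store it
    $result = $db->execute($query); // etc

    $result = array($query, $result);
    $cache->save($result, $key);
}

// the result is now in $result[1] (the original query is in $result[0])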
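The same idea can be tried without Zend Framework or a database. The sketch below is only an illustration: `ArrayCache` is a hypothetical in-memory stand-in whose `load()` and `save()` mimic the argument order used above, and `cachedQuery` and the `$fetch` callback are made-up names standing in for the real cache and database calls.

```php
<?php
// Hypothetical in-memory stand-in for the cache; load() returns false on a miss.
class ArrayCache
{
    private $items = array();

    public function load($id)
    {
        return array_key_exists($id, $this->items) ? $this->items[$id] : false;
    }

    public function save($data, $id)
    {
        $this->items[$id] = $data;
        return true;
    }
}

// Returns the result for $query, calling $fetch only on a cache miss or when
// the stored entry belongs to a different query that hashed to the same key.
function cachedQuery(ArrayCache $cache, $query, callable $fetch)
{
    $key = 'hash_' . md5($query);
    $entry = $cache->load($key);
    if ($entry === false || $entry[0] !== $query) {
        $entry = array($query, $fetch($query));
        $cache->save($entry, $key);
    }
    return $entry[1];
}

$cache = new ArrayCache();
$calls = 0;
// Stand-in for the real database fetch; counts how often it is invoked.
$fetch = function ($query) use (&$calls) {
    $calls++;
    return array('rows for ' . $query);
};

cachedQuery($cache, 'SELECT * FROM table', $fetch);
cachedQuery($cache, 'SELECT * FROM table', $fetch);
echo $calls, "\n"; // prints 1: the second call is served from the cache
```

Storing the query next to the result costs some extra space per entry, but it turns a hash collision into an ordinary cache miss instead of a wrong result.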

MD5!!

MD5 generates a string of length 32, which seems to be working fine: the cache files are created (with filenames of about 47 characters), so it seems the operating system doesn't reject them.

//returns id for a given query
function getCacheId($query) {
    return md5($query);
}
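As a quick check of the fixed length mentioned above, the snippet below hashes a very short and a very long string. The query strings are made up for illustration.

```php
<?php
// md5() returns 32 lowercase hexadecimal characters regardless of input length.
$short = md5('SELECT 1');
$long  = md5(str_repeat('SELECT * FROM table1,table2 WHERE foo = bar ', 1000));

echo strlen($short), ' ', strlen($long), "\n"; // prints "32 32"
var_dump(ctype_xdigit($short) && ctype_xdigit($long)); // bool(true)
```

Because the output contains only hexadecimal digits, it is safe to use in a file name however long or oddly punctuated the original query is.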

And that's it! But there's still the issue of collisions, and I think salting the MD5 hash (maybe with the name of the table) should make it more robust.

//returns id for a given query
function getCacheId($query, $table) {
    return md5($table . $query);
}
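A possible variant, offered only as a sketch: instead of mixing the table name into the hash input, keep it as a readable prefix in front of the hash. Two queries can then only share an id if they belong to the same table and also collide under MD5, and the cache files become easier to identify by name. `getPrefixedCacheId` is a hypothetical name, and the character filtering assumes the cache only accepts ids made of letters, digits, and underscores.

```php
<?php
// Hypothetical variant of getCacheId(): table name as a readable prefix,
// followed by the MD5 hash of the query.
function getPrefixedCacheId($query, $table)
{
    // Replace anything outside [a-zA-Z0-9_] so the id stays valid and filename-safe.
    $prefix = preg_replace('/[^a-zA-Z0-9_]/', '_', $table);
    return $prefix . '_' . md5($query);
}

echo getPrefixedCacheId('SELECT * FROM users', 'users'), "\n";
```

The id grows by the length of the table name plus one character, so very long table names would eat into whatever length the cache backend tolerates.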

If anyone wants the full code for how I've implemented the results caching, just leave a comment and I'll be happy to post it.
