MySQL and PDO, speed up query and get result/output from MySQL function (routine)?

Getting the Value:

I've got the levenshtein_ratio function, from here, set up in my MySQL database. I run it in the following way:

    $stmt = $db->prepare("SELECT r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
    $stmt->execute(array('input' => $input));
    $result = $stmt->fetchAll(); 

    if(count($result)) {
        foreach($result as $row) {
            $out .= $row['r_id'] . ', ' . $row['val'];
        }
    }

And it works a treat, exactly as expected. But I was wondering, is there a nice way to also get the value that levenshtein_ratio() calculates?

I've tried:

    $stmt = $db->prepare("SELECT levenshtein_ratio(:input, someval), r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
    $stmt->execute(array('input' => $input));
    $result = $stmt->fetchAll(); 

    if(count($result)) {
        foreach($result as $row) {
            $out .= $row['r_id'] . ', ' . $row['val'] . ', ' . $row[0];
        }
    }

and it does technically work (I get the percentage from $row[0]), but the query is a bit ugly, and I can't use a proper key to get the value, like I can for the other two items.

Is there a way to somehow get a nice reference for it?

I tried:

$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");

modelling it after something I found online, but it didn't work, and it ends up breaking the whole query.

Speeding It Up:

I'm running this query for an array of values:

foreach($parent as $input){
    $stmt = ...
    $stmt->execute...
    $result = $stmt->fetchAll(); 

    ... etc
}

But it ends up being remarkably slow. Like 20s slow, for an array of only 14 inputs and a DB with about 350 rows, which is expected to reach the 10,000s soon. I know that putting queries inside loops is naughty business, but I'm not sure how else to get around it.

EDIT 1

When I use

$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");

surely that costs twice as much time as calculating it only once? Similar to having $i < sizeof($arr); in a for loop?

To clean up the column names you can use "AS" to give the function's column an alias. At the same time you can speed things up by using that alias in a HAVING clause, so the function only has to be written once in the query.

$stmt = $db->prepare("SELECT r_id, levenshtein_ratio(:input, someval) AS val FROM table HAVING val > 70");

If it is still too slow you might consider a C library like https://github.com/juanmirocks/Levenshtein-MySQL-UDF

doh - forgot to switch "where" to "having", as spencer7593 noted.

I'm assuming that `someval` is an unqualified reference to a column in the table. While you may understand that without looking at the table definition, someone else reading the SQL statement can't tell. As an aid to future readers, consider qualifying your column references with the name of the table or (preferably) a short alias assigned to the table in the statement.

 SELECT t.r_id
      , t.val
   FROM `table` t
  WHERE levenshtein_ratio(:input, t.someval) > 70

That function in the WHERE clause has to be evaluated for every row in the table. There's no way to get MySQL to build an index on that. So there's no way to get MySQL to perform an index range scan operation.

It might be possible to get MySQL to use an index for the query, for example, if the query had an ORDER BY t.val clause, or if there is a "covering index" available.

But that doesn't get around the issue of needing to evaluate the function for every row. (If the query had other predicates that excluded rows, then the function wouldn't necessarily need to be evaluated for the excluded rows.)

Adding the expression to the SELECT list really shouldn't be too expensive if the function is declared to be DETERMINISTIC. A second call to a DETERMINISTIC function with the same arguments can reuse the value returned for the previous execution. (Declaring a function DETERMINISTIC essentially means that the function is guaranteed to return the same result when given the same argument values. Repeated calls will return the same value. That is, the return value depends only on the argument values, and doesn't depend on anything else.)

 SELECT t.r_id
      , t.val
      , levenshtein_ratio(:input, t.someval) AS lev_ratio
   FROM `table` t
  WHERE levenshtein_ratio(:input2, t.someval) > 70

(Note: I used a distinct bind placeholder name for the second reference because PDO doesn't handle "duplicate" bind placeholder names as we'd expect. It's possible that this has been corrected in more recent versions of PDO. The first "fix" for the issue was an update to the documentation noting that a bind placeholder name should appear only once in a statement; if you need two references to the same value, use two different placeholder names and bind the same value to both.)
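On the PHP side, the two-placeholder form might be bound as in the sketch below. This is only an illustration, not the asker's actual code: the helper names `ratioQuerySql`, `ratioQueryParams` and `fetchMatches` are made up here, and `$db` is assumed to be an already connected PDO instance.

```php
<?php
// Query text from the answer: :input and :input2 both carry the same value.
function ratioQuerySql(): string
{
    return 'SELECT t.r_id, t.val,'
         . ' levenshtein_ratio(:input, t.someval) AS lev_ratio'
         . ' FROM `table` t'
         . ' WHERE levenshtein_ratio(:input2, t.someval) > 70';
}

// Bind the one input value under each placeholder name used in the statement.
function ratioQueryParams(string $input): array
{
    return ['input' => $input, 'input2' => $input];
}

// Hypothetical wrapper; $db is assumed to be a connected PDO instance.
function fetchMatches(PDO $db, string $input): array
{
    $stmt = $db->prepare(ratioQuerySql());
    $stmt->execute(ratioQueryParams($input));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Keeping the parameter array in one place makes it harder to add a placeholder to the SQL and forget to bind it.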

If you don't want to repeat the expression, you could move the condition from the WHERE clause to the HAVING clause, and refer to the expression in the SELECT list by the alias assigned to the column.

 SELECT t.r_id
      , t.val
      , levenshtein_ratio(:input, t.someval) AS lev_ratio
   FROM `table` t
 HAVING lev_ratio > 70

The big difference between WHERE and HAVING is that the predicates in the WHERE clause are evaluated when the rows are accessed. The HAVING clause is evaluated much later, after the rows have been accessed. (That's a brief explanation of why the HAVING clause can reference columns in the SELECT list by their alias, but the WHERE clause can't do that.)
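With the alias in place, the calculated ratio can be read by key in PHP instead of through `$row[0]`. The following is a minimal sketch under a few assumptions: `$db` is a connected PDO instance, the function names `fetchByRatio` and `formatMatches` are invented for illustration, and the rows are fetched with `PDO::FETCH_ASSOC`.

```php
<?php
// Hypothetical helper; $db is assumed to be a connected PDO instance.
function fetchByRatio(PDO $db, string $input): array
{
    $stmt = $db->prepare(
        'SELECT t.r_id, t.val,'
        . ' levenshtein_ratio(:input, t.someval) AS lev_ratio'
        . ' FROM `table` t'
        . ' HAVING lev_ratio > 70'
    );
    $stmt->execute(['input' => $input]);
    // FETCH_ASSOC keys each row by column name or alias only.
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Build one output line per row, reading the ratio through its alias.
function formatMatches(array $rows): string
{
    $out = '';
    foreach ($rows as $row) {
        $out .= $row['r_id'] . ', ' . $row['val'] . ', ' . $row['lev_ratio'] . "\n";
    }
    return $out;
}
```

A call such as `echo formatMatches(fetchByRatio($db, $input));` would then print one line per matching row.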

If that's a large table, and a large number of rows are being excluded, there might be a significant performance difference when using the HAVING clause, because a much larger intermediate set may be created.

To get an "index used" for the query, a covering index is the only option I see.

 ON `table` (r_id, val, someval)

With that, MySQL can satisfy the query from the index, without needing to look up pages in the underlying table. All of the column values the query needs are available from the index.
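One way to check whether the covering index is actually picked up is to run the query through EXPLAIN and look for "Using index" in the Extra column. The sketch below assumes a connected PDO instance in `$db`; the index name `table_cover_ix` and the function names are hypothetical, and whether MySQL chooses the index will depend on the real table.

```php
<?php
// Hypothetical index name; the column list is the one given in the answer.
const COVERING_INDEX_DDL =
    'CREATE INDEX table_cover_ix ON `table` (r_id, val, someval)';

// True when every EXPLAIN row lists exactly "Using index" in Extra.
// ("Using index condition" is a different optimization and is not counted.)
function explainUsesCoveringIndex(array $explainRows): bool
{
    foreach ($explainRows as $row) {
        $extra = (string) ($row['Extra'] ?? '');
        $parts = array_map('trim', explode(';', $extra));
        if (!in_array('Using index', $parts, true)) {
            return false;
        }
    }
    return $explainRows !== [];
}

// Hypothetical check; $db is assumed to be a connected PDO instance.
function queryIsCovered(PDO $db, string $input): bool
{
    $stmt = $db->prepare(
        'EXPLAIN SELECT t.r_id, t.val FROM `table` t'
        . ' WHERE levenshtein_ratio(:input, t.someval) > 70'
    );
    $stmt->execute(['input' => $input]);
    return explainUsesCoveringIndex($stmt->fetchAll(PDO::FETCH_ASSOC));
}
```

Even when the index is used this way, the function is still evaluated once per index entry; the saving is only in not visiting the table rows.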


FOLLOWUP

To get an index created, we would need to create a column, e.g.

  lev_ratio_foo FLOAT

and pre-populate it with the result from the function

UPDATE `table` t
   SET t.lev_ratio_foo = levenshtein_ratio('foo', t.someval) 
;

Then we could create an index, e.g.

... ON `table` (lev_ratio_foo, val, r_id)   

And rewrite the query

SELECT t.r_id
     , t.val
     , t.lev_ratio_foo 
  FROM `table` t
 WHERE t.lev_ratio_foo > 70

With that query, MySQL can make use of an index range scan operation on an index with lev_ratio_foo as the leading column.

Likely, we would want to add BEFORE INSERT and BEFORE UPDATE triggers to maintain the value when a new row is added to the table or the value of the someval column is modified.
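Such triggers might look roughly like the following, issued from PHP. This is a sketch under stated assumptions rather than a tested setup: the trigger names `table_lev_foo_bi` and `table_lev_foo_bu` and the PHP function names are hypothetical, the `lev_ratio_foo` column is assumed to exist already, and `$db` is assumed to be a connected PDO instance with the privileges needed to create triggers.

```php
<?php
// Single-statement triggers that recompute the stored ratio for 'foo'.
// Trigger names are hypothetical; lev_ratio_foo must already exist.
function levRatioFooTriggerDdl(): array
{
    $body = " FOR EACH ROW SET NEW.lev_ratio_foo ="
          . " levenshtein_ratio('foo', NEW.someval)";

    return [
        'CREATE TRIGGER table_lev_foo_bi BEFORE INSERT ON `table`' . $body,
        'CREATE TRIGGER table_lev_foo_bu BEFORE UPDATE ON `table`' . $body,
    ];
}

// Hypothetical installer; $db is assumed to be a connected PDO instance.
function installLevRatioFooTriggers(PDO $db): void
{
    foreach (levRatioFooTriggerDdl() as $ddl) {
        $db->exec($ddl);
    }
}
```

Because each trigger body is a single SET statement, no BEGIN ... END block is needed.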

That pattern could be extended: additional columns could be added for values other than 'foo', e.g. 'bar'

UPDATE `table` t
   SET t.lev_ratio_bar = levenshtein_ratio('bar', t.someval)

Obviously that approach isn't going to scale to a broad range of input values.
