
Searching an array of different strings inside a single string in PHP

I have an array of strings that I want to match against the end of a normal string. I'm not sure of the best way to do this in PHP.

This is roughly what I am trying to do:

Example:

Input: abcde

Search array: er, wr, de

Match: de

My first thought was to write a loop that goes through the array and crafts a regular expression by adding "\\b" to the end of each string, then checks whether it is found in the input string. While this would work, it seems somewhat inefficient to loop through the entire array. I've been told regular expressions are slow in PHP, and I don't want to implement something that will take me down the wrong path.

Is there a better way to see if one of the strings in my array occurs at the end of the input string?

The preg_filter() function looks like it might do the job, but it is for PHP 5.3+ and I am still sticking with 5.2.11 stable.

For something this simple, you don't need a regex. You can loop over the array and use strrpos to check whether the last occurrence of each entry sits at index strlen(input) - strlen(test); strpos reports only the first occurrence, which is not necessarily the one at the end. If every entry in the search array has the same length, you can also speed things up by chopping the end off the input once, then comparing that to each item in the array.
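The loop described above could look roughly like the following. This is only a sketch: the function name findSuffix is made up for illustration, and it uses substr_compare with a negative offset instead of an index check, which compares just the tail of the input. It sticks to array() syntax so it should also run on PHP 5.2.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// needle that $input ends with, or false if none does.
function findSuffix($input, array $needles)
{
    $inputLen = strlen($input);
    foreach ($needles as $needle) {
        $needleLen = strlen($needle);
        // Skip empty needles and needles longer than the input.
        if ($needleLen === 0 || $needleLen > $inputLen) {
            continue;
        }
        // Compare only the last $needleLen bytes of $input with $needle.
        if (substr_compare($input, $needle, -$needleLen) === 0) {
            return $needle;
        }
    }
    return false;
}

var_dump(findSuffix('abcde', array('er', 'wr', 'de'))); // string(2) "de"
```

The length check up front avoids calling substr_compare with an offset that points before the start of the string.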

You can't avoid going through the whole array, as in the worst general case, the item that matches will be at the end of the array. However, unless the array is huge, I wouldn't worry too much about performance - it will be much faster than you think.

Though compiling the regular expression takes some time, I wouldn't dismiss pcre so easily. Unless you find a compare function that takes several needles, you need a loop over the needles, and executing that loop plus calling the compare function for every single needle takes time, too.

Let's take a test script that fetches all the function names from php.net and looks for certain endings. This was only an ad hoc script, but I suppose that no matter which strcmp-ish function + loop you use, it will be slower than the simple pcre pattern (in this case).

count($hs)=5549
pcre: 4.377925157547 s
substr_compare: 7.951938867569 s
identical results: bool(true)

This was the result when searching for nine different patterns. With only two ('yadda', 'ge'), both methods took the same time.

Feel free to criticize the test script (aren't there always errors in synthetic tests that are obvious for everyone but oneself? ;-) )

<?php
/* get the test data
All the function names from php.net
*/
$doc = new DOMDocument;
$doc->loadhtmlfile('http://docs.php.net/quickref.php');
$xpath = new DOMXPath($doc);
$hs = array();
foreach( $xpath->query('//a') as $a ) {
  $hs[] = $a->textContent;
}
echo 'count($hs)=', count($hs), "\n";
// should find:
// ge, e.g. imagick_adaptiveblurimage
// ing, e.g. m_setblocking
// name, e.g. basename 
// ions, e.g. assert_options
$ns = array('yadda', 'ge', 'foo', 'ing', 'bar', 'name', 'abcd', 'ions', 'baz');
sleep(1);

/* test 1: pcre */
$start = microtime(true);
for($run=0; $run<100; $run++) {
  $matchesA = array();
  $pattern = '/(?:' . join('|', $ns) . ')$/';
  foreach($hs as $haystack) {
    if ( preg_match($pattern, $haystack, $m) ) {
      @$matchesA[$m[0]]+= 1;
    }
  }
}
echo "pcre: ", microtime(true)-$start, " s\n";
flush();
sleep(1);

/* test 2: loop + substr_compare */
$start = microtime(true);
for($run=0; $run<100; $run++) {
  $matchesB = array();
  foreach( $hs as $haystack ) {
    $hlen = strlen($haystack);
    foreach( $ns as $needle ) {
      $nlen = strlen($needle);
      if ( $hlen >= $nlen && 0===substr_compare($haystack, $needle, -$nlen) ) {
        @$matchesB[$needle]+= 1;
      }
    }
  }
}
echo "substr_compare: ", microtime(true)-$start, " s\n";
echo 'identical results: '; var_dump($matchesA===$matchesB);
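One caveat about the pattern in the script above: it joins the needles as-is, which is fine for plain letters but would misbehave if a needle contained regex metacharacters. A minimal sketch of a safer pattern builder follows; the name buildSuffixPattern is hypothetical, and the use of \z instead of $ is my own choice so that a trailing newline in the subject does not count as the end.

```php
<?php
// Hypothetical helper: builds one end-anchored pattern from a list of suffixes.
function buildSuffixPattern(array $needles)
{
    $quoted = array();
    foreach ($needles as $needle) {
        // Escape regex metacharacters and the '/' delimiter.
        $quoted[] = preg_quote($needle, '/');
    }
    // \z matches only at the very end of the subject.
    return '/(?:' . implode('|', $quoted) . ')\z/';
}

$pattern = buildSuffixPattern(array('er', 'wr', 'de', 'a.b'));
if (preg_match($pattern, 'abcde', $m)) {
    echo $m[0], "\n"; // de
}
```

Building the pattern once, outside any loop over haystacks, keeps the per-haystack cost down to a single preg_match call.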

I might approach this backwards;

if your string-ending list is fixed or varies rarely, I would start by preprocessing it to make it easy to match against, then grab the end of your string and see if it matches!

Sample code:

<?php

// Test whether string ends in predetermined list of suffixes
// Input: string to test
// Output: if matching suffix found, returns suffix as string, else boolean false
function findMatch($str) {
    $matchTo = array(
        2 => array( 'ge' => true, 'de' => true ),
        3 => array( 'foo' => true, 'bar' => true, 'baz' => true ),
        4 => array( 'abcd' => true, 'efgh' => true )
    );

    foreach($matchTo as $length => $list) {
        $end = substr($str, -$length);

        if (isset($list[$end]))
            return $end;
    }

    return false;
}

?>
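If the suffix list is not hard-coded, the length-indexed table used by findMatch() could be built from a flat list. The sketch below assumes that setup; buildSuffixTable and findMatchIn are made-up names, and findMatchIn is just findMatch() with the table passed in as an argument.

```php
<?php
// Hypothetical helper: groups suffixes by length, e.g. array('ge', 'foo')
// becomes array(2 => array('ge' => true), 3 => array('foo' => true)).
function buildSuffixTable(array $suffixes)
{
    $table = array();
    foreach ($suffixes as $suffix) {
        if ($suffix === '') {
            continue; // an empty suffix would match everything
        }
        $table[strlen($suffix)][$suffix] = true;
    }
    ksort($table); // check shorter suffixes first
    return $table;
}

// Same lookup idea as findMatch(), but with the table as a parameter.
function findMatchIn($str, array $table)
{
    $strLen = strlen($str);
    foreach ($table as $length => $list) {
        if ($length > $strLen) {
            continue; // the string is too short to end in a suffix this long
        }
        $end = substr($str, -$length);
        if (isset($list[$end])) {
            return $end;
        }
    }
    return false;
}

$table = buildSuffixTable(array('abcd', 'ge', 'foo', 'de'));
var_dump(findMatchIn('abcde', $table)); // string(2) "de"
```

The table only needs rebuilding when the suffix list changes, so each lookup costs one substr and one isset per distinct suffix length.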

This might be overkill, but you can try the following. Create a hash for each entry of your search array and store the hashes as keys in an array (that will be your lookup array).

Then go from the end of your input string one character at a time (e, de, cde, etc.) and compute a hash of the substring at each iteration. If a hash is in your lookup array, you have a match.
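A rough sketch of this idea follows. It swaps the explicit hash computation for plain array keys, since PHP arrays are already hash tables and isset() on a string key is a hash lookup. The name findSuffixByLookup and the $maxLen cut-off (the length of the longest search string) are assumptions added for illustration.

```php
<?php
// Hypothetical helper: $lookup has the search strings as keys
// (e.g. built with array_flip), so isset() acts as the hash lookup.
function findSuffixByLookup($input, array $lookup, $maxLen)
{
    // No suffix longer than the longest search string can match.
    $limit = min(strlen($input), $maxLen);
    // Try the last 1, 2, 3, ... characters: e, de, cde, ...
    for ($len = 1; $len <= $limit; $len++) {
        $tail = substr($input, -$len);
        if (isset($lookup[$tail])) {
            return $tail;
        }
    }
    return false;
}

$needles = array('er', 'wr', 'de');
$lookup  = array_flip($needles);
$maxLen  = max(array_map('strlen', $needles));
var_dump(findSuffixByLookup('abcde', $lookup, $maxLen)); // string(2) "de"
```

With this layout the cost per input depends on the length of the longest search string rather than on how many search strings there are.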
