strpos() with multiple needles?

I am looking for a function like strpos() with two significant differences:

  1. To be able to accept multiple needles. I mean thousands of needles at once.
  2. To search for all occurrences of the needles in the haystack and to return an array of starting positions.

Of course it has to be an efficient solution, not just a loop through every needle. I have searched this forum and found similar questions, but none of them was what I am looking for. I am using strpos just to illustrate my question better; something entirely different probably has to be used for this purpose.

I am aware of Zend_Search_Lucene and I am interested in whether it can be used to achieve this, and how (just the general idea).

Thanks a lot for your help and time!

Try preg_match with multiple alternatives:

if (preg_match('/word|word2/i', $str))

Checking for multiple strpos values

Here's some sample code for my strategy:

function strpos_array($haystack, $needles, $offset=0) {
    $matches = array();

    //Avoid the obvious: when haystack or needles are empty, return no matches
    if(empty($needles) || empty($haystack)) {
        return $matches;
    }

    $haystack = (string)$haystack; //Pre-cast non-string haystacks
    $haylen = strlen($haystack);

    //Allow negative (from end of haystack) offsets
    if($offset < 0) {
        $offset += $haylen;
    }

    //Wrap a single needle in an array so the loop below handles both cases
    if(!is_array($needles)) {
        $needles = array($needles);
    }

    $needles = array_unique($needles); //Not necessary if you are sure all needles are unique

    //Precalculate needle lengths to save time
    foreach($needles as &$origNeedle) {
        $origNeedle = array((string)$origNeedle, strlen($origNeedle));
    }
    unset($origNeedle); //Break the reference left over from the loop

    //Find matches
    for(; $offset < $haylen; $offset++) {
        foreach($needles as $needle) {
            list($needle, $length) = $needle;
            //Strict comparison: == would treat numeric strings such as "1e1" and "10" as equal
            if($needle === substr($haystack, $offset, $length)) {
                $matches[] = $offset;
                break;
            }
        }
    }

    return($matches);
}

I've implemented a simple brute-force method above that will work with any combination of needles and haystacks (not just words). Dedicated string-searching algorithms are possibly faster.
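One well-known multi-pattern algorithm is Aho-Corasick, which builds a trie with failure links from the needles and then scans the haystack once, however many needles there are. What follows is a minimal sketch of that technique and is not part of the answer above; the function names ac_build and ac_search are hypothetical, and whether it actually beats the brute-force loop in PHP should be profiled on real data.

```php
<?php
// Sketch of Aho-Corasick multi-pattern search (hypothetical helper names).
function ac_build(array $needles): array
{
    $goto = [[]]; // $goto[$state][$char] => next state
    $fail = [0];  // failure link of each state
    $out  = [[]]; // $out[$state][$length] => true for needles ending in that state

    // 1. Insert every needle into a trie.
    foreach ($needles as $needle) {
        $needle = (string)$needle;
        $n = strlen($needle);
        if ($n === 0) {
            continue; // skip empty needles
        }
        $state = 0;
        for ($i = 0; $i < $n; $i++) {
            $c = $needle[$i];
            if (!isset($goto[$state][$c])) {
                $goto[] = [];
                $fail[] = 0;
                $out[]  = [];
                $goto[$state][$c] = count($goto) - 1;
            }
            $state = $goto[$state][$c];
        }
        $out[$state][$n] = true;
    }

    // 2. Breadth-first pass to compute the failure links.
    $queue = array_values($goto[0]); // depth-1 states fail to the root
    for ($head = 0; $head < count($queue); $head++) {
        $r = $queue[$head];
        foreach ($goto[$r] as $c => $s) {
            $queue[] = $s;
            $f = $fail[$r];
            while ($f !== 0 && !isset($goto[$f][$c])) {
                $f = $fail[$f];
            }
            $fail[$s] = $goto[$f][$c] ?? 0;
            $out[$s] += $out[$fail[$s]]; // inherit needles that end at the suffix state
        }
    }
    return [$goto, $fail, $out];
}

// Returns the sorted, de-duplicated start offsets of all needle occurrences,
// overlapping ones included.
function ac_search(string $haystack, array $needles): array
{
    [$goto, $fail, $out] = ac_build($needles);
    $positions = [];
    $state = 0;
    for ($i = 0, $n = strlen($haystack); $i < $n; $i++) {
        $c = $haystack[$i];
        while ($state !== 0 && !isset($goto[$state][$c])) {
            $state = $fail[$state];
        }
        $state = $goto[$state][$c] ?? 0;
        foreach ($out[$state] as $len => $_) {
            $positions[] = $i - $len + 1;
        }
    }
    sort($positions);
    return array_values(array_unique($positions));
}

print_r(ac_search('ushers', ['he', 'she', 'his', 'hers'])); // offsets 1 and 2
```

If the same needle set is searched in many haystacks, the result of ac_build could be cached instead of being rebuilt on every call.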


Other Solution

function strpos_array($haystack, $needles, $theOffset=0) {
    $matches = array();

    if(empty($haystack) || empty($needles)) {
        return $matches;
    }

    $haylen = strlen($haystack);

    if($theOffset < 0) {  // Support negative offsets
        $theOffset += $haylen;
    }

    foreach($needles as $needle) {
        $needlelen = strlen($needle);
        if($needlelen === 0) {
            continue; // An empty needle would never advance $offset
        }
        $offset = $theOffset;

        while(($match = strpos($haystack, $needle, $offset)) !== false) {
            $matches[] = $match;
            $offset = $match + $needlelen;
            if($offset >= $haylen) {
                break;
            }
        }
    }

    return $matches;
}

I know this doesn't answer the OP's question, but I wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution for that (again, this isn't specific to the OP's question, sorry):

    $img_formats = array('.jpg','.png');
    $missing = array();
    foreach ( $img_formats as $format )
        if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
    if (count($missing) == 2)
        return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));

If 2 items are added to the $missing array, the input doesn't match any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:

    function m_stripos( $haystack = null, $needles = array() ){
        //return early if missing arguments 
        if ( !$needles || !$haystack ) return false; 
        // create an array to evaluate at the end
        $missing = array(); 
        //Loop through needles array, and add to $missing array if not satisfied
        foreach ( $needles as $needle )
            if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
        //If the count of $missing and $needles is equal, we know there were no matches, return false..
        if (count($missing) == count($needles)) return false; 
        //If we're here, be happy, return true...
        return true;
    }

Back to our first example, this time using the function instead:

    $needles = array('.jpg','.png');
    if ( !m_stripos( $post['timer_background_image'], $needles ) )
        return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));

Of course, what you do after the function returns true or false is up to you.
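Since the check only needs to know whether at least one needle matches, the $missing bookkeeping can be replaced by an early return. A minimal sketch of that variant follows; the name contains_any is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical variant of m_stripos(): stop at the first needle that matches.
function contains_any(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        $needle = (string)$needle;
        // Skip empty needles; stripos() is case-insensitive.
        if ($needle !== '' && stripos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_any('photo.JPG', ['.jpg', '.png'])); // bool(true)
var_dump(contains_any('photo.gif', ['.jpg', '.png'])); // bool(false)
```

Like the original, this tests whether the extension appears anywhere in the string, not only at the end.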

It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:

$words = str_word_count($str, 2);

$word_position_map = array();

foreach($words as $position => $word) {
    if(!isset($word_position_map[$word])) {
        $word_position_map[$word] = array();
    }
    $word_position_map[$word][] = $position;
}

// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));

Storing the information (like the needles) in the right format will improve the runtime (e.g. because you don't have to call array_flip).

Note from the str_word_count documentation:

For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.

So make sure you set the locale right.
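As a small illustration of the point about storing needles in the right format, the sketch below keeps the needles as array keys from the start, so array_flip is never called. The wrapper name word_positions and the sample sentence are assumptions made for this example.

```php
<?php
// Hypothetical wrapper: needles are passed as array keys, so no array_flip().
function word_positions(string $str, array $needleKeys): array
{
    $map = [];
    // Format 2 returns an array keyed by each word's offset in $str.
    foreach (str_word_count($str, 2) as $position => $word) {
        $map[$word][] = $position;
    }
    // Keep only the words that are also needles.
    return array_intersect_key($map, $needleKeys);
}

$needleKeys = ['the' => true, 'hat' => true];
print_r(word_positions('the cat and the hat', $needleKeys));
// 'the' is found at offsets 0 and 12, 'hat' at offset 16
```

The comparison is case-sensitive; lowercasing both the text and the needles first would be one way to relax that.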

How about a simple solution using array_map()?

$string = 'one two three four';
$needles = array( 'five' , 'three' );
$strpos_arr = array_map( function ( $check ) use ( $string ) {
    return strpos( $string, $check );
}, $needles );

The result is an array whose keys are the needles' indexes in $needles and whose values are the starting positions, or false when a needle is not found.

//print_r( $strpos_arr );
Array
(
    [0] => 
    [1] => 8
)
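Because strpos() returns false for a miss and 0 for a match at the very start, the result has to be checked with a strict comparison. The sketch below keys the result by needle and drops the misses; the extra needle 'one' is added here only to show the position-0 case.

```php
<?php
$string  = 'one two three four';
$needles = ['five', 'three', 'one'];

// Map each needle to its first position (false when absent).
$positions = array_combine(
    $needles,
    array_map(function ($needle) use ($string) {
        return strpos($string, $needle);
    }, $needles)
);

// Drop misses with a strict comparison so that position 0 is kept.
$found = array_filter($positions, function ($pos) {
    return $pos !== false;
});

print_r($found); // 'three' => 8, 'one' => 0
```

Note that this still reports only the first occurrence of each needle, not all of them.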

You could use a regular expression; they support OR operations. This would, however, make it fairly slow compared to strpos.
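A minimal sketch of the regular-expression idea, assuming the needles are plain strings: each needle is escaped with preg_quote(), joined with |, and preg_match_all() with PREG_OFFSET_CAPTURE returns the offsets. The function name strpos_all_regex is hypothetical. Matches do not overlap, and with thousands of needles the pattern can get large, so this should be profiled before relying on it.

```php
<?php
// Hypothetical helper: find the positions of any needle in one regex pass.
function strpos_all_regex(string $haystack, array $needles): array
{
    // Drop empty needles; an empty alternative would match at every offset.
    $needles = array_filter(array_map('strval', $needles), 'strlen');
    if ($needles === [] || $haystack === '') {
        return [];
    }

    // Try longer needles first so "foobar" wins over "foo" at the same offset.
    usort($needles, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // preg_quote() escapes regex metacharacters; '/' is the delimiter used below.
    $quoted = array_map(function ($n) {
        return preg_quote($n, '/');
    }, $needles);
    $pattern = '/' . implode('|', $quoted) . '/';

    $positions = [];
    if (preg_match_all($pattern, $haystack, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$text, $offset]) {
            $positions[] = $offset; // byte offset, like strpos()
        }
    }
    return $positions;
}

print_r(strpos_all_regex('one two three two', ['two', 'three'])); // 4, 8, 14
```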
