
Does a string contain any of a list of substrings in PHP?

I am adding a feature to an application that allows authorised oil rig personnel to submit weather reports (for use by our pilots when planning flights) to our system via email. The tricky part is that we want to match these reports to a particular oil platform, but the personnel (and their email accounts) can move between rigs.

We already have a list of waypoints that each have an "aliases" field. Basically, if the email subject contains any of the aliases in that field, we should match the email to that waypoint.

For example, the subject could be "Weather report 10 April @ 1100 Rig A for you as requested".

The aliases for that waypoint would be something like "RRA RPA Rig A RigA".

Keep in mind that there is a similar list of aliases for every other waypoint we have.

Is there a better way of matching than iterating through each word of each alias and checking whether it's a substring of the email subject? That sounds like an n^2 sort of problem.
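For reference, the naive approach described above might look roughly like this. It is a sketch under assumptions: the alias fields are assumed to be loaded into a hypothetical `$waypoints` array of waypoint ID => alias string, and `match_waypoint_naive` is an illustrative name. It also shows a pitfall of splitting a space-separated field: "Rig A" becomes the separate words "Rig" and "A", which can match the wrong waypoint.

```php
<?php

// Hypothetical sample data: waypoint ID => contents of its "aliases" field.
// Assumes the field is a space-separated string, as in the question.
$waypoints = [
    101 => 'RRA RPA Rig A RigA',
    102 => 'RRB RPB Rig B RigB',
];

// Naive matching: split every alias field into words and test each word
// as a case-insensitive substring of the subject.
function match_waypoint_naive(string $subject, array $waypoints): ?int
{
    foreach ($waypoints as $id => $aliases) {
        foreach (preg_split('/\s+/', trim($aliases)) as $word) {
            // stripos() returns false (not 0) when there is no match.
            if ($word !== '' && stripos($subject, $word) !== false) {
                return $id;
            }
        }
    }
    return null;
}

// Because "Rig A" was split into "Rig" and "A", the bare word "Rig" makes
// this Rig B subject match waypoint 101 instead of 102.
var_dump(match_waypoint_naive('Weather report 10 April @ 1100 Rig B', $waypoints));
```

The work per email grows with (number of alias words) × (subject length). Storing each alias as a separate value, rather than one space-separated string, would avoid the splitting problem.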

The alternative is for us to impose a restriction and tell the operators that they have to put the rig name at the start or end of the subject.
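With that restriction in place, the check becomes a prefix/suffix comparison instead of a full substring search. A minimal sketch, assuming single-byte (ASCII) aliases; `subject_starts_or_ends_with` is a hypothetical helper name:

```php
<?php

// Returns true if $alias appears, case-insensitively, at the very start or
// very end of the (trimmed) subject.
function subject_starts_or_ends_with(string $subject, string $alias): bool
{
    $subject = trim($subject);
    $len = strlen($alias);
    if ($len === 0 || $len > strlen($subject)) {
        return false;
    }
    // strncasecmp() compares the first $len bytes case-insensitively.
    $at_start = strncasecmp($subject, $alias, $len) === 0;
    // substr_compare() with a negative offset compares the last $len bytes;
    // the final argument `true` makes the comparison case-insensitive.
    $at_end = substr_compare($subject, $alias, -$len, $len, true) === 0;
    return $at_start || $at_end;
}

var_dump(subject_starts_or_ends_with('Rig A weather report 10 April @ 1100', 'Rig A'));
```

You would still loop over every alias, but each test only compares a few bytes at the ends of the subject.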

This sounds more like an algorithms question than a PHP question specifically. Take a look at "What is the fastest substring search algorithm?"
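The linked question is about finding a single pattern. For searching for many patterns at once, a standard algorithm is Aho-Corasick: all aliases are compiled into one automaton and the subject is then scanned once, in time roughly proportional to its length regardless of how many aliases there are. This is a swapped-in technique, not something either answer implements. The sketch below assumes PHP 7.4+, ASCII aliases (case-insensitivity comes from `strtolower()`), integer rig IDs, and a hypothetical class name `AliasMatcher`.

```php
<?php

// Aho-Corasick sketch: every alias goes into one automaton, so the subject is
// scanned once no matter how many aliases there are.
final class AliasMatcher
{
    private array $next = [[]];  // state => [byte => child state]; state 0 is the root
    private array $fail = [0];   // state => failure link (longest proper suffix that is also a state)
    private array $out = [[]];   // state => IDs of aliases recognised at this state

    /** @param array<string, int> $aliasToId alias => rig ID */
    public function __construct(array $aliasToId)
    {
        // 1. Insert every (lowercased) alias into a trie.
        foreach ($aliasToId as $alias => $id) {
            $alias = strtolower((string) $alias);
            if ($alias === '') {
                continue;
            }
            $state = 0;
            for ($i = 0, $n = strlen($alias); $i < $n; $i++) {
                $c = $alias[$i];
                if (!isset($this->next[$state][$c])) {
                    $this->next[] = [];
                    $this->fail[] = 0;
                    $this->out[] = [];
                    $this->next[$state][$c] = count($this->next) - 1;
                }
                $state = $this->next[$state][$c];
            }
            $this->out[$state][] = $id;
        }

        // 2. Breadth-first pass to fill in failure links; depth-1 states keep 0.
        $queue = array_values($this->next[0]);
        for ($q = 0; $q < count($queue); $q++) {
            $s = $queue[$q];
            foreach ($this->next[$s] as $c => $t) {
                $queue[] = $t;
                $f = $this->fail[$s];
                while ($f !== 0 && !isset($this->next[$f][$c])) {
                    $f = $this->fail[$f];
                }
                $this->fail[$t] = $this->next[$f][$c] ?? 0;
                // A state also reports every alias its failure state reports.
                $this->out[$t] = array_merge($this->out[$t], $this->out[$this->fail[$t]]);
            }
        }
    }

    /** Returns the rig ID of the alias that ends earliest in $subject, or null. */
    public function find(string $subject): ?int
    {
        $state = 0;
        $subject = strtolower($subject);
        for ($i = 0, $n = strlen($subject); $i < $n; $i++) {
            $c = $subject[$i];
            // Follow failure links until some state can consume this byte.
            while ($state !== 0 && !isset($this->next[$state][$c])) {
                $state = $this->fail[$state];
            }
            $state = $this->next[$state][$c] ?? 0;
            if ($this->out[$state] !== []) {
                return $this->out[$state][0];
            }
        }
        return null;
    }
}

$m = new AliasMatcher(['RRA' => 123, 'RPA' => 123, 'Rig A' => 123, 'RigA' => 123, 'RRB' => 456, 'Rig B' => 456]);
var_dump($m->find('Weather report 10 April @ 1100 Rig A for you as requested'));
```

Build the matcher once per set of aliases and reuse it for every incoming email; construction is the expensive part.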

Well, you can transform this into something like an O(n log n) algorithm, but that depends on the implementation details of stripos():

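<?php

define('RIG_ID_1', 123);
define('RIG_ID_2', 456);

// Returns the rig ID whose alias appears in the subject, or null if none does.
function get_rig_id($email_subject) {
    // Flat map of alias => rig ID; every alias of every rig gets its own entry.
    $alias_map = [
        'RRA' => RIG_ID_1,
        'RPA' => RIG_ID_1,
        'Rig A' => RIG_ID_1,
        'RigA' => RIG_ID_1,
        // ...
    ];
    foreach ($alias_map as $rig_substr => $rig_id) {
        // stripos() is case-insensitive and returns false (not 0) when there is no match.
        if (stripos($email_subject, $rig_substr) !== false) {
            return $rig_id;
        }
    }
    return null;
}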

Here each alias is checked by stripos() exactly once. A better solution is probably to compose these strings into a series of regexes, one per rig. Internally, the regex engine can scan text very efficiently, typically examining each character only once:

For example:

<?php

define('RIG_ID_1', 123);
define('RIG_ID_2', 456);

// One case-insensitive regex per rig; \s matches the space in "Rig A".
function get_rig_id($email_subject) {
    $alias_map = [
        '/RRA|RPA|Rig\\sA|RigA/i' => RIG_ID_1,
        '/RRB|RPB|Rig\\sB|RigB/i' => RIG_ID_2,
        // ...
    ];
    foreach ($alias_map as $rig_regex => $rig_id) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($rig_regex, $email_subject)) {
            return $rig_id;
        }
    }
    return null;
}
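Taking the regex idea one step further, the aliases of all rigs could go into a single alternation, so each subject is handed to preg_match() only once and the matched text is mapped back to a rig ID. This is a sketch, not part of the answer above: the helper names and sample map are hypothetical, it assumes PHP 7.4+ and integer rig IDs, and the `\b` word boundaries are an added assumption (they stop "RRA" from matching inside a word such as "TERRA") that only behaves as intended when every alias starts and ends with a letter or digit.

```php
<?php

// Builds one case-insensitive alternation from every alias of every rig.
function build_alias_regex(array $aliasToId): string
{
    // Longest aliases first, so a longer alias wins over a shorter one that is its prefix.
    $aliases = array_map('strval', array_keys($aliasToId));
    usort($aliases, fn($a, $b) => strlen($b) <=> strlen($a));
    // preg_quote() escapes regex metacharacters, plus the '/' delimiter.
    $quoted = array_map(fn($a) => preg_quote($a, '/'), $aliases);
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

function get_rig_id_single_regex(string $subject, array $aliasToId): ?int
{
    if (!preg_match(build_alias_regex($aliasToId), $subject, $m)) {
        return null;
    }
    // $m[0] keeps the subject's casing, so compare case-insensitively.
    foreach ($aliasToId as $alias => $id) {
        if (strcasecmp((string) $alias, $m[0]) === 0) {
            return $id;
        }
    }
    return null;
}

$map = [
    'RRA' => 123, 'RPA' => 123, 'Rig A' => 123, 'RigA' => 123,
    'RRB' => 456, 'RPB' => 456, 'Rig B' => 456, 'RigB' => 456,
];
var_dump(get_rig_id_single_regex('Weather report 10 April @ 1100 Rig A for you as requested', $map));
```

In practice you would build the pattern once and cache it rather than rebuilding it on every call, as this sketch does.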

For your purposes, the practical solution depends very much on how many rigs you have and how many aliases each rig has. I suspect that unless you're dealing with tens of thousands of rigs, or performance is a critical aspect of this application, a naive O(n^2) solution would probably suffice. (Remember that premature optimization is the root of all evil.) A simple benchmark would bear this out.
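A rough timing harness for the naive stripos() loop could look like this. The synthetic data (1,000 made-up rigs with four aliases each), the iteration count and the helper names are all assumptions; swap in your real waypoint aliases to get numbers that mean something. It assumes PHP 7.4+ for the arrow function.

```php
<?php

// Naive lookup: return the rig ID of the first alias found in the subject.
function naive_lookup(string $subject, array $aliasToId): ?int
{
    foreach ($aliasToId as $alias => $id) {
        if (stripos($subject, (string) $alias) !== false) {
            return $id;
        }
    }
    return null;
}

// Average wall-clock time per call in milliseconds, using the monotonic hrtime() clock.
function time_per_call_ms(callable $fn, int $iterations): float
{
    $start = hrtime(true);  // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6 / $iterations;
}

$aliases = [];
for ($rig = 1; $rig <= 1000; $rig++) {
    $n = sprintf('%04d', $rig);  // zero-padded so "Rig 0001" is not a substring of "Rig 1000"
    foreach (["RR$n", "RP$n", "Rig $n", "Rig$n"] as $alias) {
        $aliases[$alias] = $rig;
    }
}

// The matching alias belongs to the last rig, the worst case for a linear scan.
$subject = 'Weather report 10 April @ 1100 Rig 1000 as requested';
$ms = time_per_call_ms(fn() => naive_lookup($subject, $aliases), 200);
printf("naive stripos() loop: %.4f ms per subject\n", $ms);
```

Timing the regex version the same way, on the same data, gives a like-for-like comparison.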

An even better (and potentially faster) solution would be to set up an Elasticsearch instance, but once again that may be more effort than it's worth when a naive approach would suffice and take a fraction of the time to implement.
