
Regex Finding Exact Match from List

I have a quick question that I hope someone can shed some light on.

I am still new to regex, so this behavior doesn't make sense to me. I am using C# to write a simple function that searches a given string for a list of substrings and records each substring's position in the string. My code looks like this:

DataTable matchtable = new DataTable();
string searchstring = " Take a left in 2.1 miles.  Then take a right in 3 miles";
var substringlist = new [] {"2.1 miles", "3 miles", "4.1 miles", "1","take"};
string searchregexstr = string.(@"(\W|^){0}(\W|$)", string.Join("|", substringlist));
Regex searchregex = new Regex(searchregexstr);
if (searchregex.IsMatch(searchstring))
{
    foreach (Match substring in searchregex.Matches(searchstring))
    {
        string substringmatch = substring.toString();
        int indexofsubstringmatch = searchstring.IndexOf(substringmatch);
        matchtable.Rows.Add(susbtringmatch, indexofsubstringmatch);
    }
    return matchtable;
}
return matchtable;

The line that builds my regex looks like this:

string searchregexstr = string.(@"(\W|^){0}(\W|$)", string.Join("|", substringlist));

My issue is:

When I look at the results in my match table, I get a hit for both 2.1 miles and 1 (which is being matched within the 2.1).

I assumed (incorrectly, I think) that my regex looks only for complete matches, so 1 should not match because it does not appear by itself in the string.

Does anything stand out as missing?

Thanks very much in advance for any and all help!

Zinga

Well, you can do this in many ways. For example, the following code returns a list of the indices at which the terms are found in a particular string.

public static IEnumerable<int> GetStringIndices(IEnumerable<string> substringlist, string data)
{
    var lstIndices = new List<int>();

    foreach (var searchString in substringlist)
    {
        // Escape the term so that characters such as '.' are matched literally.
        var regexObj = new Regex($@"(?<=(\s|^)){Regex.Escape(searchString)}(?=(\s|$)|(\W)+?)",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);

        var matchResults = regexObj.Match(data);

        if (!matchResults.Success)
        {
            // -1 marks a term that does not occur in the data.
            lstIndices.Add(-1);
            continue;
        }

        while (matchResults.Success)
        {
            var idx = matchResults.Index;
            lstIndices.Add(idx);

            matchResults = matchResults.NextMatch();
        }
    }
    return lstIndices;
}

If I pass the search string and terms you mentioned above:

var data = "Take a left in 2.1 miles.  Then take a right in 3 miles";
var substringlist = new[] { "2.1 miles", "3 miles", "4.1 miles", "1", "take" };
var indices = GetStringIndices(substringlist, data);

you'll get the list of indices in the variable named indices. The end result will be:

[15, 48, -1, -1, 0, 32]

2.1 miles is found at index 15, 3 miles at index 48, and so on. The standalone term 1 yields -1, because the 1 inside 2.1 is preceded by a dot rather than by whitespace.
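A flat list of indices does not say which index belongs to which term. A possible variation, sketched here under some assumptions, keeps each term next to its indices: GetTermIndices is a made-up name, the lookarounds are copied from the pattern above, Regex.Escape is added so the dot in 2.1 miles is literal, and the snippet is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical variant of GetStringIndices: maps each term to the indices
// at which it was found (an empty list means "not found").
static Dictionary<string, List<int>> GetTermIndices(IEnumerable<string> terms, string text)
{
    var byTerm = new Dictionary<string, List<int>>();

    foreach (var term in terms)
    {
        // Same lookarounds as the answer's pattern; the term itself is escaped.
        var termRegex = new Regex($@"(?<=(\s|^)){Regex.Escape(term)}(?=(\s|$)|(\W)+?)",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);

        var positions = new List<int>();
        foreach (Match m in termRegex.Matches(text))
        {
            positions.Add(m.Index);
        }
        byTerm[term] = positions;
    }
    return byTerm;
}

var sampleData = "Take a left in 2.1 miles.  Then take a right in 3 miles";
var sampleTerms = new[] { "2.1 miles", "3 miles", "4.1 miles", "1", "take" };
var termIndices = GetTermIndices(sampleTerms, sampleData);

foreach (var pair in termIndices)
{
    Console.WriteLine($"{pair.Key}: [{string.Join(", ", pair.Value)}]");
}
```

With this shape, take maps to both of its positions (0 and 32), and terms that never occur map to an empty list instead of a -1 entry.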

Your code has some errors, for example string.(, toString, susbtringmatch, and a runtime error when adding a row to a DataTable with no columns. BTW, do you really need a DataTable?

Having corrected the typos and removed the DataTable, your code works fine for me if you correct this line like this:

string searchregexstr = string.Format(@"(\W|^){0}(\W|$)", string.Join("|", substringlist));

The matches are:

 2.1 miles (with leading space)
take  (with trailing space)
3 miles

Finally, you don't need the first return, as the final one will suffice.
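One detail that may be worth checking in the pattern: the alternation produced by string.Join is not grouped, so (\W|^) appears to bind only to the first term and (\W|$) only to the last, which would explain why take carries a trailing space while 3 miles has no leading one. The sketch below is one possible way to tighten this, not the original poster's code. It escapes each term with Regex.Escape, wraps the alternation in a non-capturing group, uses lookarounds so the match is the bare term, and reads the position from Match.Index rather than IndexOf. The helper name FindExactTerms is made up for illustration, matching is case-sensitive as in the question, and the snippet is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): finds whole-term matches
// and reports where each one starts.
static List<(string Term, int Index)> FindExactTerms(string input, IEnumerable<string> terms)
{
    // Escape every term so '.' in "2.1 miles" is literal, then join with '|'.
    string alternation = string.Join("|", terms.Select(t => Regex.Escape(t)));

    // (?:...) groups the alternation so both boundary checks apply to every term.
    // (?<!\S) requires start of input or whitespace before the term;
    // (?!\w) rejects a word character right after it.
    var exactRegex = new Regex($@"(?<!\S)(?:{alternation})(?!\w)");

    var results = new List<(string Term, int Index)>();
    foreach (Match m in exactRegex.Matches(input))
    {
        // Match.Index is the position of this particular match, whereas
        // IndexOf would always return the first occurrence of the text.
        results.Add((m.Value, m.Index));
    }
    return results;
}

string sentence = "Take a left in 2.1 miles.  Then take a right in 3 miles";
string[] wanted = { "2.1 miles", "3 miles", "4.1 miles", "1", "take" };

List<(string Term, int Index)> exactHits = FindExactTerms(sentence, wanted);
foreach (var (term, index) in exactHits)
{
    Console.WriteLine($"{term} at {index}");
}
// Prints:
// 2.1 miles at 15
// take at 32
// 3 miles at 48
```

The 1 inside 2.1 is not reported here because it is preceded by a dot, not by whitespace or the start of the input. Adding RegexOptions.IgnoreCase would also pick up the capitalized Take at index 0.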

If you need help tuning your regular expression, I highly recommend RegExr.
