
string.IndexOf search for whole word match

I am looking for a way to search a string for an exact, whole-word match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be.
Consider the following scenario:

using System;
using System.Text.RegularExpressions;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
            int indx = str.IndexOf("TOTAL");
            string amount = str.Substring(indx + "TOTAL".Length, 10);
            string strAmount = Regex.Replace(amount, "[^.0-9]", "");

            Console.WriteLine(strAmount);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}

The output of the above code is:

// 34.37
// Press any key to continue...

The problem is that I don't want SUBTOTAL, but IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and that yields the incorrect value of 34.37.

So the question is: is there a way to force IndexOf to find only an exact match, or is there another way to force a whole-word match, so that I can find the index of that match and then do something useful with it? Regex.IsMatch and Regex.Match are, as far as I can tell, simply boolean searches. In this case, it isn't enough to know that the exact match exists. I need to know where it is in the string.

Any advice would be appreciated.

You can use Regex:

string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = Regex.Match(str, @"\WTOTAL\W").Index; // will be 18, the index of the space before TOTAL (the word itself starts at 19)

My method is faster than the accepted answer because it does not use Regex.

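string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL"); // 19

// Extension methods must live in a static, non-nested class (the class name here is arbitrary).
public static class StringExtensions
{
    public static int IndexOfWholeWord(this string str, string word)
    {
        // Try each ordinal occurrence of word until one has no letter or digit on either side.
        for (int j = 0; j < str.Length && 
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) && 
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        return -1;
    }
}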
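
As a quick check of the edge cases the loop handles (word at the start, word at the end, word only embedded in a longer one), here is a minimal sketch. The method body is copied from the answer so that the example compiles on its own; the class names WholeWordSearch and WholeWordSearchDemo are made up for illustration.

```csharp
using System;

public static class WholeWordSearch
{
    // Copy of the extension method from the answer, repeated so this sketch is self-contained.
    public static int IndexOfWholeWord(this string str, string word)
    {
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        return -1;
    }
}

public static class WholeWordSearchDemo
{
    public static void Main()
    {
        Console.WriteLine("SUBTOTAL 34.37 TAX TOTAL 37.43".IndexOfWholeWord("TOTAL")); // 19
        Console.WriteLine("TOTAL 37.43".IndexOfWholeWord("TOTAL"));                    // 0 (start of string)
        Console.WriteLine("GRAND TOTAL".IndexOfWholeWord("TOTAL"));                    // 6 (end of string)
        Console.WriteLine("SUBTOTAL 34.37".IndexOfWholeWord("TOTAL"));                 // -1 (only inside SUBTOTAL)
    }
}
```

Note that char.IsLetterOrDigit does not treat an underscore as part of a word, so this method would accept TOTAL inside SUB_TOTAL, unlike a \b-based regex.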

You can use word boundaries, \b, and the Match.Index property:

var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19


The \bTOTAL\b pattern matches TOTAL when it is not enclosed with any other letters, digits or underscores.
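
To tie this back to the question's goal of reading the amount that follows the word, here is a minimal sketch that combines the word-boundary match with a second search for the number. ExtractAmountAfter is a hypothetical helper name, not something from the question or a library, and the sketch assumes the searched word begins and ends with a letter, digit or underscore so that \b behaves as described.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountExtractor
{
    // Hypothetical helper: returns the first number that follows the whole word,
    // or null when the word or a following number is not found.
    public static string ExtractAmountAfter(string text, string word)
    {
        // Regex.Escape keeps any regex metacharacters in word literal.
        Match wordMatch = Regex.Match(text, @"\b" + Regex.Escape(word) + @"\b");
        if (!wordMatch.Success)
            return null;

        // Look for the number only in the text after the matched word.
        string rest = text.Substring(wordMatch.Index + wordMatch.Length);
        Match number = Regex.Match(rest, @"\d+(\.\d+)?");
        return number.Success ? number.Value : null;
    }

    public static void Main()
    {
        string text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
        Console.WriteLine(ExtractAmountAfter(text, "TOTAL"));    // 37.43
        Console.WriteLine(ExtractAmountAfter(text, "SUBTOTAL")); // 34.37
    }
}
```

Taking the rest of the string instead of a fixed-length Substring avoids running past the end when the amount is the last thing in the text.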

If TOTAL should also count as a whole word when it is enclosed with underscores, use

var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;

where (?<![^\W_]) is a negative lookbehind that fails the match if there is a letter or digit (that is, a character other than a non-word character or an underscore) immediately to the left of the current location, so the match can begin at the start of the string or after a character that is neither a letter nor a digit, and (?![^\W_]) is a similar negative lookahead that only matches if there is the end of the string or a character other than a letter or digit immediately to the right of the current location.

If the boundaries are whitespace or the start/end of the string, use

var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;

where (?<!\S) requires the start of the string or a whitespace immediately on the left, and (?!\S) requires the end of the string or a whitespace immediately on the right.

NOTE: \b, (?<!...) and (?!...) are non-consuming patterns, that is, the regex index does not advance when matching them, so Match.Index is the exact position of the word you search for.
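
The difference between consuming and non-consuming boundaries can be seen by running the patterns from this thread side by side. This is only a sketch; IndexOrMinusOne is a hypothetical helper introduced here, and the "SUB_TOTAL_1" input is an invented example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryComparison
{
    // Hypothetical helper: index of the first match, or -1 when the pattern does not match.
    public static int IndexOrMinusOne(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        string text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
        Console.WriteLine(IndexOrMinusOne(text, @"\WTOTAL\W"));          // 18: the match includes the space before TOTAL
        Console.WriteLine(IndexOrMinusOne(text, @"\bTOTAL\b"));          // 19
        Console.WriteLine(IndexOrMinusOne(text, @"(?<!\S)TOTAL(?!\S)")); // 19

        // \W needs an actual character on each side, so it misses a word at the start of the string.
        Console.WriteLine(IndexOrMinusOne("TOTAL 37.43", @"\WTOTAL\W")); // -1
        Console.WriteLine(IndexOrMinusOne("TOTAL 37.43", @"\bTOTAL\b")); // 0

        // An underscore is a word character, so \b sees no boundary next to it.
        Console.WriteLine(IndexOrMinusOne("SUB_TOTAL_1", @"\bTOTAL\b"));                  // -1
        Console.WriteLine(IndexOrMinusOne("SUB_TOTAL_1", @"(?<![^\W_])TOTAL(?![^\W_])")); // 4
    }
}
```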

To make the accepted answer a little bit safer (IndexOf returns -1 when nothing is found, whereas the Index of an unsuccessful Match is 0):

// Regex.Escape keeps any regex metacharacters in findTxt literal.
string pattern = String.Format(@"\b{0}\b", Regex.Escape(findTxt));
Match mtc = Regex.Match(queryTxt, pattern);
if (mtc.Success)
{
    return mtc.Index;
}
else
{
    return -1;
}
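
Why the Success check matters can be shown with two inputs: one where the word really is at position 0 and one where it is missing. In this small sketch (the class name FailedMatchDemo and the sample strings are invented for illustration) both matches report Index 0, so Index alone cannot tell them apart.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FailedMatchDemo
{
    public static void Main()
    {
        Match hit = Regex.Match("TOTAL 37.43", @"\bTOTAL\b");     // the word is at position 0
        Match miss = Regex.Match("SUBTOTAL 34.37", @"\bTOTAL\b"); // no whole-word TOTAL here

        Console.WriteLine($"{hit.Success} {hit.Index}");   // True 0
        Console.WriteLine($"{miss.Success} {miss.Index}"); // False 0
    }
}
```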

While this may be a hack that only works for your example, try searching for the word with a leading space:

int indx = str.IndexOf(" TOTAL"); // the leading space cannot match inside SUBTOTAL
string amount = str.Substring(indx + " TOTAL".Length); // rest of the string; a fixed length of 10 would run past the end here

As " TOTAL" with a space before it does not occur in SUBTOTAL, this skips over the word you don't want and finds only an isolated TOTAL.

I'd recommend the Regex solution from LB too, but if you can't use Regex, then you could use String.LastIndexOf("TOTAL"). That assumes TOTAL always comes after SUBTOTAL.

http://msdn.microsoft.com/en-us/library/system.string.lastindexof(v=vs.110).aspx
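
A minimal sketch of that suggestion, together with a case where the ordering assumption does not hold. The second input string and the class name LastIndexOfDemo are invented here for illustration.

```csharp
using System;

public static class LastIndexOfDemo
{
    public static void Main()
    {
        // TOTAL comes after SUBTOTAL, so the last occurrence is the one we want.
        string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
        Console.WriteLine(str.LastIndexOf("TOTAL", StringComparison.Ordinal)); // 19

        // With the order reversed, the last occurrence is the one inside SUBTOTAL.
        string reversed = "TOTAL 37.43 SUBTOTAL 34.37";
        Console.WriteLine(reversed.LastIndexOf("TOTAL", StringComparison.Ordinal)); // 15
    }
}
```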
