string.IndexOf search for whole word match

I am looking for a way to search a string for an exact, whole-word match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be.
Consider the following scenario:

using System;
using System.Text.RegularExpressions;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
            int indx = str.IndexOf("TOTAL");
            string amount = str.Substring(indx + "TOTAL".Length, 10);
            string strAmount = Regex.Replace(amount, "[^.0-9]", "");

            Console.WriteLine(strAmount);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}

The output of the above code is:

// 34.37
// Press any key to continue...

The problem is that I don't want SUBTOTAL, but IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and that yields the incorrect value of 34.37.

So the question is: is there a way to force IndexOf to find only an exact match, or is there another way to force a whole-word match so that I can find the index of that match and then do something useful with it? Regex.IsMatch and Regex.Match are, as far as I can tell, simply boolean searches. In this case it isn't enough to know that the exact match exists. I need to know where it exists in the string.

Any advice would be appreciated.

You can use Regex:

string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
// 18: the index of the space matched by the leading \W; the word itself starts at 19
var indx = Regex.Match(str, @"\WTOTAL\W").Index;

This method does not use Regex, so it should be faster than the accepted answer.

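string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL");

// Extension methods must live in a non-generic static class (the class name is arbitrary).
public static class StringExtensions
{
    public static int IndexOfWholeWord(this string str, string word)
    {
        // Visit each occurrence of word; j++ restarts the search one character further on.
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
        {
            // Accept the occurrence only if neither neighbor is a letter or digit.
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        }
        return -1;
    }
}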

You can use word boundaries, \b, and the Match.Index property:

var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19

See the C# demo.

The pattern \bTOTAL\b matches TOTAL when it is not adjacent to any other letters, digits, or underscores.
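Applied to the question's scenario, the word-boundary match can feed straight into the amount extraction. The following is a minimal sketch, not part of the original answer: the class and method names (AmountSketch, AmountAfterTotal) are made up for illustration, and the 10-character window is carried over from the question's code as an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountSketch
{
    // Hypothetical helper: returns the numeric text that follows the first
    // whole-word TOTAL, or null when there is no whole-word TOTAL.
    public static string AmountAfterTotal(string text)
    {
        Match m = Regex.Match(text, @"\bTOTAL\b");
        if (!m.Success)
            return null;

        // Start right after the matched word and take at most 10 characters,
        // clamped so Substring never runs past the end of the string.
        int start = m.Index + m.Length;
        int length = Math.Min(10, text.Length - start);
        string tail = text.Substring(start, length);

        // Same cleanup as in the question: keep only digits and dots.
        return Regex.Replace(tail, "[^.0-9]", "");
    }
}
```

Calling AmountSketch.AmountAfterTotal("SUBTOTAL 34.37 TAX TOTAL 37.43") returns "37.43". The clamp matters here because only six characters follow the second TOTAL in that string.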

If TOTAL should still count as a whole word when it is enclosed with underscores, use

var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;

where (?<![^\W_]) is a negative lookbehind that fails the match if there is a letter or digit immediately to the left of the current location (so the left side must be the start of the string, or a character that is neither a letter nor a digit), and (?![^\W_]) is a similar negative lookahead that only matches if the end of the string, or a character other than a letter or digit, is immediately to the right of the current location.
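To see the difference between the two patterns, here is a small sketch on a made-up input where the first TOTAL is preceded by an underscore. The input string and the class name are illustrative assumptions, not taken from the question.

```csharp
using System.Text.RegularExpressions;

public static class UnderscoreBoundarySketch
{
    // Illustrative input: the first TOTAL sits right after an underscore.
    public const string Text = "SUB_TOTAL 34.37 TOTAL 37.43";

    // \b treats '_' as a word character, so there is no boundary between
    // '_' and 'T'; the first hit is the standalone TOTAL at index 16.
    public static int WithWordBoundary() =>
        Regex.Match(Text, @"\bTOTAL\b").Index;

    // The lookarounds reject only letters and digits as neighbors, so the
    // TOTAL after the underscore matches at index 4.
    public static int WithLetterDigitLookarounds() =>
        Regex.Match(Text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;
}
```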

If the boundaries are whitespace or the start/end of the string, use

var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;

where (?<!\S) requires the start of the string or a whitespace character immediately on the left, and (?!\S) requires the end of the string or a whitespace character on the right.
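A short sketch of how the whitespace version differs from \b, using a made-up input where the first TOTAL is wrapped in parentheses. The input and the class name are assumptions for illustration only.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceBoundarySketch
{
    // Illustrative input: "(TOTAL)" is delimited by punctuation, not whitespace.
    public const string Text = "(TOTAL) 1.00 TOTAL 37.43";

    // '(' and ')' are non-word characters, so \b accepts the TOTAL at index 1.
    public static int WithWordBoundary() =>
        Regex.Match(Text, @"\bTOTAL\b").Index;

    // '(' is a non-whitespace character, so (?<!\S) rejects index 1 and the
    // first hit is the space-delimited TOTAL at index 13.
    public static int WithWhitespaceLookarounds() =>
        Regex.Match(Text, @"(?<!\S)TOTAL(?!\S)").Index;
}
```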

NOTE: \b, (?<!...) and (?!...) are non-consuming patterns, that is, the regex index does not advance when matching them, so you get the exact position of the word you search for.
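The effect of consuming versus non-consuming boundaries can be checked directly. This sketch compares \W, which consumes a character on each side, with \b on the question's string and on a shorter made-up string that starts with TOTAL. The class and member names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class ConsumingSketch
{
    public const string Text = "SUBTOTAL 34.37 TAX TOTAL 37.43";

    // \W consumes the space before TOTAL, so the match starts at 18.
    public static int ConsumingIndex() =>
        Regex.Match(Text, @"\WTOTAL\W").Index;

    // \b consumes nothing, so the match starts at the 'T' of TOTAL: 19.
    public static int NonConsumingIndex() =>
        Regex.Match(Text, @"\bTOTAL\b").Index;

    // \W needs an actual character on each side, so a TOTAL at the very
    // start of the string is not found.
    public static bool ConsumingMatchesAtStart() =>
        Regex.IsMatch("TOTAL 37.43", @"\WTOTAL\W");

    // \b also holds at the start of the string, so the same TOTAL is found.
    public static bool NonConsumingMatchesAtStart() =>
        Regex.IsMatch("TOTAL 37.43", @"\bTOTAL\b");
}
```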

To make the accepted answer a little safer, check Match.Success before using Match.Index. Unlike IndexOf, which returns -1 when nothing is found, a failed match reports an Index of 0:

// Body of a method that returns the index of findTxt in queryTxt, or -1 if absent.
string pattern = String.Format(@"\b{0}\b", findTxt);
Match mtc = Regex.Match(queryTxt, pattern);
if (mtc.Success)
{
    return mtc.Index;
}
else
{
    return -1;
}
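The fragment above can be wrapped into a complete helper. The sketch below is one possible shape, not the answerer's code: the class and method names are hypothetical, and the Regex.Escape call is an added precaution so that regex metacharacters in the search text (such as the dot in "A.B") are matched literally.

```csharp
using System.Text.RegularExpressions;

public static class WholeWordSearch
{
    // Hypothetical helper: index of the first whole-word occurrence of
    // findTxt in queryTxt, or -1 when there is none (mirroring IndexOf).
    public static int IndexOfWholeWord(string queryTxt, string findTxt)
    {
        // Escape the search text so characters like '.' are not treated
        // as regex operators.
        string pattern = @"\b" + Regex.Escape(findTxt) + @"\b";
        Match mtc = Regex.Match(queryTxt, pattern);
        return mtc.Success ? mtc.Index : -1;
    }
}
```

For example, WholeWordSearch.IndexOfWholeWord("SUBTOTAL 34.37", "TOTAL") returns -1, whereas reading Index from the failed match would give 0. One caveat of this design: \b only helps when the search text starts and ends with a letter, digit, or underscore, since a boundary needs a word character on one side.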

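This is a hack that may only work for your example, but you can search for " TOTAL" with a leading space:

int indx = str.IndexOf(" TOTAL");
string amount = str.Substring(indx + " TOTAL".Length);

Because SUBTOTAL has no space before its TOTAL, the search skips it and finds only an isolated TOTAL. Substring is called without a length here: in the example string fewer than 10 characters follow the second TOTAL, so asking for 10 would throw an ArgumentOutOfRangeException.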

I'd also recommend the Regex solution from LB, but if you can't use Regex, you could use String.LastIndexOf("TOTAL"), assuming TOTAL always comes after SUBTOTAL.

http://msdn.microsoft.com/en-us/library/system.string.lastindexof(v=vs.110).aspx
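A brief sketch of the LastIndexOf approach and of the ordering assumption it depends on. The class and method names are illustrative, and the second input in the usage note is made up to show the failure case.

```csharp
using System;

public static class LastIndexOfSketch
{
    // Index of the last occurrence of "TOTAL", whole word or not.
    public static int LastTotal(string text) =>
        text.LastIndexOf("TOTAL", StringComparison.Ordinal);
}
```

LastIndexOfSketch.LastTotal("SUBTOTAL 34.37 TAX TOTAL 37.43") returns 19, the standalone TOTAL. With the order reversed, as in "TOTAL 37.43 SUBTOTAL 34.37", it returns 15, which is inside SUBTOTAL, so this only works when TOTAL really does come last.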
