
C# regular expression match at a specific index in a string?

I'd like to test whether a regex matches part of a string at a specific index (and only starting at that index). For example, given the string "one two 3 4 five", I'd like to know that, at index 8, the regular expression [0-9]+ matches "3". Regex.IsMatch and Regex.Match both take a starting index, but both will search the entire rest of the string for a match if necessary.

string text="one two 3 4 five";
Regex num=new Regex("[0-9]+");

//unfortunately num.IsMatch(text,0) also finds a match and returns true
Console.WriteLine("{0} {1}",num.IsMatch(text, 8),num.IsMatch(text,0));

Obviously, I could check if the resulting match starts at the index I am interested in, but I will be doing this a large number of times on large strings, so I don't want to waste time searching for matches later on in the string. Also, I won't know in advance what regular expressions I will actually be testing against the string.

I don't want to:

  1. split the string on some boundary like whitespace because in my situation I won't know in advance what a suitable boundary would be
  2. have to modify the input string in any way (like getting the substring at index 8 and then using ^ in the regex)
  3. search the rest of the string for a match or do anything else that wouldn't be performant for a large number of tests against a large string.

I would like to parse a potentially large user supplied body of text using an arbitrary user supplied grammar. The grammar will be defined in a BNF or PEG like syntax, and the terminals will either be string literals or regular expressions. Thus I will need to check if the next part of the string matches any of the potential terminals as driven by the grammar.

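How about using Regex.IsMatch(string, int) with a regular expression that starts with \G? In .NET, \G requires the match to begin where the previous match ended or, if there was no previous match, at the position where matching started, which here is the start index you pass in.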

That appears to work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string text="one two 3 4 five";
        Regex num=new Regex(@"\G[0-9]+");

        Console.WriteLine("{0} {1}",
                          num.IsMatch(text, 8), // True
                          num.IsMatch(text, 0)); // False
    }
}
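
For the grammar-driven use case described in the question, the \G approach could be wrapped in a small helper that reports how many characters a terminal consumes at a given position. This is only a sketch: the names AnchoredMatcher, Anchor, and MatchLengthAt are hypothetical, and it assumes each user-supplied pattern can safely be wrapped in a non-capturing group (a pattern ending in an unterminated extended-mode comment, for example, would break that assumption).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredMatcher
{
    // Hypothetical helper: wraps a user-supplied pattern so it can only match
    // at the start position. The non-capturing group keeps alternations such
    // as "a|b" anchored as a whole.
    public static Regex Anchor(string pattern)
    {
        return new Regex(@"\G(?:" + pattern + ")");
    }

    // Returns the length of the match that starts exactly at index,
    // or -1 if the pattern does not match there.
    public static int MatchLengthAt(Regex anchored, string text, int index)
    {
        Match m = anchored.Match(text, index);
        return m.Success ? m.Length : -1;
    }

    public static void Main()
    {
        string text = "one two 3 4 five";
        Regex num = Anchor("[0-9]+");

        Console.WriteLine(MatchLengthAt(num, text, 8)); // 1
        Console.WriteLine(MatchLengthAt(num, text, 0)); // -1
    }
}
```

Building each anchored Regex once and reusing it for every position avoids recompiling the pattern on each test, which matters when the same terminals are tried many times against a large input.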

If you only want to search a substring of the text, take that substring before applying the regular expression.

myRegex.Match(myString.Substring(8, 10));

I'm not sure I fully understand the question, but it seems to me that you can simply make the position part of the regular expression, e.g.

^.{8}[\d]

which will match if there are 8 characters between the start of the string and a digit.

If you know the maximum length of a potential match, you can pass it to Match along with the start index; that limits how much of the string is scanned.

If you're only checking for numbers this is probably easier than if you check for arbitrary expressions. The nature of Regex is to scan until the end in order to find a match. If you want to prevent scanning you need to include a length, or use something other than Regex.

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");
int indexToCheck = 8;
int maxMatchLength = ...;
Match m = num.Match(text, indexToCheck, maxMatchLength);

Do you know anything about what types of expressions might be run against the strings, and will scanning the entire string be too much of an overhead?

num.Match will return the first hit if it exists, and then stop scanning. If you want more matches you would call m.NextMatch() to continue the scanning of matches.
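
Note that limiting the length only bounds the scan: within that window the regex can still match later than the start index. A minimal sketch of combining the window with a check on Match.Index follows; the names WindowedMatcher and IsMatchAt are hypothetical, and the window sizes in Main are arbitrary values chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WindowedMatcher
{
    // Hypothetical helper: true only if the regex matches starting exactly at
    // index, while examining at most maxLength characters of the text.
    public static bool IsMatchAt(Regex regex, string text, int index, int maxLength)
    {
        // Clamp the window so it never runs past the end of the string.
        int length = Math.Min(maxLength, text.Length - index);
        Match m = regex.Match(text, index, length);

        // A successful match may begin anywhere in the window,
        // so its position still has to be checked.
        return m.Success && m.Index == index;
    }

    public static void Main()
    {
        string text = "one two 3 4 five";
        Regex num = new Regex("[0-9]+");

        Console.WriteLine(IsMatchAt(num, text, 8, 3)); // True
        Console.WriteLine(IsMatchAt(num, text, 4, 6)); // False: "3" is found, but at index 8
    }
}
```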
