

C# regular expression match at specific index in string?

I'd like to test whether a regex will match part of a string at a specific index (and only starting at that specific index). For example, given the string "one two 3 4 five", I'd like to know that, at index 8, the regular expression [0-9]+ will match "3". Regex.IsMatch and Regex.Match both take a starting index; however, both will search the entire rest of the string for a match if necessary.

using System;
using System.Text.RegularExpressions;

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");

// Unfortunately num.IsMatch(text, 0) also finds a match (the "3" at index 8) and returns true
Console.WriteLine("{0} {1}", num.IsMatch(text, 8), num.IsMatch(text, 0)); // True True

Obviously, I could check whether the resulting match starts at the index I'm interested in, but I will be doing this many times on large strings, so I don't want to waste time searching for matches further along in the string. Also, I won't know in advance which regular expressions I will actually be testing against the string.
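The "check where the match landed" approach mentioned above can be sketched as follows. `MatchesAtNaive` is a hypothetical helper name; it gives the right answer, but when nothing matches at `index` the engine keeps scanning forward, which is exactly the cost the question wants to avoid.

```csharp
using System;
using System.Text.RegularExpressions;

// Naive check: search from `index` onward, then test where the match starts.
// Correct, but a failed test may scan all the way to the end of the input.
static bool MatchesAtNaive(Regex regex, string input, int index)
{
    Match m = regex.Match(input, index);
    return m.Success && m.Index == index;
}

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");
Console.WriteLine(MatchesAtNaive(num, text, 8)); // True
Console.WriteLine(MatchesAtNaive(num, text, 0)); // False: the match found starts at index 8
```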

I don't want to:

  1. split the string on some boundary such as whitespace, because in my situation I won't know in advance what a suitable boundary would be;
  2. modify the input string in any way (such as taking the substring at index 8 and then using ^ in the regex);
  3. search the rest of the string for a match, or do anything else that wouldn't perform well for a large number of tests against a large string.

I would like to parse a potentially large, user-supplied body of text using an arbitrary, user-supplied grammar. The grammar will be defined in a BNF- or PEG-like syntax, and the terminals will be either string literals or regular expressions. Thus I will need to check whether the next part of the string matches any of the potential terminals, as driven by the grammar.
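For the string-literal terminals, no regex is needed at all: the literal can be compared in place at the current position. This is a minimal sketch; `LiteralAt` is a hypothetical helper name, and it uses ordinal (exact, culture-insensitive) comparison, which is an assumption about how the grammar's literals should be matched.

```csharp
using System;

// Compare `literal` against `input` starting at `index`, in place.
// Only literal.Length characters are examined and no substring is allocated.
static bool LiteralAt(string input, int index, string literal)
{
    if (index < 0 || index > input.Length - literal.Length)
        return false; // the literal would run past the end of the input
    return string.CompareOrdinal(input, index, literal, 0, literal.Length) == 0;
}

string text = "one two 3 4 five";
Console.WriteLine(LiteralAt(text, 4, "two")); // True
Console.WriteLine(LiteralAt(text, 0, "two")); // False
```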

How about using Regex.IsMatch(string, int) with a regular expression starting with \G (which anchors the match at the position where the previous match ended or, if there was no previous match, at the position where matching started)?

That appears to work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string text="one two 3 4 five";
        Regex num=new Regex(@"\G[0-9]+");

        Console.WriteLine("{0} {1}",
                          num.IsMatch(text, 8), // True
                          num.IsMatch(text, 0)); // False
    }
}
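Applied to the grammar use case from the question, a minimal sketch might wrap each user-supplied terminal pattern in `\G(?:...)` and try the terminals in order at the current position, advancing by the length of whatever matched. `AnchoredTerminal` and `MatchAt` are hypothetical helper names, the three terminals are invented for the demo, and the sketch assumes each user pattern is well-formed on its own. The non-capturing group keeps a top-level alternation such as `a|b` anchored as a whole rather than only its first branch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Anchor an arbitrary user pattern at the start position with \G.
static Regex AnchoredTerminal(string userPattern) =>
    new Regex(@"\G(?:" + userPattern + ")");

// Try each terminal at `pos` and return the first successful match,
// or Match.Empty (Success == false) if none matches there.
static Match MatchAt(IReadOnlyList<Regex> terminals, string input, int pos)
{
    foreach (Regex t in terminals)
    {
        Match m = t.Match(input, pos);
        if (m.Success)
            return m;
    }
    return Match.Empty;
}

string text = "one two 3 4 five";
var terminals = new List<Regex>
{
    AnchoredTerminal("[0-9]+"),
    AnchoredTerminal("[a-z]+"),
    AnchoredTerminal(@"\s+"),
};

// Tokenize by repeatedly matching at the current position.
var tokens = new List<string>();
int pos = 0;
while (pos < text.Length)
{
    Match m = MatchAt(terminals, text, pos);
    if (!m.Success || m.Length == 0)
        break; // nothing matches here (an empty match would loop forever)
    tokens.Add(m.Value);
    pos += m.Length;
}
Console.WriteLine(string.Join("|", tokens)); // one| |two| |3| |4| |five
```

A real parser would pick terminals according to the grammar rather than trying them all in a fixed order; the loop here only shows that each test is anchored at `pos`.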

If you only want to search a substring of the text, take that substring before applying the regex:

myRegex.Match(myString.Substring(8, 10));
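A sketch of this substring approach combined with a `^` anchor, so the pattern can only match at the start of the copied tail. `MatchAtViaSubstring` is a hypothetical helper name. `Substring(index)` without a length takes the rest of the string (a fixed length such as 10 would throw on the 16-character sample text). Note that every call allocates a new string, and the resulting `Index` is relative to the substring rather than the original text.

```csharp
using System;
using System.Text.RegularExpressions;

// Build the anchored regex once; ^ (without Multiline) matches only at
// the start of the string being searched, here the substring.
Regex anchoredNum = new Regex("^(?:[0-9]+)");

static Match MatchAtViaSubstring(Regex anchored, string input, int index) =>
    anchored.Match(input.Substring(index));

string text = "one two 3 4 five";
Match m = MatchAtViaSubstring(anchoredNum, text, 8);
Console.WriteLine($"{m.Success} {m.Value} {m.Index}"); // True 3 0
```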

I'm not sure I fully understand the question, but it seems to me that you can simply make the position part of the regular expression, e.g.

^.{8}[\d]

which will match if the character at index 8 is a digit, i.e. if the start of the string is followed by exactly 8 characters (other than newlines) and then a digit.
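Generalized to an arbitrary pattern and index, this idea might look like the sketch below. `PositionedRegex` is a hypothetical helper name; `RegexOptions.Singleline` is added (an assumption beyond the answer) so that `.` also matches `\n`. Because the index is baked into the pattern, this needs a new `Regex` for every index, and the engine still walks over the first `index` characters on each test, so it is mostly illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// ^.{index}(pattern): skip exactly `index` characters, then require the pattern.
// The wrapping group is group 1 because its parenthesis opens first.
static Regex PositionedRegex(string pattern, int index) =>
    new Regex("^.{" + index + "}(" + pattern + ")", RegexOptions.Singleline);

string text = "one two 3 4 five";
Match m = PositionedRegex("[0-9]+", 8).Match(text);
Console.WriteLine(m.Success ? m.Groups[1].Value : "no match"); // 3
Console.WriteLine(PositionedRegex("[0-9]+", 0).IsMatch(text));  // False
```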

If you know the maximum length of a potential match, passing it as the length of the region to search would limit how much of the string is scanned.

If you're only checking for numbers, this is probably easier than checking for arbitrary expressions. By nature, a regex search scans forward until it finds a match or reaches the end of the input. If you want to prevent that scanning, you need to supply a length, or use something other than Regex.

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");
int indexToCheck = 8;
int maxMatchLength = ...; // upper bound on how long a match can be
Match m = num.Match(text, indexToCheck, maxMatchLength);
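A bounded window limits how far a failed search can run, but a match can still begin later inside the window, so the start position still has to be checked. A minimal sketch; `MatchesAtWindowed` is a hypothetical helper and `maxLen` stands in for the elided `maxMatchLength`. The `Index` returned by `Match(string, int, int)` refers to the original string. A window shorter than the real token would also truncate the match.

```csharp
using System;
using System.Text.RegularExpressions;

// Search only input[index .. index + maxLen), then confirm the match starts at index.
static bool MatchesAtWindowed(Regex regex, string input, int index, int maxLen)
{
    if (index < 0 || index > input.Length)
        return false;
    int length = Math.Min(maxLen, input.Length - index); // clamp to the end of the input
    Match m = regex.Match(input, index, length);
    return m.Success && m.Index == index;
}

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");
Console.WriteLine(MatchesAtWindowed(num, text, 8, 3)); // True
Console.WriteLine(MatchesAtWindowed(num, text, 7, 3)); // False: window " 3 " finds "3" at index 8
Console.WriteLine(num.Match(text, 0, 5).Success);       // False: "one t" contains no digit
```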

Do you know anything about what types of expressions might be run against the strings, and will scanning the entire string really be too much overhead?

num.Match will return the first hit, if one exists, and then stop scanning. If you want more matches, call m.NextMatch() to continue scanning.
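Iterating with `NextMatch()` looks like the sketch below. The second loop is an added illustration (not part of the answer): with a leading `\G`, each `NextMatch()` must start exactly where the previous match ended, so iteration stops at the first gap and only a contiguous run of matches is returned.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "one two 3 4 five";
Regex num = new Regex("[0-9]+");

// Match stops at the first hit; each NextMatch() resumes the scan after it.
var hits = new List<string>();
for (Match m = num.Match(text); m.Success; m = m.NextMatch())
    hits.Add($"{m.Value}@{m.Index}");
Console.WriteLine(string.Join(", ", hits)); // 3@8, 4@10

// With \G, matches must be contiguous: iteration stops at the space.
var digits = new List<string>();
for (Match d = new Regex(@"\G[0-9]").Match("123 4"); d.Success; d = d.NextMatch())
    digits.Add(d.Value);
Console.WriteLine(string.Join("", digits)); // 123
```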
