regex substring C#

I need help figuring out a regular expression.

I have:

string input = "STATE changed from [Fixed] to [Closed], CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]";

I need to grab NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />] and put the values CLOSED[] and TEST CLOSED into two string variables.
So far I have:

Regex NotesChanged = new Regex(@"NOTES changed from \[(\w*|\W*)\] to \[([\w-|\W-]*)\]");

which matches only if "NOTES changed from" starts at the beginning and there is no '[]' inside the '[ ]', but I have "[CLOSED[]]", and it also assumes there is no "<br />". Any ideas on what to change in the regex?

Thanks, Sharma

This is kind of weird...

(\w*|\W*)

That's a capturing group that matches either zero or more word characters or zero or more non-word characters, but never a mix of the two.
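To see why that group cannot cover the NOTES values, here is a small check, offered as a sketch. The class name AlternationDemo, the helper FitsQuestionGroup and the sample inputs are made up for illustration, and the ^ and $ anchors are added only so that each sample has to match as a whole.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // True when the whole input is one [ ... ] pair whose content fits the
    // group from the question: all word characters or all non-word characters.
    public static bool FitsQuestionGroup(string input)
    {
        return Regex.IsMatch(input, @"^\[(\w*|\W*)\]$");
    }

    static void Main()
    {
        Console.WriteLine(FitsQuestionGroup("[Fixed]"));          // only word characters
        Console.WriteLine(FitsQuestionGroup("[TEST CLOSED]"));    // letters mixed with a space
        Console.WriteLine(FitsQuestionGroup("[CLOSED[]<br />]")); // letters mixed with brackets and a tag
    }
}
```

This should print True, False, False: as soon as the content mixes word and non-word characters, neither branch of the alternation can consume all of it.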

What you want to do if you have matching brackets is to create a pattern which doesn't consume the delimiter.

\[([^\]]+)\]

That will match any occurrence of [with some text in it], where the text between the brackets is the first group in the match.
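A minimal sketch of that pattern in use, assuming the simple case with no nested brackets. The class name BracketDemo and the helper BracketContents are illustrative names, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketDemo
{
    // Returns the text inside each [ ... ] pair, stopping at the first ].
    public static string[] BracketContents(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\[([^\]]+)\]");
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value; // group 1 is the text between the brackets
        }
        return result;
    }

    static void Main()
    {
        foreach (string value in BracketContents("STATE changed from [Fixed] to [Closed]"))
        {
            Console.WriteLine(value);
        }
    }
}
```

For the STATE part of the input this yields Fixed and Closed. Applied to [CLOSED[]<br />] it stops at the first ], so the captured text is CLOSED[ rather than the full value, which is the nesting problem discussed next.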

Since you have the same type of delimiter nested within the string itself, it gets a bit trickier and you need to use lookahead or some sort of alternation.

((?:[^\[\]]|\[\])*)

This can be further improved, but there's a problem here that cannot be solved this way if you have [[[]]]. You cannot create a recursive regular expression. It is not that flexible. So you need to either hard-code a maximum depth or apply the regular expression several times.
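As a sketch of the alternation idea, the group above can be wrapped in literal brackets so that it matches one bracketed value that may contain empty [] pairs. The wrapping, the class name OneLevelDemo and the helper BracketedValues are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class OneLevelDemo
{
    // Matches [ ... ] where the content is plain characters or empty [] pairs.
    const string Pattern = @"\[((?:[^\[\]]|\[\])*)\]";

    public static string[] BracketedValues(string input)
    {
        MatchCollection matches = Regex.Matches(input, Pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value;
        }
        return result;
    }

    static void Main()
    {
        string notes = "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]";
        foreach (string value in BracketedValues(notes))
        {
            Console.WriteLine(value);
        }
    }
}
```

On the NOTES text this captures CLOSED[]<br /> and TEST CLOSED <br />. It only tolerates empty [] pairs, though: for [a[b]] the outer pair cannot match and only the inner b is found, which is the hard-coded depth limit described above.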

A fairly exhaustive way of doing this would be:

\[((?:[^\[\]]*)(?:(?=\[)(?:[^\]]*)\])?([^\]]*))\]

If "<br />" is going to be there every time, you can use one of my favourite patterns (it's worth memorizing). The pattern is:

delim[^delim]*delim

The pattern above matches a delimiter, then as many non-delimiter characters as possible, then the delimiter again.
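For instance, with a double quote as the delimiter the pattern becomes "[^"]*". The following is a small sketch; the class name DelimiterDemo, the helper QuotedParts and the sample sentence are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class DelimiterDemo
{
    // Finds every "..." run: a quote, any non-quote characters, a closing quote.
    public static string[] QuotedParts(string input)
    {
        MatchCollection matches = Regex.Matches(input, "\"[^\"]*\"");
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value; // the match includes both delimiters
        }
        return result;
    }

    static void Main()
    {
        foreach (string part in QuotedParts("say \"hello\" and \"good bye\""))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because the middle part excludes the delimiter, each match ends at the nearest closing quote instead of running on to the last one in the string.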

Here is the regular expression I would be tempted to use:

NOTES changed from \[([^<]*)[^\]]*\] to \[([^<]*)[^\]]*\]

In English:

  • Matches the opening [
  • Capture #1: all characters up to the < (assuming the br tag is always there)
  • Reads on until the closing ]
  • Repeats the same steps for the second capture
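The steps above can be sketched in C# as follows, using the input string from the question. The class name NotesDemo and the helper ExtractNotes are illustrative, and the Trim call is an addition to drop the space that precedes the second <br />.

```csharp
using System;
using System.Text.RegularExpressions;

static class NotesDemo
{
    const string Pattern = @"NOTES changed from \[([^<]*)[^\]]*\] to \[([^<]*)[^\]]*\]";

    // Returns { old value, new value }, or an empty array when nothing matches.
    public static string[] ExtractNotes(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups[1].Value.Trim(), m.Groups[2].Value.Trim() };
    }

    static void Main()
    {
        string input = "STATE changed from [Fixed] to [Closed], CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], "
                     + "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]";
        string[] notes = ExtractNotes(input);
        Console.WriteLine(notes[0]); // old notes value
        Console.WriteLine(notes[1]); // new notes value
    }
}
```

With that input the two values come out as CLOSED[] and TEST CLOSED. The approach depends on the < of the br tag being present in both values, as the answer already assumes.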

Try adding "\[|\]" to the capture sequences inside the parenthesized groups.

Regex NotesChanged = new Regex(@"NOTES changed from \[(\w*|\W*|\[|\])\] to \[([\w-|\W-|\[|\]]*)\]");

I believe you can use balancing group definitions to match the nested brackets. I believe these are .NET specific, at least in that particular implementation flavor. There's an example on that page, which I've adapted to your input here:

using System;
using System.Text.RegularExpressions;

class Program {
    static void Main (string[] args) {
        var input = "STATE changed from [Fixed] to [Closed], CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], NOTES changed from [CLOSED[]] to [TEST CLOSED ]";
        // Each (?'open'\[) pushes a capture onto the "open" stack; each (?'close-open'\]) pops one.
        var regex = new Regex(@"NOTES changed from (((?'open'\[)[^\[\]]*)+((?'close-open'\])[^\[\]]*)+)*");

        foreach (Match match in regex.Matches(input)) {
            Console.WriteLine(match.Value);
        }
    }
}

This prints NOTES changed from [CLOSED[]] to [TEST CLOSED ] for me. Note that in my adaptation I left off the part of the expression that causes the match to fail if the square brackets are not properly balanced, in order to reduce my example to the bare minimum that would satisfy your request. The expression is already unpleasantly complex.

EDIT: Just saw your question got edited a bit while I was posting. The parts of the regex I've supplied here that match "anything but [ and ]" should be replaceable with capture groups for the substrings you need to extract.
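One way this might look, as a sketch rather than the definitive form: named groups from and to wrap a balanced-bracket sub-pattern, and the conditional (?(open)(?!)) restores the check that fails the match when brackets are left unclosed. The class name BalancedNotesDemo, the group names, the ExtractNotes helper and the stripping of <br /> are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedNotesDemo
{
    // Plain characters, or [ (pushes onto "open"), or ] (pops "open");
    // the trailing conditional fails if any [ is still unclosed.
    const string Balanced = @"(?:[^\[\]]|(?<open>\[)|(?<-open>\]))*(?(open)(?!))";

    // Returns { old value, new value }, or an empty array when nothing matches.
    public static string[] ExtractNotes(string input)
    {
        string pattern = @"NOTES changed from \[(?<from>" + Balanced + @")\] to \[(?<to>" + Balanced + @")\]";
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { Clean(m.Groups["from"].Value), Clean(m.Groups["to"].Value) };
    }

    // Drops the <br /> tag and surrounding whitespace from a captured value.
    static string Clean(string value) => value.Replace("<br />", "").Trim();

    static void Main()
    {
        string[] notes = ExtractNotes("NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]");
        Console.WriteLine(notes[0]);
        Console.WriteLine(notes[1]);
    }
}
```

Unlike the fixed-depth patterns, this version should also accept deeper nesting such as [A[B[C]]], because the "open" stack tracks how many brackets are still unclosed.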

If you have the luxury of fixing the regex to specific keywords or phrases, the following would work:

NOTES changed from (?:(?:\[)?([A-Z]+\[\]))<br />\] to \[([A-Z]+\s+[A-Z]+)

The above would match the string NOTES changed from [CLOSED[]<br />] to [TEST CLOSED and put CLOSED[] and TEST CLOSED into two separate groups.

Update

In fact, you can make this even shorter (and a bit less specific) by using the . specifier:

NOTES changed from (?:(?:\[)?([A-Z]+\[\])).+\[([A-Z]+\s+[A-Z]+)

This matches like the above, except that instead of spelling out the <br /> tag and the other characters in between, it accepts whatever appears between the two values.
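A short sketch that runs both variants against the input from the question and reads the two groups. The class name KeywordDemo, the constant names Specific and Loose, and the Extract helper are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeywordDemo
{
    // The first variant spells out the <br /> tag; the second accepts anything in between.
    public const string Specific = @"NOTES changed from (?:(?:\[)?([A-Z]+\[\]))<br />\] to \[([A-Z]+\s+[A-Z]+)";
    public const string Loose = @"NOTES changed from (?:(?:\[)?([A-Z]+\[\])).+\[([A-Z]+\s+[A-Z]+)";

    // Returns { group 1, group 2 }, or an empty array when nothing matches.
    public static string[] Extract(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }

    static void Main()
    {
        string input = "STATE changed from [Fixed] to [Closed], CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], "
                     + "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]";
        Console.WriteLine(string.Join(" | ", Extract(input, Specific)));
        Console.WriteLine(string.Join(" | ", Extract(input, Loose)));
    }
}
```

Both lines should read CLOSED[] | TEST CLOSED. The patterns rely on the values being upper-case words, so a value with lower-case letters or digits would need a wider character class.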
