Regex - extract substring with specific pattern

I have a large string as shown below:

99/34 12/34 This text is 22.67 22/23 33/34 Second text is like is 22.67 55/66 45/54 Third text is like is 32.27

and so on. I am trying to write a regular expression that extracts from the large string every substring of the form "two digits, slash, two digits, one whitespace, two digits, slash, two digits, any characters repeated any number of times, one literal . and two digits".

The regex I tried is \\d{2}/\\d{2}\\s{1}.*\\.\\d{2} . But this returns a single string, "99/34 12/34 This text is 22.67 22/23 33/34 Second text is like is 22.67 55/66 45/54 Third text is like is 32.27". I would like to get this extracted as

99/34 12/34 This text is 22.67

22/23 33/34 Second text is like is 22.67

55/66 45/54 Third text is like is 32.27

How would I do this? I am using C# (.NET 4.5).

The problem lies in the greedy .* : it tries to match as many characters as possible while still producing a match, so it runs all the way to the last "literal . and two digits" in the string.
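To make the greedy behaviour concrete, here is a minimal sketch that runs the pattern from the question against the sample string. The class and method names (GreedyDemo, FindGreedy) are made up for illustration, and the pattern is written as a verbatim string so the backslashes do not need doubling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Pattern from the question, as a verbatim string literal.
    public const string GreedyPattern = @"\d{2}/\d{2}\s{1}.*\.\d{2}";

    // Hypothetical helper: returns every match of the greedy pattern.
    public static MatchCollection FindGreedy(string input)
    {
        return Regex.Matches(input, GreedyPattern);
    }

    public static void Main()
    {
        string input = "99/34 12/34 This text is 22.67 22/23 33/34 Second text is like is 22.67 55/66 45/54 Third text is like is 32.27";

        MatchCollection matches = FindGreedy(input);

        // The greedy .* swallows everything up to the last ".dd",
        // so there is one match and it spans the whole input.
        Console.WriteLine(matches.Count);             // prints 1
        Console.WriteLine(matches[0].Value == input); // prints True
    }
}
```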

You can simply modify your regex thus:

 \d{2}/\d{2}\s.*?\d{2}\.\d{2}

The ? after the * makes it non-greedy (lazy), so it consumes (eats) as few characters as possible in order to find a match.

Note that I also changed \\s{1} to \\s : it already matches a single character, and qualifying it as exactly one does nothing but obfuscate the pattern.
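As a rough sketch of how the lazy pattern could be used from C#, the snippet below collects every match with Regex.Matches. The names RecordExtractor and ExtractRecords are illustrative only and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordExtractor
{
    // Lazy pattern from the answer, as a verbatim string literal.
    public const string LazyPattern = @"\d{2}/\d{2}\s.*?\d{2}\.\d{2}";

    // Hypothetical helper: returns each matched substring in order of appearance.
    public static List<string> ExtractRecords(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, LazyPattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        string input = "99/34 12/34 This text is 22.67 22/23 33/34 Second text is like is 22.67 55/66 45/54 Third text is like is 32.27";

        // Prints three lines:
        // 99/34 12/34 This text is 22.67
        // 22/23 33/34 Second text is like is 22.67
        // 55/66 45/54 Third text is like is 32.27
        foreach (string record in ExtractRecords(input))
        {
            Console.WriteLine(record);
        }
    }
}
```

Each match stops at the first "two digits, literal ., two digits" it reaches, which is why the sample splits into three records. Text that contained such a number in the middle of a record would be cut short there.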
