Regex for capturing digits and digit ranges
I have the following string:
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
I want to capture:
2121,323.222
2-2.4
0.5
That is, I want the above three results from the string.
Can anyone help me with this regex?
I noticed that the hyphen in 2–2.4kg is not really a hyphen; it is the Unicode character U+2013 (EN DASH).
So, here is another regex in C#:
@"[0-9]+([,.\u2013-][0-9]+)*"
Test:
MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
Console.WriteLine(m.Groups[0]);
}
Here are the results. My console does not support printing the Unicode character U+2013, so it shows up as "?", but it is matched properly.
2121,323.222
2?2.4
0.5
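If the goal is the plain 2-2.4 form shown in the question, or the console cannot display U+2013, one option is to normalize the en dash to an ASCII hyphen after matching. The following is a minimal sketch that assumes the same input and pattern as above; the Replace step is an addition for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // \u2013 is the EN DASH from the original string.
        string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
        string pattern = @"[0-9]+([,.\u2013-][0-9]+)*";

        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Swap EN DASH for HYPHEN-MINUS so any console can print the range.
            Console.WriteLine(m.Value.Replace('\u2013', '-'));
        }
    }
}
```

With this normalization the three lines printed are 2121,323.222, 2-2.4 and 0.5.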
Okay, I didn't notice the C# tag until now. I will leave the answer, but I know that's not what you expected; see if you can do something with it. Perhaps the title should have mentioned the programming language?
Sure:
Fat mass loss was (.*) greater for GPLC \((.*) vs. (.*)kg\)
Find your substrings in \1, \2 and \3. If this is for Emacs, swap all parentheses and escaped parentheses.
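In .NET the captured substrings are read through Match.Groups rather than \1, \2 and \3. A rough sketch of how the template above might be applied in C#, assuming the sentence always has exactly this wording:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
        // Same template as above; each (.*) becomes a numbered group.
        string pattern = @"Fat mass loss was (.*) greater for GPLC \((.*) vs. (.*)kg\)";

        Match m = Regex.Match(input, pattern);
        if (m.Success)
        {
            Console.WriteLine(m.Groups[1].Value); // text before " greater"
            Console.WriteLine(m.Groups[2].Value); // text between "(" and " vs."
            Console.WriteLine(m.Groups[3].Value); // text before the final "kg)"
        }
    }
}
```

Note that the template only strips "kg" from the third value, so the second group comes back as 2–2.4kg with the unit still attached.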
How about something like this:
^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$
A little more general, I think. I'm a little concerned about .* being greedy.
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
A generalized extractor:
/\D+?([\d\,\.\-]+)/g
Explanation:
/            # start pattern
\D+?         # 1 or more non-digits (lazy)
(            # start capture group 1
[\d\,\.\-]+  # character class: 1 or more digits, commas, periods, or hyphens
)            # end capture group 1
/g           # trailing g modifier (make the regex continue after the last match)
Sorry, I don't know C# well enough for a full writeup, but the pattern should plug right in.
See http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work:
\d+(?:[,.-]\d+)*
From C# 3, you can use it like this:
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";
var matches = Regex.Matches(input, pattern);
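// Declare the loop variable as Match: on .NET Framework, MatchCollection is a
// non-generic collection, so "var" would infer object and .Value would not compile.
foreach (Match match in matches)
    Console.WriteLine(match.Value);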
I came up with something like this atrocity:
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?
Of which
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?
is repeated twice, with [–-] in the middle (note that the first character in the brackets is a long hyphen).
This should take care of dots and commas outside of numbers, e.g. hello,23,45.2-7world will capture 23,45.2-7.
Hmm, this is a tricky question, especially because the input string contains the Unicode character – (EN DASH) instead of - (HYPHEN-MINUS). Therefore the correct regex to match the numbers in the original string would be:
\d+(?:[\u2013,.]\d+)*
A more generic approach would be:
\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*
which matches dash punctuation, connector punctuation and other punctuation. See here for more information about those.
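To see which of these categories a given separator falls into, the category can be inspected directly. This is a small sketch using char.GetUnicodeCategory; the list of sample characters is chosen here for illustration only.

```csharp
using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        // EN DASH, HYPHEN-MINUS, underscore, comma, period
        char[] samples = { '\u2013', '-', '_', ',', '.' };

        foreach (char c in samples)
        {
            // Pd = DashPunctuation, Pc = ConnectorPunctuation, Po = OtherPunctuation
            UnicodeCategory category = char.GetUnicodeCategory(c);
            Console.WriteLine($"U+{(int)c:X4} {category}");
        }
    }
}
```

Both dashes report DashPunctuation, the underscore ConnectorPunctuation, and the comma and period OtherPunctuation. Keep in mind that \p{Po} is broad: characters such as ':' and '/' also belong to it, so the generic pattern would treat something like 12:30 as a single token.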
An implementation in C# would look like this:
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}\p{C}]\d+)*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
Match match = rx.Match(input);
while (match.Success) {
// matched text: match.Value
// match start: match.Index
// match length: match.Length
match = match.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
I got the solution to my problem.
The following is the regex that gave my desired result:
(([0-9]+)([–.,-]*))+
Let's try this one:
(?=\d)([0-9,.-]+)(?<=\d)
It captures all expressions containing only digits, commas, periods and hyphens.
It works with a single-digit expression and does not include leading or trailing [.,-].
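The effect of the two lookarounds is easiest to see on text where punctuation directly follows a number. A short sketch, using a made-up input string that is not from the question:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input: a comma follows "1-3" and a period follows "4.5".
        string input = "ranges 1-3, and 4.5.";
        // (?=\d) forces the match to start on a digit,
        // (?<=\d) forces it to end on a digit.
        string pattern = @"(?=\d)([0-9,.-]+)(?<=\d)";

        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

This prints 1-3 and 4.5: the character class first grabs the trailing comma or period, then the lookbehind forces the engine to back off to the last digit. Since the class contains only the ASCII hyphen, an en dash range such as 2–2.4 would still be split into two matches unless \u2013 is added to the class.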
Hope this helps.