
Java: In a string, how to extract a number that precedes a regex match?

I have a string that contains a score expressed as a fraction of 100. The string looks like this:

blah
SCORE
95 / 100
USAGE
1.4GB / Unlimited
blah

I want to get the value that comes before " / 100". In the above case, this value would be 95, but it can be any number between 0 and 100.

I know that the following regex matches the " / 100" portion of the string. What I need to know is how to get the number that precedes this match.

\s\/\s100

You can use a positive lookahead to match digits only when they are followed by that pattern.

In the expression below, "95" would be in the capturing group.

\b(\d+)\b(?= \/ 100\b)
  • \b - Word boundary to ensure that the number is not adjacent to other word characters (letters, digits, or underscores).

  • (\d+) - Capturing group that matches one or more digits.

  • (?= \/ 100\b) - Positive lookahead that lets the preceding digits match only if they are followed by " / 100".
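
As a rough illustration of how the lookahead expression above might be used from Java, here is a minimal sketch. The class name LookaheadScore and the helper extractScore are hypothetical names chosen for this example. Note that in a Java string literal each backslash is doubled, and the slash does not need to be escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadScore {
    // Whole-word digits, matched only when followed by " / 100".
    // The lookahead checks the text after the digits without consuming it.
    private static final Pattern SCORE =
            Pattern.compile("\\b(\\d+)\\b(?= / 100\\b)");

    // Hypothetical helper: returns the digits before " / 100", or null if absent.
    public static String extractScore(String input) {
        Matcher m = SCORE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "blah\nSCORE\n95 / 100\nUSAGE\n1.4GB / Unlimited\nblah";
        System.out.println(extractScore(text)); // prints 95
    }
}
```

Because the lookahead consumes nothing, the whole match and group 1 are the same text here. This version does not limit the number to the 0-100 range, so an input such as "250 / 100" would yield "250".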


But it can be any number between 0 and 100.

If the number has to be between 0 and 100, you could also use the following:

\b(\d{1,2}|100)\b(?= \/ 100\b)
  • \d{1,2}|100 matches any one- or two-digit number (0-99) or 100, so only values from 0 to 100 are captured.
  • The rest is the same as the example above.
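
To see the effect of the range restriction, the following sketch applies the second expression to a few inputs. RangeCheckedScore and extractScore are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeCheckedScore {
    // One or two digits, or exactly 100, as a whole word followed by " / 100".
    private static final Pattern SCORE =
            Pattern.compile("\\b(\\d{1,2}|100)\\b(?= / 100\\b)");

    // Hypothetical helper: returns the captured number as text, or null if none is found.
    public static String extractScore(String input) {
        Matcher m = SCORE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractScore("95 / 100"));  // prints 95
        System.out.println(extractScore("100 / 100")); // prints 100
        System.out.println(extractScore("250 / 100")); // prints null
    }
}
```

The word boundaries are what reject "250": no one- or two-digit piece of it is delimited by a boundary on both sides, and the alternative 100 does not occur in it.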

Assuming there is only one XX / 100 (where XX can be 100 or a 1- or 2-digit integer) in your string and it appears at the beginning of some line, you can safely use:

"(?m)^(100|\\d{1,2})\\s*/\\s*100\\b"

See the Java demo:

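import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "blah\nSCORE\n95 / 100\nUSAGE\n1.4GB / Unlimited\nblah";
// Same pattern as above: group 1 is 100 or any 1- or 2-digit number at a line start
Pattern ptrn = Pattern.compile("(?m)^(100|\\d{1,2})\\s*/\\s*100\\b");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(1));
} // => 95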

The regex matches:

  • (?m)^ - the start of a line (since (?m) enables multiline mode, in which ^ matches at a line start rather than only at the string start), followed by...
  • (100|\\d{1,2}) - capturing group #1, which matches 100 or any 1 or 2 digits
  • \\s*/\\s* - zero or more whitespace characters, then /, then zero or more whitespace characters
  • 100\\b - the whole word 100 (that is, no digit, letter, or underscore _ can come right after it).

Note that the value is captured (thanks to the pair of unescaped parentheses (...)) into Group 1, which we read with matcher.group(1) after running the matcher. Capturing is more appropriate for this task: lookarounds are only necessary when you need overlapping matches, which is not the case here, and they are less efficient than capturing.
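
The capturing approach can also be wrapped in a small helper that returns the score as an integer. The sketch below assumes, as stated above, that the score sits at the start of a line. CapturedScore and parseScore are hypothetical names, and returning an OptionalInt is a design choice made for this example, not something the original demo does.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturedScore {
    // Multiline mode: ^ matches at the start of every line.
    private static final Pattern SCORE =
            Pattern.compile("(?m)^(100|\\d{1,2})\\s*/\\s*100\\b");

    // Hypothetical helper: the number before "/ 100" at a line start, if any.
    public static OptionalInt parseScore(String input) {
        Matcher m = SCORE.matcher(input);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String text = "blah\nSCORE\n95 / 100\nUSAGE\n1.4GB / Unlimited\nblah";
        System.out.println(parseScore(text).getAsInt());                 // prints 95
        System.out.println(parseScore("score: 95 / 100").isPresent());   // prints false
    }
}
```

Since \\s* allows zero whitespace characters, "95/100" is accepted as well, while a score that does not start a line (as in the second call) is not found.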
