
Capture digits mixed with characters (excluding whitespace) as a number using regex

I have a text that can contain numbers, letters and special characters, and I want to extract all the numbers in it using a regular expression.

The tricky part is that any two numbers with non-whitespace characters between them should be extracted as a single number. Any two numbers separated by whitespace should be extracted as two separate numbers.

Example: ds[44]%6c should yield 446, but 2021 ds[44]%6c should yield 2021, 446.

I have tried the following regex:

(-?\d+)

This works to some extent, but I don't know how to keep matching until I reach whitespace while ignoring the characters between the digits.
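To make the limitation concrete, here is a minimal sketch of what the pattern from the question returns for the second example. The class name DigitRunsDemo and the helper digitRuns are hypothetical names used only for illustration. It assumes Java 9 or later for Matcher.results().

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DigitRunsDemo {

    // The pattern from the question: an optional minus sign followed by a run of digits.
    static final Pattern DIGIT_RUN = Pattern.compile("-?\\d+");

    // Hypothetical helper: returns every digit run as its own string.
    static List<String> digitRuns(String str) {
        return DIGIT_RUN.matcher(str).results()
            .map(MatchResult::group)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each run of digits is reported separately, so 44 and 6 are not joined into 446.
        System.out.println(digitRuns("2021 ds[44]%6c")); // [2021, 44, 6]
    }
}
```

The pattern has no notion of a whitespace-delimited token, which is why the digit runs inside ds[44]%6c come out as two separate matches.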

Matcher.results()

We can create a regular expression that captures a sequence containing at least one digit, surrounded by zero or more non-whitespace characters on both the left and the right.

With this regular expression and the Java 9 method Matcher.results(), we can generate a Stream of MatchResult objects, each of which holds information about a single match.

The only thing left is to extract the matched strings, remove the non-digit characters, and collect the result.

import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public static final Pattern TEXT_WITH_DIGITS = Pattern.compile("[^\\s]*\\d+[^\\s]*");

public static List<Integer> getInts(String str) {
    return TEXT_WITH_DIGITS.matcher(str).results() // Stream<MatchResult>
        .map(MatchResult::group)                   // Stream<String> - extract the matching string
        .map(s -> s.replaceAll("\\D+", ""))        // remove non-digit characters
        .map(Integer::valueOf)                     // Stream<Integer> - parse the string
        .toList();                                 // Stream.toList() requires Java 16+
}
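To see what the pattern matches before the non-digit characters are stripped, the following sketch prints the raw tokens. The class name TokenDemo and the helper tokensWithDigits are hypothetical, and the trailing abc in the input is added here only to show that a token without digits is not matched at all.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TokenDemo {

    // Same pattern as above: a whitespace-free token containing at least one digit.
    static final Pattern TEXT_WITH_DIGITS = Pattern.compile("[^\\s]*\\d+[^\\s]*");

    // Hypothetical helper: returns the matched tokens unchanged.
    static List<String> tokensWithDigits(String str) {
        return TEXT_WITH_DIGITS.matcher(str).results()
            .map(MatchResult::group)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "abc" contains no digit, so the pattern never matches it.
        System.out.println(tokensWithDigits("2021 ds[44]%6c abc")); // [2021, ds[44]%6c]
    }
}
```

Because tokens without digits are never matched, every string that reaches Integer::valueOf in this approach contains at least one digit.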

Pattern.splitAsStream()

Another option is to split the given string on whitespace. For that, we can use the Java 8 method Pattern.splitAsStream(), which generates a stream of elements identical to the ones produced by String.split(). The difference is that Pattern.splitAsStream() creates a stream directly from the regex engine without allocating an intermediate array in memory.

Then we apply the same transformation as in the previous example, with one small addition: we need to handle the edge case where the given string starts with whitespace. In that case, the very first element is an empty string, and we can use dropWhile() to discard it.

import java.util.List;
import java.util.regex.Pattern;

public static final Pattern WHITE_SPACES = Pattern.compile("\\s+");

public static List<Integer> getInts(String str) {
    return WHITE_SPACES.splitAsStream(str)
        .dropWhile(String::isEmpty)         // the very first element is empty if the string starts with whitespace; skip it
        .map(s -> s.replaceAll("\\D+", "")) // remove non-digit characters
        .filter(s -> !s.isEmpty())          // skip tokens that contained no digits, which Integer.valueOf would reject
        .map(Integer::valueOf)              // Stream<Integer> - parse the string
        .toList();
}
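The leading-whitespace edge case can be observed directly. This is a small sketch with hypothetical names (SplitDemo, tokens) that shows the raw elements produced by splitAsStream() before any filtering.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitDemo {

    static final Pattern WHITE_SPACES = Pattern.compile("\\s+");

    // Hypothetical helper: returns the raw tokens produced by the split.
    static List<String> tokens(String str) {
        return WHITE_SPACES.splitAsStream(str).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = tokens(" 2021 ds[44]%6c");
        // The leading whitespace produces one empty first element.
        System.out.println(raw.size());           // 3
        System.out.println(raw.get(0).isEmpty()); // true
    }
}
```

Trailing whitespace does not need the same treatment, because trailing empty strings are discarded by the split.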

main()

public static void main(String[] args) {
    System.out.println(getInts("ds[44]%6c"));
    System.out.println(getInts("2021 ds[44]%6c"));
}

Output:

[446]       // "ds[44]%6c"
[2021, 446] // "2021 ds[44]%6c"
