Regex numbers from string

I am trying to write a regex that finds only the numbers in a given string. What I mean is:

Input: My number is +12 345 678. I have galaxy s3, its symbol 34abc.

Output: 345 and 678 (but not +12, the 3 from the word s3, or the 34 from 34abc)

I tried plain digits ( \d+ ) and combinations with whitespace and word characters. The closest was ^\d$ but that doesn't work, because my numbers are part of a bigger string rather than being the whole string themselves. Can you give me a hint?

------- EDIT -------

Looks like I just don't know how to check a character without actually getting it into the result. Like "digits that follow a space character (without this space)".

In the general case, you can make use of lookbehind and lookahead:

(?<=^|\s)\d+(?=$|\s)

The part which makes it into the captured output is \d+ . Lookbehind and lookahead are not included in the match.

I only included whitespace as a delimiter in the regex, but you may replace \s with any character class, as defined by your requirements. For example, to allow dots as separators (both before and after the digits), use the following regex:

(?<=^|[\s.])\d+(?=$|[\s.])
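As a rough illustration, the two patterns above can be tried against the input from the question. This is only a sketch: the class name LookaroundDemo and the helper findAll are made up for this example, and the backslashes are doubled because the patterns are written as Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";

        // Whitespace-only delimiters: "678" is followed by a dot, so it is skipped.
        System.out.println(findAll("(?<=^|\\s)\\d+(?=$|\\s)", input));       // [345]

        // Whitespace or dot as delimiters: "678" is now accepted as well.
        System.out.println(findAll("(?<=^|[\\s.])\\d+(?=$|[\\s.])", input)); // [345, 678]
    }
}
```

Note that with whitespace as the only delimiter, 678 is rejected because of the trailing dot, which is why the question's expected output needs the second form.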

The (?<=^|\s) should be read as follows:

  • (?<= ... ) defines the lookbehind group.
  • The expression which must precede the \d+ is ^|\s , meaning "either start of the line ( ^ ) or whitespace".

Similarly, (?=$|\s) defines the lookahead group (it must follow the captured digits), which is either the end of the line ( $ ) or whitespace.


A note on \b mentioned in other answers: it is a nice feature meaning "word boundary", but the set of "word characters" is not customizable. This means that, for example, the "+" character is always treated as a separator, and you can't change this if you use \b . With lookaround, you can customize the separators to your needs.

Safer RegEx

Try this:

(?<=\s|^)\d+(?=\s|\b)

Live Demo on Regex101

How it works:

(?<=\s|^)          # Start of String OR Whitespace (will not select +)
                   # Positive Lookbehind ensures the data is not included in the match
\d+                # Digit(s)
(?=\s|\b)          # Whitespace OR Word Boundary
                   # Positive Lookahead ensures the data is not included in the match
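
A quick way to check this pattern is to run it in Java. The sketch below is an assumption-laden example, not part of the original answer: SaferRegexDemo and findAll are names invented here, and the second input ("12-34") is an extra case added to show what the \b in the lookahead lets through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SaferRegexDemo {
    // (?<=\s|^)\d+(?=\s|\b) as a Java string literal (backslashes doubled).
    static final Pattern SAFER = Pattern.compile("(?<=\\s|^)\\d+(?=\\s|\\b)");

    // Hypothetical helper: returns every match of SAFER in the input, in order.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = SAFER.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // "+12", "s3" and "34abc" are rejected; "678" passes because of \b before the dot.
        System.out.println(findAll("My number is +12 345 678. I have galaxy s3, its symbol 34abc.")); // [345, 678]

        // Any non-word character ends a match, so "12" is accepted here,
        // while "34" is rejected because it is preceded by "-".
        System.out.println(findAll("12-34")); // [12]
    }
}
```

The second call shows the trade-off: \b accepts any non-word character after the digits, so if only specific separators should count, an explicit character class in the lookahead may be a better fit.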

Lookarounds do not consume any characters in the match, so you can use them where you would otherwise need a capture group to pick out part of the match. For example:

# Regex /.*barbaz/
barbaz          # Matched Data Result: barbaz
foobarbaz       # Matched Data Result: foobarbaz

# Regex (with Positive Lookahead) /.*bar(?=baz)/
barbaz          # Matched Data Result: bar
foobarbaz       # Matched Data Result: foobar

As you can see with the second RegEx, baz is never included in the matched data result, even though it is required in the string for the RegEx to match. The RegEx above works on the same principle.
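
The barbaz comparison can be reproduced in Java with a small sketch like the following. The class name LookaheadDemo and the helper firstMatch are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: returns the first match of the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Without lookahead, "baz" is consumed and becomes part of the match.
        System.out.println(firstMatch(".*barbaz", "foobarbaz"));     // foobarbaz

        // With positive lookahead, "baz" must be present but is not consumed.
        System.out.println(firstMatch(".*bar(?=baz)", "barbaz"));    // bar
        System.out.println(firstMatch(".*bar(?=baz)", "foobarbaz")); // foobar

        // If "baz" is missing, the lookahead fails and nothing matches.
        System.out.println(firstMatch(".*bar(?=baz)", "foobar"));    // null
    }
}
```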


Not as Safe (Old) RegEx

You can try this RegEx:

\b\d+\b

\b is a Word Boundary. This will, however, select 12 from +12 .

You can change the RegEx to this to stop 12 from being selected:

(?<!\+)\b\d+\b

This uses a Negative Lookbehind and will fail if there is a + before the digits.
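
To see the difference between the two word-boundary patterns, here is a minimal Java sketch run on the question's input. WordBoundaryDemo and findAll are illustrative names, not something from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";

        // "+" is not a word character, so there is a boundary before "12".
        System.out.println(findAll("\\b\\d+\\b", input));        // [12, 345, 678]

        // The negative lookbehind rejects digits that directly follow a "+".
        System.out.println(findAll("(?<!\\+)\\b\\d+\\b", input)); // [345, 678]
    }
}
```

The negative lookbehind only excludes "+". Any other non-word prefix, such as "-" or "#", would still be accepted, which is the reason this version is labelled less safe.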

Live Demo on Regex101

What you seem to want is a sequence of digits ( \d+ ) that is preceded by whitespace ( \s ) or the start of the string ( ^ ), and followed by a whitespace or punctuation character ( [\s.,:;!?] ) or the end of the string ( $ ). The preceding and following whitespace or punctuation characters should not be included in the match, so you need positive lookahead ( (?=xxx) ) and positive lookbehind ( (?<=xxx) ).

(?<=^|\s)\d+(?=[\s.,:;!?]|$)

See regex101 for a demo.

Remember to double the backslashes in a Java literal.
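
For instance, the regex above could be written in Java roughly as follows. This is a sketch under assumptions: the class JavaLiteralDemo, the constant NUMBER and the method extractNumbers are names chosen for this example, and the inputs other than the question's string are invented test cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaLiteralDemo {
    // Regex:        (?<=^|\s)\d+(?=[\s.,:;!?]|$)
    // Java literal: each backslash of the regex is written as two backslashes.
    static final Pattern NUMBER = Pattern.compile("(?<=^|\\s)\\d+(?=[\\s.,:;!?]|$)");

    // Hypothetical helper: returns all standalone numbers found in the input.
    static List<String> extractNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("My number is +12 345 678. I have galaxy s3, its symbol 34abc.")); // [345, 678]
        System.out.println(extractNumbers("Room 42!")); // [42]
        System.out.println(extractNumbers("a1 b2"));    // []
    }
}
```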

