
Regex: capture numbers after varied lengths of spaces

I am trying to use a non-capturing group to detect the spaces before the numbers I need, without bringing those spaces into my result, so I use

(?: 1+)\d*.?\d*

to process my text:

 input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5         134.686356921  2018-06-14 21:50:35.494
 input: pRVh7kPpFbtmuwS1NILiCzwHUVwJ4NcK         839.680408921  2018-06-14 22:13:39.996
 input: Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz         4859.688276920  2018-06-14 23:02:11.125
 input: 4mqdb5njytfDOFpgeG3XS0Iv1OXFPEnb        1400.684675920  2018-06-14 23:33:42.031

and try to extract the numbers.

But lines 2 and 3 return None, and lines 1 and 4 return the number with one space before it: " 134.686356921"

Why do I get different results? The code is below:

import re


def calcprice(filename):
    try:
        print('ok')
        # Read the file and split it into rows
        with open(filename, 'r') as f:
            rows = f.read().split('\n')

        for row in rows:
            # Raw string, so \d and \. reach the regex engine unescaped
            print(re.search(r"[(?: 1+)\d*\.?\d*][1]", row))

    except Exception as e:
        print(e)


if __name__ == "__main__":  # If we are not importing this:
    calcprice('dfk balance.txt')

Result:

<_sre.SRE_Match object; span=(52, 66), match=' 134.686356921'>

None

None

<_sre.SRE_Match object; span=(51, 66), match=' 1400.684675920'>

Your current regex is basically one big character set:

[(?: 1+)\d*\.?\d*]

which doesn't make much sense; it looks like a misunderstanding of how regex works. Inside square brackets, characters such as (, ?, :, + and * lose their special meaning and are just members of the set, so the whole bracket expression matches a single character. If you want to match the numbers, it would probably make more sense to look behind for a couple of spaces, match digits and periods, and look ahead for another couple of spaces:

(?<=  )[\d.]+(?=  )

https://regex101.com/r/NRnXWb/1

for row in rows:
    print (re.search(r"(?<=  )[\d.]+(?=  )",row))
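As a self-contained check, here is a minimal sketch that applies the same lookbehind/lookahead pattern to the four sample rows from the question, copied inline (assuming the real file uses exactly this layout, with at least two spaces on each side of the number). The helper name extract_number is hypothetical and exists only for this illustration.

```python
import re

# Two spaces before, digits/periods, two spaces after.
# The lookarounds are zero-width, so the spaces are checked but not returned.
NUMBER_RE = re.compile(r"(?<=  )[\d.]+(?=  )")


def extract_number(row):
    """Hypothetical helper: return the first space-delimited number in row, or None."""
    m = NUMBER_RE.search(row)
    return m.group() if m else None


rows = [
    "input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5         134.686356921  2018-06-14 21:50:35.494",
    "input: pRVh7kPpFbtmuwS1NILiCzwHUVwJ4NcK         839.680408921  2018-06-14 22:13:39.996",
    "input: Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz         4859.688276920  2018-06-14 23:02:11.125",
    "input: 4mqdb5njytfDOFpgeG3XS0Iv1OXFPEnb        1400.684675920  2018-06-14 23:33:42.031",
]

for row in rows:
    print(extract_number(row))
```

Python only allows fixed-width lookbehind, which the two-space literal satisfies; the variable number of spaces before that does not matter because the lookbehind only inspects the last two.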

Your regex [(?: 1+)\d*\.?\d*][1] consists of two character classes in a row, so it matches exactly two characters: one character from the first set followed by the digit 1.
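To see this concretely, the sketch below runs the question's pattern against the first sample row (copied inline from the question) and shows that the match is only two characters long, and that a lone "?" or "(" is accepted by the first set because the group syntax has no special meaning inside brackets.

```python
import re

# The question's pattern: two character classes back to back.
# Inside [...], "(", "?", ":", " ", "1", "+", ")" and "*" are plain literals,
# and \d and \. add the digits and a period to the set.
pattern = re.compile(r"[(?: 1+)\d*\.?\d*][1]")

row = "input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5         134.686356921  2018-06-14 21:50:35.494"
m = pattern.search(row)
print(repr(m.group()))  # prints ' 1': a space from the first set, then "1"

# Single characters from the first set, showing "(?:" is not a group here.
print(bool(re.fullmatch(r"[(?: 1+)\d*\.?\d*]", "?")))  # True
print(bool(re.fullmatch(r"[(?: 1+)\d*\.?\d*]", "(")))  # True
```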

If the number you want to match always contains a dot, you could use a word boundary and a positive lookahead to assert that what follows is a space:

\b\d+\.\d+(?= )
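A minimal sketch of this pattern on the first sample row (copied from the question). It assumes, as in the sample data, that the timestamp is the last field on the line with no trailing space: the time 21:50:35.494 also contains a dot, and only the missing space after it keeps the lookahead from accepting it.

```python
import re

# Word boundary, digits, a mandatory dot, digits, then a lookahead for one space.
DOTTED_NUMBER_RE = re.compile(r"\b\d+\.\d+(?= )")

row = "input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5         134.686356921  2018-06-14 21:50:35.494"

# "35.494" is at the end of the line, so (?= ) fails there and only the
# price-like number is returned.
print(DOTTED_NUMBER_RE.findall(row))  # ['134.686356921']
```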

If the number might also appear without a dot, you could check for a leading and a trailing space using lookarounds, and make the part that matches a dot followed by one or more digits optional: (?:\.\d+)?

(?<= )\d+(?:\.\d+)?(?= )

Demo
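A short sketch showing that the optional decimal group lets the same pattern accept both forms. The first row is from the question; the second is a hypothetical variant of it with an integer in place of the decimal number.

```python
import re

# A space before, an integer part, an optional ".digits" part, a space after.
NUMBER_RE = re.compile(r"(?<= )\d+(?:\.\d+)?(?= )")

with_dot = "input: Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz         4859.688276920  2018-06-14 23:02:11.125"
# Hypothetical row, not from the question's data:
without_dot = "input: Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz         4859  2018-06-14 23:02:11.125"

print(NUMBER_RE.search(with_dot).group())     # 4859.688276920
print(NUMBER_RE.search(without_dot).group())  # 4859
```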

Try the regex \b(\d+[\d\.]*)\b
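A quick way to see what this pattern picks up is re.findall on the first sample row (copied from the question). On this row it returns every standalone number, including the trailing 5 of the ID and the parts of the date and time, so the value you want is the second match rather than the first, and you would still need to select it.

```python
import re

pattern = re.compile(r"\b(\d+[\d\.]*)\b")

row = "input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5         134.686356921  2018-06-14 21:50:35.494"

# With one capturing group, findall returns group 1 of every match.
print(pattern.findall(row))
# ['5', '134.686356921', '2018', '06', '14', '21', '50', '35.494']
```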

Your regex doesn't match what you're trying to do; it's quite erroneous.

Try this pattern (it begins with a space character):

 +(\d+(\.\d+)?) +

Explanation: the pattern matches a number preceded and followed by one or more spaces ( +). It matches numbers with an optional decimal part ((\.\d+)?), which becomes the second capturing group in a match (but you won't need it anyway).

In every match, the first capturing group, \1, will be your number.

Demo
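A minimal sketch of reading the number from the first capturing group with Match.group(1), using two of the sample rows from the question. The surrounding spaces are part of the overall match, but group(1) contains only the number.

```python
import re

# One or more spaces, the number (group 1, its decimal part is group 2),
# then one or more spaces.
pattern = re.compile(r" +(\d+(\.\d+)?) +")

rows = [
    "input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5         134.686356921  2018-06-14 21:50:35.494",
    "input: 4mqdb5njytfDOFpgeG3XS0Iv1OXFPEnb        1400.684675920  2018-06-14 23:33:42.031",
]

numbers = []
for row in rows:
    m = pattern.search(row)
    if m:
        # group(0) still includes the spaces; group(1) is just the number
        numbers.append(m.group(1))

print(numbers)  # ['134.686356921', '1400.684675920']
```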

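Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.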

© 2020-2024 STACKOOM.COM