Regex: capture numbers after varied lengths of spaces
I am trying to use a non-capturing group to detect the spaces before the numbers I need, without bringing those spaces into my result, so I use
(?: 1+)\d*.?\d*
to process my text:
input: kMPCV/epS4SgFoNdLo3LOuClO/URXS/5 134.686356921 2018-06-14 21:50:35.494
input: pRVh7kPpFbtmuwS1NILiCzwHUVwJ4NcK 839.680408921 2018-06-14 22:13:39.996
input: Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz 4859.688276920 2018-06-14 23:02:11.125
input: 4mqdb5njytfDOFpgeG3XS0Iv1OXFPEnb 1400.684675920 2018-06-14 23:33:42.031
and try to get the numbers.
But lines 2 and 3 return None, while lines 1 and 4 return the number with one space in front of it: " 134.686356921"
Why do I get different results? The code is below:
import re
def calcprice(filename):
    try:
        print ('ok')
        f = open(filename, 'r')
        data = f.read()
        rows = data.split('\n')
        for row in rows:
            print (re.search("[(?: 1+)\d*\.?\d*][1]",row))
    except Exception as e:
        print(e)
if __name__ == "__main__": ## If we are not importing this:
    calcprice('dfk balance.txt')
Result:
<_sre.SRE_Match object; span=(52, 66), match=' 134.686356921'>
None
None
<_sre.SRE_Match object; span=(51, 66), match=' 1400.684675920'>
Your current regex is basically one big character set:
[(?: 1+)\d*\.?\d*]
which doesn't make much sense and suggests a misunderstanding of how regex syntax works. If you want to match the numbers, it would probably make more sense to look behind for the separating space, match digits and periods, and look ahead for the space that follows:
(?<= )[\d.]+(?= )
https://regex101.com/r/NRnXWb/1
for row in rows:
    print (re.search(r"(?<= )[\d.]+(?= )",row))
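To see how the lookaround pattern behaves on the question's sample rows, here is a minimal sketch. The helper name first_number and the inline SAMPLE_ROWS list are illustrative assumptions (the question reads its rows from a file), and the pattern is used in the single-space form shown above.

```python
import re

# Sample rows copied from the question as they appear here (single-space separated).
SAMPLE_ROWS = [
    "kMPCV/epS4SgFoNdLo3LOuClO/URXS/5 134.686356921 2018-06-14 21:50:35.494",
    "pRVh7kPpFbtmuwS1NILiCzwHUVwJ4NcK 839.680408921 2018-06-14 22:13:39.996",
    "Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz 4859.688276920 2018-06-14 23:02:11.125",
    "4mqdb5njytfDOFpgeG3XS0Iv1OXFPEnb 1400.684675920 2018-06-14 23:33:42.031",
]

# A run of digits and periods directly preceded and directly followed by a space.
NUMBER = re.compile(r"(?<= )[\d.]+(?= )")


def first_number(row):
    """Return the first space-delimited number in row, or None if there is none."""
    match = NUMBER.search(row)
    return match.group(0) if match else None


numbers = [first_number(row) for row in SAMPLE_ROWS]
print(numbers)
```

Because lookarounds only assert and do not consume characters, match.group(0) is the number itself, with no surrounding spaces to strip.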
Your regex [(?: 1+)\d*\.?\d*][1] consists of two character classes in a row.
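The difference between a character class and a group can be checked directly. This is a small illustrative sketch; the test strings are made up for the demonstration and are not from the question.

```python
import re

# Inside [...], characters such as ( ? : + ) lose their special meaning
# and are matched literally, one character at a time.
char_class = re.compile(r"[(?: +)]")
print(char_class.findall("a(b?c:d e+f)"))

# Outside brackets, (?: +) is a non-capturing group that matches
# one or more consecutive spaces as a single match.
group = re.compile(r"(?: +)")
print(group.findall("a   b c"))
```

So wrapping a pattern in square brackets does not group it; it turns every character in it into an alternative for a single position.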
If the number you want to match always contains a dot, you could use a word boundary and a positive lookahead to assert that what follows is whitespace.
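One possible way to write that idea is sketched below. The exact pattern is an assumption built from the description (word boundary, mandatory dot, whitespace lookahead), not necessarily the regex the answer had in mind.

```python
import re

# Assumed pattern: word boundary, digits, a mandatory dot, digits,
# then a lookahead requiring whitespace (which is not consumed).
DOTTED = re.compile(r"\b\d+\.\d+(?=\s)")

row = "pRVh7kPpFbtmuwS1NILiCzwHUVwJ4NcK 839.680408921 2018-06-14 22:13:39.996"
match = DOTTED.search(row)
print(match.group(0) if match else None)
```

The digits inside the first token are skipped because they sit between letters, where there is no word boundary, and none of them is followed by a dot.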
If the number could also appear without a dot, you could check for leading and trailing whitespace using lookarounds and make the part that matches a dot followed by one or more digits optional: (?:\.\d+)?
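A sketch of that variant follows. The full pattern and the helper name find_number are assumptions assembled from the description; only the optional part (?:\.\d+)? is given in the answer, and the second test string is a made-up row with an integer value.

```python
import re

# Assumed pattern: whitespace before, digits, optional ".digits", whitespace after.
SPACED = re.compile(r"(?<=\s)\d+(?:\.\d+)?(?=\s)")


def find_number(text):
    """Return the first whitespace-delimited number in text, or None."""
    m = SPACED.search(text)
    return m.group(0) if m else None


print(find_number("Ga7MIXmXAsrbaEc1Yj60qYYblcRQpnpz 4859.688276920 2018-06-14 23:02:11.125"))
print(find_number("token 42 2018-06-14"))  # hypothetical row without a decimal part
```

The date is not picked up because its first digit run is followed by a hyphen rather than whitespace.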
Try the regex \b(\d+[\d\.]*)\b
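A quick check of this pattern against two of the sample rows, as a sketch. Word boundaries also occur next to characters such as / and -, so the pattern can pick up other digit runs in a row, and re.search returns only the first one it finds.

```python
import re

BOUNDED = re.compile(r"\b(\d+[\d\.]*)\b")

row1 = "kMPCV/epS4SgFoNdLo3LOuClO/URXS/5 134.686356921 2018-06-14 21:50:35.494"
row2 = "pRVh7kPpFbtmuwS1NILiCzwHUVwJ4NcK 839.680408921 2018-06-14 22:13:39.996"

# findall returns the contents of the single capturing group for every match.
all_row1 = BOUNDED.findall(row1)
print(all_row1)

# search stops at the first match in each row.
first_row1 = BOUNDED.search(row1).group(1)
first_row2 = BOUNDED.search(row2).group(1)
print(first_row1, first_row2)
```

On the second row the first match is the wanted number, but on the first row the 5 after the last / is matched first, so with this pattern the result would need to be selected by position rather than taken from the first match.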
Your regex doesn't do what you're trying to do; it contains several mistakes.
Try this pattern (it begins and ends with a literal space followed by +):
 +(\d+(\.\d+)?) +
Explanation: the pattern matches a number preceded and followed by one or more spaces ( +). It matches numbers with an optional decimal part ((\.\d+)?), which becomes the second capturing group in a match (but you won't need it anyway).
In every match, the first capturing group (\1) will be your number.
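Applied to the question's data, the capturing-group approach might look like the sketch below. The names PRICE and extract are illustrative assumptions; the pattern is the one given in this answer.

```python
import re

# One or more spaces, a number with an optional decimal part, one or more spaces.
# Group 1 is the whole number; group 2 is the optional ".digits" part.
PRICE = re.compile(r" +(\d+(\.\d+)?) +")


def extract(row):
    """Return the number captured by group 1, or None if the row has no match."""
    m = PRICE.search(row)
    return m.group(1) if m else None


row1 = "kMPCV/epS4SgFoNdLo3LOuClO/URXS/5 134.686356921 2018-06-14 21:50:35.494"
row4 = "4mqdb5njytfDOFpgeG3XS0Iv1OXFPEnb 1400.684675920 2018-06-14 23:33:42.031"
print(extract(row1))
print(extract(row4))
```

Unlike the lookaround version, the whole match here includes the surrounding spaces, which is why the number is read from group(1) rather than group(0).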