How to fix this regular expression in Python?

I want to process some string data that prints out like this:

'node0, node1 0.04, node8 11.11, node14 72.21\n'
'node1, node46 1247.25, node6 20.59, node13 64.94\n'

I want to find all the floating-point numbers here. This is the code I use:

for node in nodes:
    pattern = re.compile('(?<!node)\d+.\d+')
    distance = pattern.findall(node)

However, the result looks like this:

['0.04', '11.11', '4 72']

while what I want is this:

['0.04', '11.11', '72.21']

Any suggestions on fixing this regular expression?

The . in your expression is unescaped.

for node in nodes:
    pattern = re.compile(r"(?<!node)\d+\.\d+")
    distance = pattern.findall(node)

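In regular expressions, an unescaped . is a wildcard that matches any character except a newline (unless the re.DOTALL flag is set). Thus your search pattern actually matches one or more digits, followed by any character, followed by one or more digits. That is how '4 72' gets through: the 4 at the end of node14, then a space, then 72. To make the dot match only a literal period, escape it with a backslash: \. Writing the pattern as a raw string (r"...") keeps Python from interpreting the backslash itself.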
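To see the difference in isolation, here is a minimal sketch that runs both patterns on the last entry of the first sample line. The variable names sample, loose, and strict are only illustrative and do not come from the question.

```python
import re

# Last entry of the first sample line from the question.
sample = "node14 72.21"

# Unescaped dot: "." matches any character except a newline, so the
# space between "4" and "72" is accepted.
loose = re.findall(r"(?<!node)\d+.\d+", sample)

# Escaped dot: "\." matches only a literal period.
strict = re.findall(r"(?<!node)\d+\.\d+", sample)

print(loose)   # ['4 72']
print(strict)  # ['72.21']
```

The lookbehind only rejects a digit that directly follows "node", so the 4 in node14 is still a valid starting point. What rules it out is the escaped dot, because the character after that 4 is a space and not a period.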

(An aside: you don't need to compile your regex pattern inside the loop. Compile it once before the loop; recompiling on every iteration only adds overhead.)

import re

# Compile once, outside the loop.
pattern = re.compile(r'(?<!node)\d+\.\d+')
for node in nodes:
    distance = pattern.findall(node)
    print(distance)

Output:

['0.04', '11.11', '72.21']
['1247.25', '20.59', '64.94']
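For a version that can be run as-is, the sketch below puts the pieces together. It assumes nodes is a list holding the two sample lines from the question (the original post does not show how nodes is built), and it converts each match to float, which is an extra step the question did not ask for.

```python
import re

# Assumption: the two sample lines from the question, held in a list.
nodes = [
    "node0, node1 0.04, node8 11.11, node14 72.21\n",
    "node1, node46 1247.25, node6 20.59, node13 64.94\n",
]

# Compile once, outside the loop; the dot is escaped.
pattern = re.compile(r"(?<!node)\d+\.\d+")

results = []
for node in nodes:
    # findall returns strings; float() is an optional extra step.
    distances = [float(d) for d in pattern.findall(node)]
    results.append(distances)
    print(distances)
```

If you only need the matched text, drop the float() call and keep the strings that findall returns.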
