
How to extract a float from a string after a keyword in python

I have the following string, from which I need to extract the value 14.123456 that comes directly after the keyword airline_freq: (the keyword is unique in the string).

Please help me find the correct regex. Indexing m.group() doesn't work beyond 0.

import re
s =  "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"
m = re.search(r'[airline_freq:\s]?\d*\.\d+|\d+', s)
m.group()

$ result 221.000

You can probably use this:

(?<=airline_freq:)\s*(?:-?(?:\d+(?:\.\d*)?|\.\d+))

This uses a lookbehind to enforce that the number is preceded by airline_freq: but it does not make it part of the match.

The number-matching part of the regex accepts numbers with or without a decimal point. If there is a decimal point, digits may appear on only one side of it, so both 14. and .5 match; an optional - sign may come first. To allow an optional + as well, use [+-] instead of - .
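To see which forms the number part accepts, here is a small check. It is only a sketch: the sample inputs are made up for illustration, and the pattern uses the [+-] variant mentioned above.

```python
import re

# Number part of the regex above, with [+-] in place of - so a leading + is allowed.
number = re.compile(r'[+-]?(?:\d+(?:\.\d*)?|\.\d+)')

# Illustrative inputs (not from the question); fullmatch requires the
# whole string to be a number according to the pattern.
samples = ["14", "14.", ".5", "-0.25", "+3.0", ".", "-.", "abc"]
results = {text: bool(number.fullmatch(text)) for text in samples}

for text, ok in results.items():
    print(repr(text), ok)
```

A lone . or a sign with no digits is rejected, because both alternatives require at least one digit.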

Unfortunately, Python's re module does not allow variable-length lookbehind, so the \s* cannot go inside it; as a consequence, the spaces between the : and the number are part of the match. This is usually not a problem, because leading whitespace is generally skipped when a numeric string is converted to a number (float() does this, for example).
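A quick way to confirm both points with the standard re module. The helper name compiles is made up for this sketch.

```python
import re

def compiles(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehind: accepted.
fixed_ok = compiles(r'(?<=airline_freq:)\s*\d+\.\d+')
# \s* inside the lookbehind makes its width variable: rejected by re.
variable_ok = compiles(r'(?<=airline_freq:\s*)\d+\.\d+')
print(fixed_ok, variable_ok)

# float() ignores surrounding whitespace, so the leading spaces are harmless.
value = float("  14.123456")
print(value)
```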

However, you can remove the first ?: in the regex above to make the number-matching group capturing, so that the number alone is available as group 1 (m.group(1) in Python).
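Applied to the string from the question, that could look like the sketch below. The variable names are only for illustration.

```python
import re

s = "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"

# Same pattern as above with the first ?: removed, so the number is group 1.
pattern = re.compile(r'(?<=airline_freq:)\s*(-?(?:\d+(?:\.\d*)?|\.\d+))')

m = pattern.search(s)
if m is not None:
    whole = m.group(0)    # full match, including the leading spaces
    number = m.group(1)   # captured number only
    value = float(number)
    print(repr(whole), repr(number), value)
```

This also explains the problem in the question: m.group(1) only exists when the pattern contains a capturing group.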


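This captures only the float, as group 1 (m.group(1)). Note that \s+ requires at least one whitespace character after the keyword, and that [-0-9.]+ is permissive: it also accepts malformed runs such as 1.2.3.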

r'airline_freq:\s+([-0-9.]+)'
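The same idea can be wrapped in a small helper. This is a rough sketch, not part of the answer: the function name extract_float_after is hypothetical, it swaps in a stricter number pattern than [-0-9.]+, and it uses \s* so that zero spaces after the keyword are accepted too.

```python
import re

def extract_float_after(keyword, text):
    """Return the float that follows `keyword` in `text`, or None if not found."""
    # re.escape protects against regex metacharacters in the keyword.
    # The number pattern is an assumption: optional sign, digits, optional fraction.
    pattern = re.escape(keyword) + r'\s*(-?\d+(?:\.\d+)?)'
    m = re.search(pattern, text)
    return float(m.group(1)) if m else None

s = "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"

freq = extract_float_after("airline_freq:", s)
amp = extract_float_after("airline_amp:", s)
missing = extract_float_after("no_such_key:", s)
print(freq, amp, missing)
```

Returning None for a missing keyword avoids the AttributeError you get from calling .group() on a failed search.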

"DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"

This works when exactly two whitespace characters separate the keyword from the number:

(?<=airline_freq\:\s\s)(\d+\.\d+)

In [2]: import re
   ...: s =  "DATA:init:     221.000OTHER:airline_freq:  14.123456FEATURE:airline_amp:   0.333887 more text"
   ...: m = re.search(r'(?<=airline_freq\:\s\s)(\d+\.\d+)', s)
   ...: m.group()
Out[2]: '14.123456'

Test: https://regexr.com/51q41

If you're not sure about the number of spaces between airline_freq: and the desired float number, you can use:

(?<=airline_freq\:)\s*(\d+\.\d+)

and m.group().lstrip() to strip the leading spaces, or simply m.group(1), which holds the number without them.
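To illustrate the difference between the two patterns, here is a small comparison. The test strings with one and zero spaces are made up for this sketch.

```python
import re

# Lookbehind with exactly two whitespace characters baked in.
fixed = re.compile(r'(?<=airline_freq:\s\s)(\d+\.\d+)')
# Lookbehind on the keyword only; \s* outside it absorbs any spacing.
flexible = re.compile(r'(?<=airline_freq:)\s*(\d+\.\d+)')

def first_group(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

samples = [
    "airline_freq:  14.123456",  # two spaces, as in the question
    "airline_freq: 14.123456",   # one space
    "airline_freq:14.123456",    # no space
]
for text in samples:
    print(repr(text), first_group(fixed, text), first_group(flexible, text))
```

Only the two-space input matches the fixed pattern, while the flexible one finds the number in all three.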
