Pandas dataframe: Extracting float values from string in a column

I'm trying to extract a floating value from a string for a particular column.

Original Output

DATE        strCondition
4/3/2018    2.9
4/3/2018    3.1, text
4/3/2018    2.6 text
4/3/2018    text, 2.7 

and other variations. I've also tried regex, but my knowledge here is limited. So far I've come up with:

clean = df['strCondition'].str.contains('\d+km')
df['strCondition'] = df['strCondition'].str.extract('(\d+)', expand = False).astype(float)

but the output ends up showing only the integer part of each number:

DATE        strCondition
4/3/2018    2.0
4/3/2018    3.0
4/3/2018    2.0
4/3/2018    2.0 

My desired output would be along the lines of:

DATE        strCondition
4/3/2018    2.9
4/3/2018    3.1
4/3/2018    2.6
4/3/2018    2.7 

I appreciate your time and inputs!

EDIT: I forgot to mention that in my original dataframe there are strCondition entries similar to

2.9(1.0) #where I would like both numbers to get returned
11/11/2018 #where this date as a string object can be discarded 

Sorry for the inconvenience!

Try:

df['float'] = df['strCondition'].str.extract(r'(\d+\.\d+)', expand=False).astype(float)

Output:

       DATE strCondition  float
0  4/3/2018          2.9    2.9
1  4/3/2018    3.1, text    3.1
2  4/3/2018     2.6 text    2.6
3  4/3/2018    text, 2.7    2.7
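
For the entries mentioned in the question's EDIT (such as 2.9(1.0), where both numbers are wanted, and 11/11/2018, which should be discarded), a possible variation is to collect every match per row with str.findall instead of only the first one. This is a minimal sketch, not a definitive solution: the sample frame and the `floats` column name are made up for illustration, and it assumes every number of interest contains a decimal point.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows described in the question.
df = pd.DataFrame({
    "DATE": ["4/3/2018"] * 6,
    "strCondition": [
        "2.9", "3.1, text", "2.6 text", "text, 2.7", "2.9(1.0)", "11/11/2018",
    ],
})

# The dot is escaped, so only "digits, literal dot, digits" counts as a float.
# A date such as 11/11/2018 contains no dot and therefore yields no match.
pattern = r"\d+\.\d+"

# findall returns a list of all matches for each row (empty when none match).
matches = df["strCondition"].str.findall(pattern)

# Convert the matched strings to floats, keeping one list per row.
df["floats"] = matches.apply(lambda found: [float(x) for x in found])

print(df)
```

Rows without any float end up with an empty list, so they can be filtered out afterwards if needed.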

A simple regex find-and-replace would be:

Find (?m)^([\d/]+[ \t]+).*?(\d+\.\d+).*

Replace \1\2

https://regex101.com/r/pVC4jc/1
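
The same find-and-replace can be tried from Python with the standard re module. This is only a sketch under the assumption that the data is available as raw text with one record per line; the `text` and `cleaned` names are illustrative.

```python
import re

# Hypothetical raw text, one record per line, as shown in the question.
text = "\n".join([
    "DATE        strCondition",
    "4/3/2018    2.9",
    "4/3/2018    3.1, text",
    "4/3/2018    2.6 text",
    "4/3/2018    text, 2.7",
])

# Group 1: the date plus the whitespace after it.
# Group 2: the first float on the line.
# (?m) makes ^ match at the start of every line.
pattern = re.compile(r"(?m)^([\d/]+[ \t]+).*?(\d+\.\d+).*")

# Keep only the date and the float; lines that do not match stay unchanged.
cleaned = pattern.sub(r"\1\2", text)

print(cleaned)
```

Note that this keeps only the first float on each line, so an entry like 2.9(1.0) would be reduced to 2.9.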
