I'm trying to extract a floating-point value from the strings in a particular column.
Original Output
DATE strCondition
4/3/2018 2.9
4/3/2018 3.1, text
4/3/2018 2.6 text
4/3/2018 text, 2.7
There are other variations as well. I've also tried regex, but my knowledge here is limited. So far I've come up with:
clean = df['strCondition'].str.contains(r'\d+km')
df['strCondition'] = df['strCondition'].str.extract(r'(\d+)', expand = False).astype(float)
The output ends up looking like this, showing only the integer part of each number:
DATE strCondition
4/3/2018 2.0
4/3/2018 3.0
4/3/2018 2.0
4/3/2018 2.0
My desired output would be along the lines of:
DATE strCondition
4/3/2018 2.9
4/3/2018 3.1
4/3/2018 2.6
4/3/2018 2.7
I appreciate your time and inputs!
EDIT: I forgot to mention that in my original dataframe there are strCondition entries similar to
2.9(1.0) #where I would like both numbers to get returned
11/11/2018 #where this date as a string object can be discarded
Sorry for the inconvenience!
Try:
df['float'] = df['strCondition'].str.extract(r'(\d+\.\d+)', expand=False).astype(float)
Output:
DATE strCondition float
0 4/3/2018 2.9 2.9
1 4/3/2018 3.1, text 3.1
2 4/3/2018 2.6 text 2.6
3 4/3/2018 text, 2.7 2.7
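To also cover the cases from the question's EDIT, here is one possible sketch. The sample frame and the `floats` column name are made up for illustration. It collects every decimal number in each entry with `str.findall`. A date string such as 11/11/2018 contains no decimal point, so it simply yields an empty list:

```python
import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question.
df = pd.DataFrame({
    "DATE": ["4/3/2018"] * 6,
    "strCondition": ["2.9", "3.1, text", "2.6 text", "text, 2.7",
                     "2.9(1.0)", "11/11/2018"],
})

# Digits, a literal dot, digits. The dot is escaped so it cannot match '/'.
pattern = r"\d+\.\d+"

# One list of floats per row; rows without a decimal number get an empty list.
df["floats"] = (
    df["strCondition"]
    .str.findall(pattern)
    .apply(lambda found: [float(x) for x in found])
)

print(df)
```

If only the first number per row is needed, `str.extract` with the same escaped pattern should return NaN for the date-only rows, which can then be dropped with `dropna`.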
A simple regex find-and-replace on the raw text would be:
Find (?m)^([\d/]+[ \t]+).*?(\d+\.\d+).*
Replace \1\2
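For anyone who wants to run that find-and-replace from Python rather than a text editor, a rough equivalent with `re.sub` might look like this. The `text` variable is a made-up sample, and `re.MULTILINE` stands in for the inline `(?m)` flag:

```python
import re

# Hypothetical raw text, one record per line, as in the question.
text = (
    "DATE strCondition\n"
    "4/3/2018 2.9\n"
    "4/3/2018 3.1, text\n"
    "4/3/2018 2.6 text\n"
    "4/3/2018 text, 2.7"
)

# Group 1: the date and trailing whitespace. Group 2: the first decimal number.
pattern = r"^([\d/]+[ \t]+).*?(\d+\.\d+).*"

# Keep only the date and the number on each matching line.
cleaned = re.sub(pattern, r"\1\2", text, flags=re.MULTILINE)
print(cleaned)
```

The header line does not start with a digit or slash, so it is left untouched.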