
How do I extract data from a DataFrame using regular expressions?

I am trying to clean up data in a DataFrame and am stuck on a value replacement problem. The original values come in a format like "31^" or "54_", and I need them as integers, for example 31 and 54.

import pandas as pd

frame = pd.DataFrame({'first': [123, '32^'], 'second': [23, '13_']})
frame['first'] = frame['first'].str.extract(r'([0-9]+)', expand=False)


  first second
0   NaN     23
1    32    13_

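Use Series.str.extract with fillna. The .str accessor returns NaN for entries that are not strings (such as the integer 123), so fillna restores those from the original column: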

In [679]: frame['first'] = frame['first'].str.extract(r'(\d+)', expand=False).fillna(frame['first'])

In [680]: frame['second'] = frame['second'].str.extract(r'(\d+)', expand=False).fillna(frame['second'])

In [681]: frame
Out[681]: 
  first second
0   123     23
1    32     13
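
Since the question asks for integers, note that the values pulled out by str.extract are still strings. A minimal sketch of one way to get an integer column, assuming every entry contains at least one digit (the loop and the astype calls are an illustration, not part of the original answer):

```python
import pandas as pd

frame = pd.DataFrame({'first': [123, '32^'], 'second': [23, '13_']})

for col in ['first', 'second']:
    # Cast everything to str so numeric entries are visible to .str,
    # pull out the first run of digits, then convert the result to int.
    digits = frame[col].astype(str).str.extract(r'(\d+)', expand=False)
    frame[col] = digits.astype(int)

print(frame.dtypes)
```

If some entries might contain no digits at all, astype(int) would raise on the resulting NaN; in that case pd.to_numeric(digits, errors='coerce') is a possible alternative that leaves NaN in place.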

