
Removing part of string from Pandas DataFrame column

I have loaded a set of data into a Pandas DataFrame, as shown below.

test['Consultation']
Out[13]: 
0     CONSULTATION      15.00
1     CONSULTATION      10.00
2     CONSULTATION      18.00
3     CONSULTATION       0.00
4     CONSULTATION      18.00

The values are contained in the 'Consultation' column of my DataFrame.

How can I strip the 'CONSULTATION' text from each value and convert the column's data type to int64 or float?

My expected output is as below.

test['Consultation']
Out[13]: 
0     15.00
1     10.00
2     18.00
3      0.00
4     18.00

I need this so that I can use DataFrame.pivot_table('Consultation',rows='Provider') to calculate the mean for my row field.

Why read the data this way in the first place? Couldn't you read it into two columns instead? Either way, it can be done:

In [35]:

df=pd.DataFrame({'Consultation':['CONSULTATION      15.00',
'CONSULTATION      10.00',
'CONSULTATION      18.00',
'CONSULTATION       0.00',
'CONSULTATION      18.00']})
In [36]:

import re
In [37]:

p=re.compile('[0-9.]+')
In [38]:

df['Cons']=df['Consultation'].apply(lambda x: float(p.findall(x)[0]))
In [39]:

print(df)
              Consultation  Cons
0  CONSULTATION      15.00    15
1  CONSULTATION      10.00    10
2  CONSULTATION      18.00    18
3  CONSULTATION       0.00     0
4  CONSULTATION      18.00    18

[5 rows x 2 columns]
In [40]:

df.dtypes
Out[40]:
Consultation     object
Cons            float64
dtype: object

In your case, you can overwrite the original column directly: df['Consultation'] = df['Consultation'].apply(lambda x: float(p.findall(x)[0]))
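
For what it's worth, the same extraction can probably be done without a Python-level lambda by using the pandas string accessor. The sketch below is only one possible approach: it assumes every value contains exactly one number, and the 'Provider' values are made up for illustration because the question does not show that column. It also assumes a recent pandas, where pivot_table takes index= in place of the older rows= keyword.

```python
import pandas as pd

# Sample frame; the 'Provider' values are hypothetical, not from the question.
df = pd.DataFrame({
    'Provider': ['A', 'A', 'B', 'B', 'B'],
    'Consultation': ['CONSULTATION      15.00',
                     'CONSULTATION      10.00',
                     'CONSULTATION      18.00',
                     'CONSULTATION       0.00',
                     'CONSULTATION      18.00'],
})

# Capture the first run of digits and dots in each string,
# then convert the captured text to float and overwrite the column.
df['Consultation'] = (
    df['Consultation']
    .str.extract(r'([0-9.]+)', expand=False)
    .astype(float)
)

# Mean consultation value per provider.
# Assumption: a pandas version that accepts `index=` rather than `rows=`.
means = df.pivot_table(values='Consultation', index='Provider', aggfunc='mean')
print(means)
```

With expand=False and a single capture group, str.extract returns a Series, so the result can be assigned straight back to the column. Rows that contain no number would come back as NaN rather than raising an error, which may or may not be what you want.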
