Replace a string in an entire DataFrame read from Excel with a value

I have this kind of data from Excel:

import pandas as pd

dminerals = pd.read_excel(datafile)  # datafile: path to the workbook (not shown)
print(dminerals.head(5))

Then I replaced the 'Tr' and NaN values using a for loop with this script:

import numpy as np

# .iteritems() was removed in pandas 2.0; .items() is the equivalent
for key, value in dminerals.items():
    dminerals[key] = dminerals[key].replace(to_replace='Tr', value=1)
    dminerals[key] = dminerals[key].replace(to_replace=np.nan, value=0)

Then I printed it again. It seems to work, but printing the dtype of column 'C' shows the object data type.

print(dminerals.head(5))
print(dminerals['C'].dtypes)
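The behaviour can be reproduced without the Excel file. This is a minimal sketch under stated assumptions: the values and the second column name 'Ca' are hypothetical stand-ins for the sheet, and it uses .items() (the pandas 2.x name for .iteritems()). It shows that a lowercase 'tr' survives the loop, which keeps column 'C' as object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data: numbers mixed with
# trace markers ('Tr' and 'tr') and missing cells
dminerals = pd.DataFrame({
    "C": [12, "Tr", np.nan, "tr", 5],
    "Ca": ["Tr", 3, 7, np.nan, 2],
})

# Same per-column logic as in the question
for key, value in dminerals.items():
    dminerals[key] = dminerals[key].replace(to_replace="Tr", value=1)
    dminerals[key] = dminerals[key].replace(to_replace=np.nan, value=0)

print(dminerals["C"].tolist())  # the lowercase 'tr' is still there
print(dminerals["C"].dtype)     # object, because a string remains
```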

I tried using .astype to convert one of the columns, ['C'], to integer, but the result is a ValueError:

dminerals['C'].astype(int)
ValueError: invalid literal for int() with base 10: 'tr'

I thought I had already changed 'Tr' in the DataFrame into an integer value. Is there anything I missed in the process above? Please help, thank you in advance!
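To find out which cells are blocking .astype(int), one option is pd.to_numeric with errors="coerce": anything it cannot parse becomes NaN, so cells that were present before but are NaN afterwards are the culprits. A small sketch, assuming hypothetical data that mirrors column 'C' after the loop:

```python
import pandas as pd

# Hypothetical state of column 'C' after the loop
dminerals = pd.DataFrame({"C": [12, 1, 0, "tr", 5]})

# Unparseable values become NaN under errors="coerce"
as_num = pd.to_numeric(dminerals["C"], errors="coerce")

# Non-null originals that failed to parse are the offending values
bad = dminerals["C"][as_num.isna() & dminerals["C"].notna()]
print(bad.unique())
```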

You are replacing 'Tr' with 1, but there is also a 'tr' that is not being replaced (this is what your ValueError is saying). Remember that string matching in Python is case sensitive. Also, the for loop is unnecessary and inefficient; you might want to try this single line of code instead:

dminerals = dminerals.replace({'Tr': 1, 'tr': 1}).fillna(0)
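Since the original goal was an integer column, a sketch of the one-liner followed by an explicit conversion may help. The sample data is hypothetical; the explicit .astype(int) is included because, depending on the pandas version, columns can remain object dtype after replace/fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
dminerals = pd.DataFrame({
    "C": [12, "Tr", np.nan, "tr", 5],
    "Ca": ["Tr", 3, 7, np.nan, 2],
})

# Replace both spellings across every column at once, then fill the gaps
dminerals = dminerals.replace({"Tr": 1, "tr": 1}).fillna(0)

# Convert explicitly; this succeeds now that no strings are left
dminerals = dminerals.astype(int)
print(dminerals.dtypes)
```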

I'm using fillna(), which is the better way to fill null values with a specified value (0 in this case), instead of using replace.
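If the sheet might contain other spellings of the trace marker (for example 'TR' or ' tr ' with stray spaces, which are assumptions, not something shown in the question), listing every variant in a dict gets fragile. A sketch of a case-insensitive alternative using a regex with DataFrame.replace, followed by fillna(0) as above:

```python
import numpy as np
import pandas as pd

# Hypothetical data with an extra ' TR' variant
dminerals = pd.DataFrame({
    "C": [12, "Tr", np.nan, "tr", 5],
    "Ca": [" TR", 3, 7, np.nan, 2],
})

# (?i) makes the pattern case-insensitive; ^\s*tr\s*$ only matches cells whose
# whole content is "tr" (optionally padded with spaces). Because the value is
# not a string, a matching cell is replaced by 1; non-string cells are untouched.
cleaned = dminerals.replace(r"(?i)^\s*tr\s*$", 1, regex=True).fillna(0)
cleaned = cleaned.astype(int)
print(cleaned.dtypes)
```

The anchored pattern is deliberate: an unanchored "tr" would also match longer strings that merely contain those letters.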

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM