
Pandas Data Frame Partial String Replace

Given this data frame:

import pandas as pd
d=pd.DataFrame({'A':['a','b',99],'B':[1,2,'99'],'C':['abcd99',4,5]})
d

    A   B   C
0   a   1   abcd99
1   b   2   4
2   99  99  5

I want to replace all 99s in the entire data frame with asterisks. I've tried this:

d.replace('99','*')

...but it only replaced the string '99' in column B.

Thanks in advance!
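A small sketch of the behaviour described in the question, reusing the question's frame. Without regex=True, DataFrame.replace matches whole cell values, and the comparison also respects type, so only the string cell '99' is a hit:

```python
import pandas as pd

d = pd.DataFrame({'A': ['a', 'b', 99], 'B': [1, 2, '99'], 'C': ['abcd99', 4, 5]})

# Plain replace() compares each whole cell with '99':
# the int 99 in column A is not equal to the string '99',
# and 'abcd99' is not equal to '99' either, so both are left alone.
out = d.replace('99', '*')
print(out)
```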

If you want to replace all the 99s, try using regex=True:

>>> d.astype(str).replace('99','*',regex=True)

    A   B   C
0   a   1   abcd*
1   b   2   4
2   *   *   5
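A quick sketch of why the astype(str) step matters, using the question's frame. With regex=True, pandas substitutes inside string cells only; non-string cells such as the int 99 in column A are passed through untouched, so casting to str first is what makes every cell searchable:

```python
import pandas as pd

d = pd.DataFrame({'A': ['a', 'b', 99], 'B': [1, 2, '99'], 'C': ['abcd99', 4, 5]})

# regex=True alone: substrings are replaced in string cells,
# but the int 99 in column A is not a string, so it is skipped.
regex_only = d.replace('99', '*', regex=True)

# Casting every cell to str first lets the pattern reach all of them.
both = d.astype(str).replace('99', '*', regex=True)
print(regex_only)
print(both)
```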

This will do the job:

import pandas as pd
d=pd.DataFrame({'A':['a','b',99],'B':[1,2,'99'],'C':['abcd99',4,5]})
d=d.astype(str)
d.replace('99','*',regex=True)

which gives

    A   B   C
0   a   1   abcd*
1   b   2   4
2   *   *   5

Note that this creates a new DataFrame. You can also do it in place instead:

d.replace('99','*',regex=True,inplace=True)
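For completeness, a self-contained sketch of the in-place variant. With inplace=True the frame itself is modified and (in current pandas) the call returns None, so don't assign its result back to d:

```python
import pandas as pd

d = pd.DataFrame({'A': ['a', 'b', 99], 'B': [1, 2, '99'], 'C': ['abcd99', 4, 5]})
d = d.astype(str)

# Modifies d directly; the return value is None, not a DataFrame.
d.replace('99', '*', regex=True, inplace=True)
print(d)
```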

The problem is that the 99 values in columns A and B are of different types:

>>> type(d.loc[2,"A"])
<class 'int'>
>>> type(d.loc[2,"B"])
<class 'str'>
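Since replace() compares both value and type, one way to see the mismatch is to list both forms explicitly. This is a sketch, not part of the original answer: it catches the whole-cell int and string 99s, but still not the substring inside 'abcd99':

```python
import pandas as pd

d = pd.DataFrame({'A': ['a', 'b', 99], 'B': [1, 2, '99'], 'C': ['abcd99', 4, 5]})

# Match the int 99 and the str '99' as whole cell values.
out = d.replace([99, '99'], '*')
print(out)
```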

You can cast your DataFrame to string type via d.astype(str) and then replace, resulting in:

>>> d.astype(str).replace("99","*")
   A  B       C
0  a  1  abcd99
1  b  2       4
2  *  *       5

Edit: using regex, as the other answers do, is the correct solution. For some reason I missed the partial match in abcd99 (column C).

Will leave this here, just in case it is helpful to someone else.
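If you want substring replacement without regex semantics (for example, when the search text contains characters like '.' or '*'), a possible alternative, not taken from the answers above, is Series.str.replace with regex=False applied column by column:

```python
import pandas as pd

d = pd.DataFrame({'A': ['a', 'b', 99], 'B': [1, 2, '99'], 'C': ['abcd99', 4, 5]})

# Cast to str, then do a literal (non-regex) substring replacement in each column.
out = d.astype(str).apply(lambda col: col.str.replace('99', '*', regex=False))
print(out)
```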

Use NumPy's character functions:

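import numpy as np
# np.char exposes the same functions as np.core.defchararray (np.core is deprecated in NumPy 2.x)
d.values[:] = np.char.replace(d.values.astype(str), '99', '*')
d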

   A  B      C
0  a  1  abcd*
1  b  2      4
2  *  *      5
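A variant of the NumPy approach, sketched under the assumption that you'd rather build a new frame than write into d.values (which can be read-only when pandas' Copy-on-Write mode is enabled). np.char.replace applies str.replace element-wise to an array of strings:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'A': ['a', 'b', 99], 'B': [1, 2, '99'], 'C': ['abcd99', 4, 5]})

# Convert every cell to str, replace the substring element-wise,
# then wrap the result in a new DataFrame with the same labels.
arr = np.char.replace(d.to_numpy().astype(str), '99', '*')
new = pd.DataFrame(arr, index=d.index, columns=d.columns)
print(new)
```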

Naive time test (image not available)
