Python Replace Whole Values in Dataframe String and Not Substrings

I am trying to replace strings in a dataframe if the whole string equals another string. I do not want to replace substrings.

So:

If I have df:

 Index  Name       Age
   0     Joe        8
   1     Mary       10
   2     Marybeth   11

and I want to replace "Mary" when the whole string matches "Mary" with "Amy" so I get

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Marybeth   11

I'm doing the following:

df['Name'] = df['Name'].apply(lambda x: x.replace('Mary','Amy'))

My understanding from searching around is that the defaults of replace set regex=False and replace should look for the whole value in the dataframe to be "Mary". Instead I'm getting this result:

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Amybeth    11

What am I doing wrong?

Series.replace with a dict is the way to go. (Your apply call runs Python's str.replace on each value, which replaces substrings, just as Series.str.replace does.)

df['Name'].replace({'Mary':'Amy'})
Out[582]: 
0         Joe
1         Amy
2    Marybeth
Name: Name, dtype: object
df['Name'].replace({'Mary':'Amy'},regex=True)
Out[583]: 
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object

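Notice that the results differ: with regex=True, "Mary" is matched as a pattern anywhere in the value, so "Marybeth" becomes "Amybeth".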
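For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question, and the variable names exact and partial are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Default (regex=False): only values equal to "Mary" are replaced
exact = df["Name"].replace({"Mary": "Amy"})

# regex=True: "Mary" is treated as a pattern and matched anywhere in the value
partial = df["Name"].replace({"Mary": "Amy"}, regex=True)

print(exact.tolist())
print(partial.tolist())
```

Neither call modifies df; assign the result back to df['Name'] to keep it.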

Series.str.replace : https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html

DataFrame.replace : https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html

You can also use loc to find the rows where the name matches exactly, and then set them to the new name.

df.loc[df['Name'] == 'Mary', 'Name'] = "Amy"
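The loc approach can be broken into two steps to make the exact-match behaviour visible. This is only a sketch using the sample data from the question; the name mask is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Boolean mask: True only where the whole value equals "Mary"
mask = df["Name"] == "Mary"
print(mask.tolist())

# Assign through .loc so that df itself is modified
df.loc[mask, "Name"] = "Amy"
print(df["Name"].tolist())
```

Because == compares whole values, "Marybeth" is never selected by the mask.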

Explanation:

When you use apply like this, the lambda receives each element as a plain Python str, not a pandas Series, so x.replace is Python's built-in str.replace:

In [42]: df['Name'].apply(lambda x: print(type(x)))
<class 'str'>  # <---- NOTE
<class 'str'>  # <---- NOTE
<class 'str'>  # <---- NOTE
Out[42]:
0    None
1    None
2    None
Name: Name, dtype: object

It's the same as:

In [44]: 'Marybeth'.replace('Mary','Amy')
Out[44]: 'Amybeth'
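If you want to keep the apply pattern, one possible fix is to compare the whole value inside the lambda instead of calling str.replace. This is a sketch on the question's sample data; the names substring and whole are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Original attempt: str.replace acts on substrings of each element
substring = df["Name"].apply(lambda x: x.replace("Mary", "Amy"))

# Whole-value comparison: only an element equal to "Mary" is swapped
whole = df["Name"].apply(lambda x: "Amy" if x == "Mary" else x)

print(substring.tolist())
print(whole.tolist())
```

This works, but calling Series.replace directly is shorter and avoids a Python-level function call per element.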

Solution:

Use Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None) directly (without Series.apply() ). By default ( regex=False ) it replaces only values that match the whole string, which is what you expect:

In [39]: df.Name.replace('Mary','Amy')
Out[39]:
0         Joe
1         Amy
2    Marybeth
Name: Name, dtype: object

You can explicitly specify regex=True , in which case substrings are replaced:

In [40]: df.Name.replace('Mary','Amy', regex=True)
Out[40]:
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object

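NOTE: Series.str.replace(pat, repl, n=-1, case=None, flags=0) always replaces substrings. In older pandas versions it has no regex parameter and always treats pat as a regular expression; newer versions add a regex parameter (default False since pandas 2.0), but in either mode a match anywhere in the string is replaced: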

In [41]: df.Name.str.replace('Mary','Amy')
Out[41]:
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object
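If you do need regex=True (for example, to combine several patterns), anchoring the pattern with ^ and $ restricts it to whole-value matches. A minimal sketch, assuming a pandas version in which Series.str.replace accepts the regex keyword; the variable names are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# ^ and $ anchor the pattern to the start and end of each value
anchored = df["Name"].replace(r"^Mary$", "Amy", regex=True)

# The same anchored pattern through the .str accessor
anchored_str = df["Name"].str.replace(r"^Mary$", "Amy", regex=True)

print(anchored.tolist())
print(anchored_str.tolist())
```

If the search text may contain regex metacharacters, passing it through re.escape before adding the anchors keeps it literal.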
