I am trying to replace strings in a dataframe if the whole string equals another string. I do not want to replace substrings.
For example, given this df:
Index Name Age
0 Joe 8
1 Mary 10
2 Marybeth 11
I want to replace "Mary" with "Amy" only when the whole string is "Mary", so that I get:
Index Name Age
0 Joe 8
1 Amy 10
2 Marybeth 11
I'm doing the following:
df['Name'] = df['Name'].apply(lambda x: x.replace('Mary','Amy'))
My understanding from searching around is that replace defaults to regex=False, so it should only match when the whole value in the dataframe is "Mary". Instead I'm getting this result:
Index Name Age
0 Joe 8
1 Amy 10
2 Amybeth 11
What am I doing wrong?
replace with a dict is the way to go (what your apply call ends up doing is the same as Series.str.replace, which replaces substrings):
df['Name'].replace({'Mary':'Amy'})
Out[582]:
0 Joe
1 Amy
2 Marybeth
Name: Name, dtype: object
df['Name'].replace({'Mary':'Amy'},regex=True)
Out[583]:
0 Joe
1 Amy
2 Amybeth
Name: Name, dtype: object
Notice that the two methods are different:
Series.str.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html
DataFrame.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html
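Since the second link is about DataFrame.replace, here is a minimal sketch of calling it on the whole frame with a nested dict, so that only the Name column is touched. The frame is rebuilt from the question; the variable name result is just for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Nested dict: {column: {old_value: new_value}}.
# regex is left at its default (False), so only values that are
# exactly "Mary" are replaced; "Marybeth" is left alone.
result = df.replace({"Name": {"Mary": "Amy"}})

print(result)
```

The original df is not modified here; assign the result back if you want to keep it.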
You can also use loc to find the rows where the name matches exactly, and then set them to the new name.
df.loc[df['Name'] == 'Mary', 'Name'] = "Amy"
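A self-contained version of the loc approach, as a rough sketch. The frame is rebuilt from the question and the name mask is only illustrative. The comparison with == tests the whole value, so substrings are never touched.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Boolean mask: True only where the whole value equals "Mary"
mask = df["Name"] == "Mary"

# Assign in place to the selected rows of the Name column
df.loc[mask, "Name"] = "Amy"

print(df)
```

Unlike replace, this modifies df in place rather than returning a new object.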
Explanation:
When you use apply like this, the lambda receives plain Python strings, not a pandas Series, so x.replace is the built-in str.replace:
In [42]: df['Name'].apply(lambda x: print(type(x)))
<class 'str'> # <---- NOTE
<class 'str'> # <---- NOTE
<class 'str'> # <---- NOTE
Out[42]:
0 None
1 None
2 None
Name: Name, dtype: object
It's the same as:
In [44]: 'Marybeth'.replace('Mary','Amy')
Out[44]: 'Amybeth'
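If you would rather keep apply, the lambda has to compare the whole string itself instead of calling str.replace. This is only a sketch of that idea (the frame is rebuilt from the question and fixed is an illustrative name); the built-in Series.replace is usually the simpler choice.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Each x is a plain Python str, so test equality of the whole value
# rather than calling x.replace(), which works on substrings.
fixed = df["Name"].apply(lambda x: "Amy" if x == "Mary" else x)

print(fixed.tolist())
```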
Solution:
Use Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None) directly, without Series.apply(). By default (regex=False) it replaces only whole values, which is the behaviour you expect:
In [39]: df.Name.replace('Mary','Amy')
Out[39]:
0 Joe
1 Amy
2 Marybeth
Name: Name, dtype: object
You can explicitly specify regex=True; this will replace substrings:
In [40]: df.Name.replace('Mary','Amy', regex=True)
Out[40]:
0 Joe
1 Amy
2 Amybeth
Name: Name, dtype: object
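If you do need regex=True for some other reason, anchoring the pattern with ^ and $ should restrict it to whole values again. A small sketch, with the frame rebuilt from the question and anchored as an illustrative name:

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# ^ and $ pin the pattern to the start and end of the string,
# so "Marybeth" does not match even though regex=True.
anchored = df["Name"].replace(r"^Mary$", "Amy", regex=True)

print(anchored.tolist())
```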
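NOTE: Series.str.replace(pat, repl, n=-1, case=None, flags=0) has no regex parameter in the pandas version used here. It always treats pat as a regular expression and replaces substrings (newer pandas versions add a regex parameter, but the method still works on substrings):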
In [41]: df.Name.str.replace('Mary','Amy')
Out[41]:
0 Joe
1 Amy
2 Amybeth
Name: Name, dtype: object
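For comparison, here is a sketch that assumes a newer pandas release in which Series.str.replace accepts a regex argument (check your installed version). Passing regex=False makes the pattern a literal string, but it is still matched as a substring; only an anchored regular expression limits it to the whole value.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Literal pattern, but still a substring replacement: "Marybeth" changes too
literal = df["Name"].str.replace("Mary", "Amy", regex=False)

# Anchored regular expression: only the exact value "Mary" changes
anchored = df["Name"].str.replace(r"^Mary$", "Amy", regex=True)

print(literal.tolist())
print(anchored.tolist())
```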