Python pandas equivalent for replace
In R, there is a rather useful replace function. Essentially, it does conditional re-assignment in a given column of a data frame. It can be used like so:
replace(df$column, df$column==1,'Type 1');
What is a good way to achieve the same in pandas?
Should I use a lambda with apply? (If so, how do I get a reference to the given column, as opposed to a whole row?)
Should I use np.where on data_frame.values? It seems like I am missing a very obvious thing here.
Any suggestions are appreciated.
pandas has a replace method too:
In [25]: df = DataFrame({1: [2,3,4], 2: [3,4,5]})
In [26]: df
Out[26]:
1 2
0 2 3
1 3 4
2 4 5
In [27]: df[2]
Out[27]:
0 3
1 4
2 5
Name: 2
In [28]: df[2].replace(4, 17)
Out[28]:
0 3
1 17
2 5
Name: 2
In [29]: df[2].replace(4, 17, inplace=True)
Out[29]:
0 3
1 17
2 5
Name: 2
In [30]: df
Out[30]:
1 2
0 2 3
1 3 17
2 4 5
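The session above comes from an older pandas release. In recent versions, calling replace with inplace=True on a selected column may operate on a temporary object and leave df unchanged (this depends on the pandas version and its copy-on-write settings, so treat it as an assumption to check locally). A minimal sketch that avoids the question by assigning the result back to the column:

```python
import pandas as pd

df = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# replace returns a new Series; assigning it back updates the column
# without relying on inplace=True on a selected column.
df[2] = df[2].replace(4, 17)

print(df)
```

Only column 2 is touched; column 1 keeps its original values.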
or you could use numpy-style advanced indexing:
In [47]: df[1]
Out[47]:
0 2
1 3
2 4
Name: 1
In [48]: df[1] == 4
Out[48]:
0 False
1 False
2 True
Name: 1
In [49]: df[1][df[1] == 4]
Out[49]:
2 4
Name: 1
In [50]: df[1][df[1] == 4] = 19
In [51]: df
Out[51]:
1 2
0 2 3
1 3 17
2 19 5
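The chained form df[1][df[1] == 4] = 19 worked in the pandas version shown, but newer versions may warn about it or silently skip the assignment. A hedged sketch of the same conditional assignment written with .loc, plus the np.where variant the question asks about (the variable name replaced is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [2, 3, 4], 2: [3, 17, 5]})

# Select rows with a boolean mask and the column by label in one step,
# so the assignment targets df itself rather than an intermediate object.
df.loc[df[1] == 4, 1] = 19

# np.where returns a new array: 0 where the condition holds,
# otherwise the existing value from column 2.
replaced = np.where(df[2] == 17, 0, df[2])

print(df)
print(replaced)
```

np.where does not modify df; assign its result to a column if the change should be kept.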
The pandas documentation for replace does not have any examples, so I will give some here. For those coming from an R perspective (like me), replace is basically an all-purpose replacement function that combines the functionality of the R functions plyr::mapvalues, plyr::revalue and stringr::str_replace_all. Since DSM covered the case of single values, I will cover the multi-value case.
Example series
In [10]: x = pd.Series([1, 2, 3, 4])
In [11]: x
Out[11]:
0 1
1 2
2 3
3 4
dtype: int64
We want to replace the positive integers with negative integers (and not by multiplying by -1).
Two lists of values
One way to do this is to have one list (or pandas Series) of the values we want to replace and a second list with the values we want to replace them with.
In [14]: x.replace([1, 2, 3, 4], [-1, -2, -3, -4])
Out[14]:
0 -1
1 -2
2 -3
3 -4
dtype: int64
This corresponds to plyr::mapvalues.
Dictionary of value pairs
Sometimes it's more convenient to have a dictionary of value pairs. Each key is a value we want to replace, and the corresponding dictionary value is what we replace it with.
In [15]: x.replace({1: -1, 2: -2, 3: -3, 4: -4})
Out[15]:
0 -1
1 -2
2 -3
3 -4
dtype: int64
This corresponds to plyr::revalue.
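The dictionary form also extends to a whole DataFrame, which is close to what the R call in the question does for one column. A small sketch, assuming a made-up frame with columns "a" and "b": the outer keys name columns and the inner dictionaries map old values to new ones.

```python
import pandas as pd

# Hypothetical example data; both columns start with the same values.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

# Only column "a" is searched; 1 -> -1 and 2 -> -2 there.
result = df.replace({"a": {1: -1, 2: -2}})

print(result)
```

Column "b" is left alone even though it contains the same values, and df itself is not modified.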
Strings
It works similarly for strings, except that we also have the option of using regex patterns.
If we simply want to replace strings with other strings, it works exactly the same as before:
In [18]: s = pd.Series(["ape", "monkey", "seagull"])
In [22]: s
Out[22]:
0 ape
1 monkey
2 seagull
dtype: object
Two lists
In [25]: s.replace(["ape", "monkey"], ["lion", "panda"])
Out[25]:
0 lion
1 panda
2 seagull
dtype: object
Dictionary
In [26]: s.replace({"ape": "lion", "monkey": "panda"})
Out[26]:
0 lion
1 panda
2 seagull
dtype: object
Regex
Replace all a's with x's.
In [27]: s.replace("a", "x", regex=True)
Out[27]:
0 xpe
1 monkey
2 sexgull
dtype: object
Replace all l's with x's.
In [28]: s.replace("l", "x", regex=True)
Out[28]:
0 ape
1 monkey
2 seaguxx
dtype: object
Note that both l's in seagull were replaced.
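It may help to see why regex=True matters in these calls. As far as I understand it, without regex=True, replace compares whole values, so a single letter only matches an element that is exactly that letter. The sketch below contrasts the two and also shows Series.str.replace as an alternative for literal substring replacement; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Whole-value matching: no element equals "a", so nothing changes.
whole = s.replace("a", "x")

# Whole-value matching succeeds when the full string is given.
exact = s.replace("ape", "x")

# Literal substring replacement through the .str accessor.
substr = s.str.replace("a", "x", regex=False)

print(whole.tolist(), exact.tolist(), substr.tolist())
```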
Replace a's with x's and l's with p's.
In [29]: s.replace(["a", "l"], ["x", "p"], regex=True)
Out[29]:
0 xpe
1 monkey
2 sexgupp
dtype: object
In the special case where one wants to replace multiple different values with the same value, one can simply pass a single string as the replacement. It must not be inside a list.
Replace a's and l's with p's.
In [29]: s.replace(["a", "l"], "p", regex=True)
Out[29]:
0 ppe
1 monkey
2 sepgupp
dtype: object
(Credit to DaveL17 in the comments)
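The dictionary style shown earlier can also be combined with regex matching by passing the mapping through the regex keyword. This is a sketch rather than part of the original answer; it reuses the same sample series.

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Keys are treated as regex patterns, values as their replacements.
result = s.replace(regex={"a": "x", "l": "p"})

print(result)
```

This gives the same result as the two-list call with regex=True above, but keeps each pattern next to its replacement.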