Python pandas equivalent for replace

In R, there is a rather useful replace function. Essentially, it does conditional re-assignment in a given column of a data frame. It can be used like so: replace(df$column, df$column==1, 'Type 1')

What is a good way to achieve the same in pandas?

Should I use a lambda with apply? (If so, how do I get a reference to the given column, as opposed to a whole row?)

Should I use np.where on data_frame.values? It seems like I am missing a very obvious thing here.

Any suggestions are appreciated.

pandas has a replace method too:

In [24]: import pandas as pd

In [25]: df = pd.DataFrame({1: [2,3,4], 2: [3,4,5]})

In [26]: df
Out[26]: 
   1  2
0  2  3
1  3  4
2  4  5

In [27]: df[2]
Out[27]: 
0    3
1    4
2    5
Name: 2

In [28]: df[2].replace(4, 17)
Out[28]: 
0     3
1    17
2     5
Name: 2

In [29]: df[2] = df[2].replace(4, 17)  # assign back; inplace=True on a selected column may not update df in recent pandas

In [30]: df
Out[30]: 
   1   2
0  2   3
1  3  17
2  4   5
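
As a self-contained sketch of the same idea outside an IPython session: the string column names "a" and "b" below are made up for illustration and are not from the answer above. It assumes a reasonably recent pandas, where assigning the result back is the dependable way to update a column, and it also shows the nested-dictionary form of DataFrame.replace that restricts a replacement to one column.

```python
import pandas as pd

# Hypothetical frame mirroring the example above, with string column names.
df = pd.DataFrame({"a": [2, 3, 4], "b": [3, 4, 5]})

# Series.replace returns a new Series, so assign it back to update the column.
df["b"] = df["b"].replace(4, 17)

# Nested dict {column: {old: new}}: only column "a" is searched for the value 4.
result = df.replace({"a": {4: 19}})

print(df)
print(result)
```

The nested form can be handy when the same value appears in several columns but should only be changed in one of them.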

Or you could use numpy-style advanced indexing:

In [47]: df[1]
Out[47]: 
0    2
1    3
2    4
Name: 1

In [48]: df[1] == 4
Out[48]: 
0    False
1    False
2     True
Name: 1

In [49]: df[1][df[1] == 4]
Out[49]: 
2    4
Name: 1

In [50]: df.loc[df[1] == 4, 1] = 19  # single .loc call instead of chained df[1][...] assignment

In [51]: df
Out[51]: 
    1   2
0   2   3
1   3  17
2  19   5
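
Since R's replace(x, condition, value) is driven by a condition rather than by a literal value, the closest pandas analogues are probably condition-based as well. The following is a minimal sketch, assuming a hypothetical column named "column" and a replacement value of 10 (neither is from the original post); it shows three ways that should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "column" echoes the name used in the R snippet.
df = pd.DataFrame({"column": [1, 2, 1, 3]})
is_one = df["column"] == 1  # boolean condition, like df$column == 1 in R

# Option 1: Series.mask returns a new Series with values replaced where the condition is True.
masked = df["column"].mask(is_one, 10)

# Option 2: np.where picks 10 where the condition holds, else the original value.
where_result = pd.Series(np.where(is_one, 10, df["column"]), index=df.index)

# Option 3: modify the frame itself with a single .loc assignment.
df.loc[is_one, "column"] = 10

print(masked.tolist(), where_result.tolist(), df["column"].tolist())
```

The first two options leave df untouched and return new objects, which mirrors R's replace; the .loc form changes the frame directly.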

The pandas documentation for replace did not have any examples when this was written, so I will give some here. For those coming from an R perspective (like me), replace is basically an all-purpose replacement function that combines the functionality of the R functions plyr::mapvalues, plyr::revalue and stringr::str_replace_all. Since DSM covered the case of single values, I will cover the multi-value case.

Example series

In [10]: x = pd.Series([1, 2, 3, 4])

In [11]: x
Out[11]: 
0    1
1    2
2    3
3    4
dtype: int64

We want to replace the positive integers with negative integers (and not by multiplying by -1).

Two lists of values

One way to do this is to have one list (or pandas Series) of the values we want to replace and a second list with the values we want to replace them with.

In [14]: x.replace([1, 2, 3, 4], [-1, -2, -3, -4])
Out[14]: 
0   -1
1   -2
2   -3
3   -4
dtype: int64

This corresponds to plyr::mapvalues.

Dictionary of value pairs

Sometimes it's more convenient to have a dictionary of value pairs. Each key is a value to be replaced, and the corresponding dictionary value is what it is replaced with.

In [15]: x.replace({1: -1, 2: -2, 3: -3, 4: -4})
Out[15]: 
0   -1
1   -2
2   -3
3   -4
dtype: int64

This corresponds to plyr::revalue.

Strings

It works similarly for strings, except that we also have the option of using regex patterns.

If we simply want to replace strings with other strings, it works exactly the same as before:
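
One detail worth checking when the dictionary covers only some of the values: replace leaves unlisted values alone, whereas Series.map (a method not discussed above, mentioned here only for contrast) turns them into NaN. A small sketch with a deliberately partial, made-up mapping:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])
partial = {1: -1, 2: -2}  # hypothetical mapping that omits 3 and 4

replaced = x.replace(partial)  # unlisted values are kept as they are
mapped = x.map(partial)        # unlisted values become NaN

print(replaced.tolist())
print(mapped.isna().tolist())
```

So replace behaves like plyr::revalue for partial mappings, while map is the stricter lookup.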

In [18]: s = pd.Series(["ape", "monkey", "seagull"])
In [22]: s
Out[22]: 
0        ape
1     monkey
2    seagull
dtype: object

Two lists

In [25]: s.replace(["ape", "monkey"], ["lion", "panda"])
Out[25]: 
0       lion
1      panda
2    seagull
dtype: object

Dictionary

In [26]: s.replace({"ape": "lion", "monkey": "panda"})
Out[26]: 
0       lion
1      panda
2    seagull
dtype: object

Regex

Replace all a's with x's.

In [27]: s.replace("a", "x", regex=True)
Out[27]: 
0        xpe
1     monkey
2    sexgull
dtype: object

Replace all l's with x's.

In [28]: s.replace("l", "x", regex=True)
Out[28]: 
0        ape
1     monkey
2    seaguxx
dtype: object

Note that both l's in seagull were replaced.

Replace a's with x's and l's with p's.
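
The regex=True flag matters here because, without it, replace compares whole values rather than substrings. The sketch below contrasts the two behaviours on the same series and, as an aside not taken from the answer, shows Series.str.replace with regex=False for literal substring replacement.

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Without regex=True, only elements exactly equal to "a" would be replaced: none here.
whole_value = s.replace("a", "x")

# With regex=True, every match inside each string is substituted.
substring = s.replace("a", "x", regex=True)

# Series.str.replace with regex=False does literal substring replacement.
literal = s.str.replace("a", "x", regex=False)

print(whole_value.tolist())
print(substring.tolist())
print(literal.tolist())
```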

In [29]: s.replace(["a", "l"], ["x", "p"], regex=True)
Out[29]: 
0        xpe
1     monkey
2    sexgupp
dtype: object

In the special case where one wants to replace multiple different values with the same value, one can simply pass a single string as the replacement. It must not be inside a list.

Replace a's and l's with p's.

In [30]: s.replace(["a", "l"], "p", regex=True)
Out[30]: 
0        ppe
1     monkey
2    sepgupp
dtype: object
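
Because the patterns are ordinary regular expressions, the same many-to-one replacement can presumably also be written as a single character class, and anchors can restrict where a match is allowed. A brief sketch on the same series (the patterns are illustrative choices, not from the answer):

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Character class: any "a" or "l" becomes "p", same result as the list form.
char_class = s.replace("[al]", "p", regex=True)

# Anchor: only an "a" at the very start of a string is replaced.
leading_only = s.replace("^a", "x", regex=True)

print(char_class.tolist())
print(leading_only.tolist())
```

Keep in mind that characters such as "." or "(" have special meaning in patterns and need escaping (for example with re.escape) when they should be matched literally.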

(Credit to DaveL17 in the comments)
