Pandas: logical indexing on a single column of a DataFrame to assign values

Question

I am an R programmer looking for a pandas equivalent of this R statement:

data[data$x > value, y] <- 1

(Basically, take all rows where the x column is greater than some value and assign 1 to the y column at those rows.)

In pandas, it seems the equivalent would be something like:

data['y'][data['x'] > value] = 1

But this gives a SettingWithCopyWarning.

Equivalent statements I've tried are:

condition = data['x']>value
data.loc(condition,'x')=1

But I'm seriously confused. Maybe I'm thinking too much in R terms and can't wrap my head around what's going on in Python. What would be the equivalent code in Python, or a workaround?

Answer 1

Your statement is incorrect; it should be:

data.loc[condition, 'x'] = 1

Example:

In [3]:

df = pd.DataFrame({'a':np.random.randn(10)})
df
Out[3]:
          a
0 -0.063579
1 -1.039022
2 -0.011687
3  0.036160
4  0.195576
5 -0.921599
6  0.494899
7 -0.125701
8 -1.779029
9  1.216818
In [4]:

condition = df['a'] > 0
df.loc[condition, 'a'] = 20
df
Out[4]:
           a
0  -0.063579
1  -1.039022
2  -0.011687
3  20.000000
4  20.000000
5  -0.921599
6  20.000000
7  -0.125701
8  -1.779029
9  20.000000
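
The question actually asks to set y where x exceeds a threshold, so the same pattern might be sketched as follows. The sample frame and the threshold `value = 2` are made up for illustration; only the column names x and y come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names x and y follow the question.
data = pd.DataFrame({"x": [1, 5, 3, 8], "y": [0, 0, 0, 0]})
value = 2  # assumed threshold, not from the original post

# Boolean mask: True for rows where x exceeds the threshold.
condition = data["x"] > value

# Select rows by mask and the column by label, then assign in a single step.
data.loc[condition, "y"] = 1

print(data)
```

Because the row mask and the column label go through one .loc call, the assignment is applied to data itself. The chained form data['y'][...] = 1 may instead write to a temporary copy, which is what the SettingWithCopyWarning is about.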

Because you are subscripting the df, you should use square brackets [] rather than parentheses (), which denote a function call. See the docs.
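
As a rough illustration of the bracket-versus-parenthesis point, the sketch below checks that Python rejects the parenthesized form before it even runs, while the bracketed form assigns as intended. The small frame and the use of `compile` to probe the syntax are choices made for this example, not part of the original answer.

```python
import pandas as pd

# Small made-up frame for the demonstration.
df = pd.DataFrame({"a": [-1.0, 0.5, 2.0]})
condition = df["a"] > 0

# Parentheses form a call expression, and Python cannot assign to a call,
# so this statement is rejected when it is compiled.
try:
    compile("df.loc(condition, 'a') = 20", "<example>", "exec")
    parsed = True
except SyntaxError:
    parsed = False

# Square brackets subscript the .loc indexer, which supports assignment.
df.loc[condition, "a"] = 20

print(parsed)
print(df)
```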

Answer 1 (accepted, 2015-06-18 20:20:18)
