Pandas logical indexing on a single column of a dataframe to assign values

I am an R programmer looking for the pandas equivalent of this R statement:

data[data$x > value, "y"] <- 1

(Basically, take all rows where the x column is greater than some value and assign 1 to the y column in those rows.)

In pandas, it seems the equivalent would be something like:

data['y'][data['x'] > value] = 1

But this gives a SettingWithCopyWarning.

An equivalent statement I've tried is:

condition = data['x']>value
data.loc(condition,'x')=1

But I'm seriously confused. Maybe I'm thinking too much in R terms and can't wrap my head around what's going on in Python. What would be the equivalent code for this in Python, or a workaround?

Your statement is incorrect; it should be:

data.loc[condition, 'x'] = 1

Example:

In [3]:

df = pd.DataFrame({'a':np.random.randn(10)})
df
Out[3]:
          a
0 -0.063579
1 -1.039022
2 -0.011687
3  0.036160
4  0.195576
5 -0.921599
6  0.494899
7 -0.125701
8 -1.779029
9  1.216818
In [4]:

condition = df['a'] > 0
df.loc[condition, 'a'] = 20
df
Out[4]:
           a
0  -0.063579
1  -1.039022
2  -0.011687
3  20.000000
4  20.000000
5  -0.921599
6  20.000000
7  -0.125701
8  -1.779029
9  20.000000

Because you are subscripting the df, you should use square brackets [] rather than parentheses (), which denote a function call. See the docs.
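For the assignment described in the question (set y wherever x exceeds a threshold), a minimal sketch might look like the following. The sample data and the threshold are made up for illustration; only the column names x and y and the name value come from the question. The last part checks, without running it, that the parenthesized form is rejected by Python itself.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
data = pd.DataFrame({"x": [1, 5, 3, 8], "y": [0, 0, 0, 0]})
value = 4  # assumed threshold, not from the original post

# The boolean mask picks the rows and 'y' picks the column,
# all in a single .loc call, so no chained indexing is involved.
mask = data["x"] > value
data.loc[mask, "y"] = 1
print(data)

# The parenthesized form is an assignment to a function call,
# which Python refuses to compile at all.
try:
    compile("data.loc(mask, 'y') = 1", "<example>", "exec")
    paren_form_compiles = True
except SyntaxError:
    paren_form_compiles = False
print("parenthesized form compiles:", paren_form_compiles)
```

Doing the row and column selection in one .loc call is what avoids the SettingWithCopyWarning raised by data['y'][data['x'] > value] = 1, which indexes twice.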
