How to update values in a DataFrame using a mask while iterating over rows

Question

With the code below I'm trying to set df_test['placed'] to 1 whenever the if statement is triggered and a prediction is placed. I haven't been able to get this to work, though: the code runs without errors, but the column is never updated to 1 for the predictions that were placed.

df_test['placed'] = np.zeros(len(df_test))
for i in set(df_test['id']) :
    mask = df_test['id']==i
    predictions = lm.predict(X_test[mask])
    j = np.argmax(predictions)
    if predictions[j] > 0 :
        df_test['placed'][mask][j] = 1
        print(df_test['placed'][mask][j])

Answer 1

Answering your question

Edit: changed suggestion based on comments

The assignment part of your code, df_test['placed'][mask][j] = 1, uses what is called chained indexing. In short, your assignment only changes a temporary copy of the data that gets immediately thrown away, and never changes the original DataFrame.
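A minimal sketch of the effect, assuming a small made-up df_test (the column values and the choice of j are hypothetical stand-ins for the question's data): the chained assignment runs without an exception, yet the original column stays untouched.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for df_test: two groups of rows identified by 'id'.
df_test = pd.DataFrame({"id": [10, 10, 20, 20], "placed": 0})
mask = df_test["id"] == 20
j = 1  # pretend np.argmax picked the second row of the group

with warnings.catch_warnings():
    # pandas may emit a chained-assignment warning here; silence it for the demo.
    warnings.simplefilter("ignore")
    # df_test["placed"][mask] builds a new Series, so the write lands on that copy.
    df_test["placed"][mask][j] = 1

print(df_test["placed"].tolist())  # → [0, 0, 0, 0]
```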

To avoid this, the rule of thumb when doing assignment is: use only one set of square brackets on a single DataFrame. For your problem, that should look like:

df_test.loc[mask.to_numpy().nonzero()[0][j], 'placed'] = 1

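(I know the nonzero() expression uses two sets of square brackets of its own: nonzero() returns a tuple, and the first element of that tuple is an ndarray of the positions where the mask is True. But the DataFrame itself is indexed only once, and that's the important part.) Two caveats: Series.nonzero() was removed in pandas 1.0, so on current versions the call has to go through mask.to_numpy().nonzero(); and because .loc looks rows up by label, passing positions like this only works when df_test has the default 0, 1, 2, ... index.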
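As a sketch of the whole loop with a single .loc assignment, the example below uses a hypothetical 'score' column in place of lm.predict(X_test[mask]) and a deliberately non-default index, so the position j is first translated into an index label. It assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'score' stands in for the output of lm.predict.
df_test = pd.DataFrame(
    {"id": [10, 10, 20, 20, 30], "score": [0.2, 0.7, -0.5, -0.1, 0.4]},
    index=["a", "b", "c", "d", "e"],  # deliberately not 0..n-1
)
df_test["placed"] = 0

for i in df_test["id"].unique():
    mask = df_test["id"] == i
    predictions = df_test.loc[mask, "score"].to_numpy()
    j = np.argmax(predictions)  # position of the best prediction within the group
    if predictions[j] > 0:
        # Translate position j within the group into an index label,
        # then assign with one .loc call on the DataFrame itself.
        label = df_test.index[mask.to_numpy()][j]
        df_test.loc[label, "placed"] = 1

print(df_test["placed"].tolist())  # → [0, 1, 0, 0, 1]
```

Group 20 has no positive prediction, so none of its rows is marked.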

Some other notes

I have a couple of notes on using pandas (and NumPy).

Pandas and NumPy both have a feature called broadcasting. Basically, if you're assigning a single value to an entire array, you don't need to make an array of the same size first; you can just assign the single value, and pandas/NumPy figures out for you how to apply it. So the first line of your code can be replaced with df_test['placed'] = 0, and it accomplishes the same thing.
Generally speaking, when working with pandas and NumPy objects, loops are bad; usually you can find a way to use some combination of broadcasting, element-wise operations and boolean indexing to do what a loop would do. And because of the way those features are designed, it'll run a lot faster too. Unfortunately I'm not familiar enough with the lm.predict method to say for sure, but you might be able to avoid the for loop entirely for this code.
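One possible loop-free version is sketched below. It rests on an assumption the answer does not confirm: that lm.predict scores each row independently, so all predictions can be computed in one call and stored in a column (here a hypothetical 'score' column with made-up values). It also assumes unique index labels.

```python
import pandas as pd

# Hypothetical data: 'score' stands in for lm.predict(X_test) run once on all rows.
df_test = pd.DataFrame(
    {"id": [10, 10, 20, 20, 30], "score": [0.2, 0.7, -0.5, -0.1, 0.4]}
)

# Broadcasting: a single scalar fills the whole column.
df_test["placed"] = 0

# Index label of the highest-scoring row within each id group.
best_labels = df_test.groupby("id")["score"].idxmax().to_numpy()

# Keep only the groups whose best score is positive.
keep = df_test.loc[best_labels, "score"].to_numpy() > 0

# One .loc assignment marks all selected rows at once.
df_test.loc[best_labels[keep], "placed"] = 1

print(df_test["placed"].tolist())  # → [0, 1, 0, 0, 1]
```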

Answer 1: 3, accepted, 2018-12-15 03:55:37
