How to add a new column and fill it with specific values based on another column?

Question

I'm new to Pandas, but thanks to "Add column with constant value to pandas dataframe" I was able to add several columns at once with:

# assign() returns a new DataFrame, so reassign it to keep the new columns
c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df = df.assign(**c)

However, I'm trying to figure out which approach to take when I want to add a new column to a dataframe (currently 1.2 million rows × 23 columns) whose values depend on another column.

Let's simplify the df a bit to make it clearer:

Order   Orderline   Product  
1       0           Laptop  
1       1           Bag  
1       2           Mouse  
2       0           Keyboard  
3       0           Laptop  
3       1           Mouse

I would like to add a new column that, for every row of a given Order, is 1 if that Order has at least one row with Product == Bag, and 0 otherwise.

The result would be:

Order   Orderline   Product   HasBag  
1       0           Laptop    1  
1       1           Bag       1  
1       2           Mouse     1  
2       0           Keyboard  0  
3       0           Laptop    0  
3       1           Mouse     0

What I could do is find all the unique order numbers, filter out the sub-frame for each one, check its Product column for Bag, set the new column to 1 if it is found and 0 otherwise, and then replace the original sub-frame with the result.
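For comparison, here is a minimal sketch of that per-order loop, assuming only the simplified columns shown above. The helper name `has_bag_loop` is hypothetical. It filters the whole frame once per unique order, so its cost grows with (number of orders × number of rows), which is why it scales poorly to 1.2 million rows.

```python
import pandas as pd


def has_bag_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the per-order loop described in the question."""
    df = df.copy()
    df["HasBag"] = 0  # default: the order has no Bag
    for order in df["Order"].unique():
        mask = df["Order"] == order  # full scan to find this order's rows
        # Mark every row of the order with 1 if any of its products is "Bag"
        if (df.loc[mask, "Product"] == "Bag").any():
            df.loc[mask, "HasBag"] = 1
    return df


df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})
result = has_bag_loop(df)
print(result)
```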

There is likely a much better way to accomplish this, and a much more performant one too.

The main reason I'm trying to do this is to flatten things later on: every order should become one row with some of the product values. I won't need the Bag information itself anymore, but I want to keep in my dataframe whether the original order had a Bag (1) or no Bag (0).

Ultimately, once the data is cleaned, it can be used as a base for scikit-learn (or so I hope).

Answer 1

If I understand you correctly, you want GroupBy.transform with any.

First, we create a boolean Series by checking which rows in Product equal Bag with Series.eq. Then we group this boolean Series by Order and check whether any of the values in each group are True. We use transform to keep the shape of the original Series (one value per row) so we can assign the values back, and astype(int) turns True/False into 1/0.
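To make the steps just described concrete, here is a minimal self-contained sketch that builds the example frame and keeps each intermediate result in its own variable. The names `is_bag` and `any_bag` are illustrative only, and the column is called `HasBag` as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

# Step 1: boolean Series, True where the row's product is "Bag"
is_bag = df["Product"].eq("Bag")

# Step 2: group the booleans by Order; transform("any") returns one value
# per original row (True if any row of that order is a Bag)
any_bag = is_bag.groupby(df["Order"]).transform("any")

# Step 3: convert True/False to 1/0 and attach it as a new column
df["HasBag"] = any_bag.astype(int)
print(df)
```

Because transform keeps the original index, the result lines up row for row with `df`, so no merge is needed.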

df['ind'] = df['Product'].eq('Bag').groupby(df['Order']).transform('any').astype(int)

   Order  Orderline   Product  ind
0      1          0    Laptop    1
1      1          1       Bag    1
2      1          2     Mouse    1
3      2          0  Keyboard    0
4      3          0    Laptop    0
5      3          1     Mouse    0
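As a follow-up to the flattening goal mentioned in the question, here is a minimal sketch of collapsing each order to one row while keeping the Bag flag. It assumes that "some values of product" means the remaining product names joined into one string; the `Products` column and the `", "` separator are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

# Per-order Bag flag, computed as in the answer above
df["HasBag"] = df["Product"].eq("Bag").groupby(df["Order"]).transform("any").astype(int)

# The Bag rows themselves are no longer needed once the flag is stored
no_bags = df[df["Product"].ne("Bag")]

# One row per order; HasBag is constant within an order, so "max" just keeps it.
# Joining the remaining product names is only one possible aggregation.
flat = no_bags.groupby("Order", as_index=False).agg(
    HasBag=("HasBag", "max"),
    Products=("Product", ", ".join),
)
print(flat)
```

Note that an order consisting only of Bag rows would disappear after the filter; aggregate on `df` instead of `no_bags` if such orders must be kept.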

Answer 1: score 2, accepted, 2020-04-21 20:02:28