How to add a new column and fill it with a specific value depending on another column's series?
I'm new to Pandas, but thanks to "Add column with constant value to pandas dataframe" I was able to add several columns at once with:
c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)
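One detail worth noting about the snippet above: assign returns a new DataFrame and does not modify df in place, so the result has to be bound to a name. A minimal sketch, assuming a small stand-in df (the sample data here is made up for illustration):

```python
import pandas as pd

# Small stand-in frame; the real one has 1.2 million rows and 23 columns
df = pd.DataFrame({"Order": [1, 1, 2], "Product": ["Laptop", "Bag", "Keyboard"]})

c = {"new1": "w", "new2": "y", "new3": "z"}

# assign returns a copy with the extra columns, so keep the result
df = df.assign(**c)

print(df.columns.tolist())
```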
However, I'm trying to figure out which approach to take when I want to add a new column to a dataframe (currently 1.2 million rows * 23 columns).
Let's simplify the df a bit to make it clearer:
Order Orderline Product
1 0 Laptop
1 1 Bag
1 2 Mouse
2 0 Keyboard
3 0 Laptop
3 1 Mouse
I would like to add a new column that is 1 for all rows of an order if that order contains at least one row with Product == Bag, and 0 otherwise.
The result would become:
Order Orderline Product HasBag
1 0 Laptop 1
1 1 Bag 1
1 2 Mouse 1
2 0 Keyboard 0
3 0 Laptop 0
3 1 Mouse 0
What I could do is find all the unique order numbers, then for each one filter out the subframe, check its Product column for Bag, write 1 to a new column if it is found and 0 otherwise, and then replace the original subframe with the result.
There is likely a much better and more performant way to accomplish this.
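For reference, the per-order loop described above could be sketched roughly as follows. This is only an illustration of the slow baseline on the simplified sample data, not a recommended solution; it scans the frame once per unique order, which is why it scales poorly to 1.2 million rows.

```python
import pandas as pd

# Simplified sample data from the question
df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

# Start with 0 everywhere, then flip to 1 for orders that contain a Bag
df["HasBag"] = 0
for order in df["Order"].unique():
    mask = df["Order"] == order  # rows belonging to this order
    if (df.loc[mask, "Product"] == "Bag").any():
        df.loc[mask, "HasBag"] = 1

print(df)
```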
The main reason I'm trying to do this is to flatten things down later on. Every order should become one row with some of the product values. I don't need the Bag rows themselves anymore, but I want to keep in my dataframe whether the original order had a Bag (1) or not (0).
Ultimately, once the data is cleaned up, it can be used as a base for scikit-learn (or at least that's what I hope).
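To make the flattening goal concrete, here is a rough sketch of one row per order with a HasBag flag. Which product values should survive the flattening is not specified, so the Lines column (a count of order lines) is purely a hypothetical placeholder, as is the helper column IsBag.

```python
import pandas as pd

# Simplified sample data from the question
df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

flat = (
    df.assign(IsBag=df["Product"].eq("Bag"))  # hypothetical helper column
      .groupby("Order")
      .agg(
          HasBag=("IsBag", "any"),        # True if any line of the order is a Bag
          Lines=("Orderline", "count"),   # placeholder for "some values of product"
      )
      .reset_index()
)
flat["HasBag"] = flat["HasBag"].astype(int)

print(flat)
```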
If I understand you correctly, you want GroupBy.transform with any. First we create a boolean array by checking which rows in Product are Bag with Series.eq. Then we group this boolean array by Order and check whether any of the values in each group are True. We use transform to keep the shape of our initial array so we can assign the values back.
df['ind'] = df['Product'].eq('Bag').groupby(df['Order']).transform('any').astype(int)
Order Orderline Product ind
0 1 0 Laptop 1
1 1 1 Bag 1
2 1 2 Mouse 1
3 2 0 Keyboard 0
4 3 0 Laptop 0
5 3 1 Mouse 0
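For anyone who wants to reproduce this end to end, here is a self-contained sketch on the sample data. The second column, ind_isin, is an alternative that is not part of the answer above: it collects the orders that contain a Bag and tests membership with Series.isin. Both should give the same flags; which one is faster on 1.2 million rows would need to be measured.

```python
import pandas as pd

# Simplified sample data from the question
df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

# Row-level boolean: is this line a Bag?
is_bag = df["Product"].eq("Bag")

# Approach from the answer: group the booleans by Order and
# broadcast the per-group "any" back to every row with transform
df["ind"] = is_bag.groupby(df["Order"]).transform("any").astype(int)

# Alternative (an assumption, not from the answer): membership test
orders_with_bag = df.loc[is_bag, "Order"].unique()
df["ind_isin"] = df["Order"].isin(orders_with_bag).astype(int)

print(df)
```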