How to add a new column and fill it up with a specific value depending on another column's series?

I'm new to pandas, but thanks to "Add column with constant value to pandas dataframe" I was able to add several columns at once with:

c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)
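
One detail worth noting about the snippet above: `DataFrame.assign` returns a new DataFrame and leaves the original untouched, so the result has to be bound to a name. A minimal sketch, using a small stand-in frame and a hypothetical variable name `df_with_constants`:

```python
import pandas as pd

# Small stand-in frame (the real one has about 1.2 million rows).
df = pd.DataFrame({"Order": [1, 1, 2], "Product": ["Laptop", "Bag", "Keyboard"]})

c = {"new1": "w", "new2": "y", "new3": "z"}

# assign returns a new DataFrame; df itself is not modified.
df_with_constants = df.assign(**c)

print(list(df.columns))                 # original columns only
print(list(df_with_constants.columns))  # original columns plus new1, new2, new3
```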

However, I'm trying to figure out which approach to take when I want to add a new column to a dataframe (currently 1.2 million rows × 23 columns).

Let's simplify the df a bit to make it clearer:

Order   Orderline   Product  
1       0           Laptop  
1       1           Bag  
1       2           Mouse  
2       0           Keyboard  
3       0           Laptop  
3       1           Mouse  

I would like to add a new column that is 1 for every row of an order if that order contains at least one product equal to Bag, and 0 otherwise.

The result would become:

Order   Orderline   Product   HasBag  
1       0           Laptop    1  
1       1           Bag       1  
1       2           Mouse     1  
2       0           Keyboard  0  
3       0           Laptop    0  
3       1           Mouse     0  

What I could do is find all the unique order numbers, then for each one filter out the subframe, check the Product column for Bag, add 1 to a new column if it is found and 0 otherwise, and then replace the original subframe with the result.
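
For reference, the loop described above might look roughly like this. It is only a sketch of the idea on the simplified frame, with `df_loop` as a hypothetical name. It rescans the whole Order column once per unique order, which is likely to be slow on 1.2 million rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

df_loop = df.copy()
df_loop["HasBag"] = 0  # default: no Bag in the order

for order in df_loop["Order"].unique():
    rows = df_loop["Order"] == order  # boolean mask selecting this order's rows
    if (df_loop.loc[rows, "Product"] == "Bag").any():
        df_loop.loc[rows, "HasBag"] = 1  # flag every row of the order

print(df_loop)
```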

There is likely a much better and more performant way to accomplish this.

The main reason I'm trying to do this is to flatten things down later on. Every order should become one line with some values of its products. I don't need the Bag rows themselves anymore, but I want to keep in my dataframe whether the original order had a Bag (1) or not (0).
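
If one row per order is the end goal, the flag could also be computed directly at the order level. The following is a sketch under assumptions: the output names `order_level` and `Lines` are hypothetical, and the line count only stands in for whatever per-order values are actually needed:

```python
import pandas as pd

df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

order_level = (
    df.assign(IsBag=df["Product"].eq("Bag"))   # row-level boolean flag
      .groupby("Order")
      .agg(HasBag=("IsBag", "any"),            # True if any row of the order is a Bag
           Lines=("Orderline", "count"))       # hypothetical extra per-order value
      .reset_index()
)
order_level["HasBag"] = order_level["HasBag"].astype(int)

print(order_level)
```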

Ultimately, once the data is cleaned up, it can be used as a base for scikit-learn (or that's what I hope).

If I understand you correctly, you want GroupBy.transform with 'any'.

First we create a boolean Series by checking which rows in Product are Bag with Series.eq. Then we group this boolean Series by Order and check whether any of the values in each group is True. We use transform to keep the shape of the initial Series so we can assign the values back.

df['ind'] = df['Product'].eq('Bag').groupby(df['Order']).transform('any').astype(int)

   Order  Orderline   Product  ind
0      1          0    Laptop    1
1      1          1       Bag    1
2      1          2     Mouse    1
3      2          0  Keyboard    0
4      3          0    Laptop    0
5      3          1     Mouse    0
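
To make the intermediate steps visible, the one-liner can be split up as below. The names `is_bag`, `per_order`, `broadcast` and `alt` are hypothetical. The `isin` variant at the end is an alternative that should give the same result without a groupby. It is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 3, 3],
    "Orderline": [0, 1, 2, 0, 0, 1],
    "Product": ["Laptop", "Bag", "Mouse", "Keyboard", "Laptop", "Mouse"],
})

# Step 1: row-level boolean Series, True where the product is a Bag.
is_bag = df["Product"].eq("Bag")

# Step 2 (for illustration): a plain aggregation gives one value per order.
per_order = is_bag.groupby(df["Order"]).any()

# Step 3: transform broadcasts the per-order result back to every row,
# so the result has the same length and index as df.
broadcast = is_bag.groupby(df["Order"]).transform("any")
df["ind"] = broadcast.astype(int)

# Alternative: flag rows whose Order appears among the orders containing a Bag.
alt = df["Order"].isin(df.loc[is_bag, "Order"]).astype(int)

print(df)
```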

