Drop rows based on group by of another column in pandas

I have a data set as below:

[image]

I need the cartesian product of Product within each month and location. The output should look like this:

[image 2]

I created a new dataframe with the unique values of Product, then cross merged it with the data set. Now I need to drop rows based on the month, location and product.

[image 3]
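The approach described in the question (cross merge with the unique products, then drop the rows that do not belong) might be sketched as follows. The sample data is an assumption reconstructed from the answer's printed output, since the original images are not available, and the names `products`, `crossed`, `valid` and `filtered` are hypothetical. It also assumes pandas 1.2 or later for `how="cross"`.

```python
import pandas as pd

# Assumed sample data (the original data set was only posted as an image).
df = pd.DataFrame({
    "Month": [17, 17, 18, 18, 18],
    "Location": ["Banglore", "Banglore", "Banglore", "GOA", "GOA"],
    "Product": ["A", "B", "C", "D", "B"],
})

# Step 1: cross merge every row with every unique product.
products = (df[["Product"]]
            .drop_duplicates()
            .rename(columns={"Product": "Destination"}))
crossed = df.merge(products, how="cross")

# Step 2: keep only pairs whose Destination actually occurs
# in the same Month and Location as the source row.
valid = df.rename(columns={"Product": "Destination"}).drop_duplicates()
filtered = crossed.merge(valid, on=["Month", "Location", "Destination"], how="inner")

print(filtered)
```

The inner merge in step 2 plays the role of "dropping rows": any crossed row whose (Month, Location, Destination) combination never appears in the original data is discarded.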

You can try groupby, then a cross merge on the Product column:

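# For each (Month, Location) group, cross merge the Product column with itself.
out = (df.groupby(['Month', 'Location'])
       .apply(lambda g: g[['Product']].merge(g[['Product']], how='cross'))
       .droplevel(2)    # drop the per-group row index added by apply
       .reset_index()   # turn Month and Location back into columns
       .rename(columns={'Product_x': 'Product', 'Product_y': 'Destination'}))
print(out)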

   Month  Location Product Destination
0     17  Banglore       A           A
1     17  Banglore       A           B
2     17  Banglore       B           A
3     17  Banglore       B           B
4     18  Banglore       C           C
5     18       GOA       D           D
6     18       GOA       D           B
7     18       GOA       B           D
8     18       GOA       B           B
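As a possible alternative to groupby plus apply, the same pairs can be obtained by merging the frame with itself on Month and Location. This is only a sketch: the sample data is an assumption reconstructed from the output above, and the `_dest` suffix is an arbitrary choice made for illustration.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output.
df = pd.DataFrame({
    "Month": [17, 17, 18, 18, 18],
    "Location": ["Banglore", "Banglore", "Banglore", "GOA", "GOA"],
    "Product": ["A", "B", "C", "D", "B"],
})

# A self-merge on the group keys pairs every Product with every other
# Product that shares the same Month and Location.
out = (df.merge(df, on=["Month", "Location"], suffixes=("", "_dest"))
         .rename(columns={"Product_dest": "Destination"}))

print(out)
```

Because the join keys are the group keys, no rows from different groups are ever combined, so nothing has to be dropped afterwards.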

You can use itertools.product:

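from itertools import chain, product
import pandas as pd

# One list of (Location, Product) pairs per month; the column is named Location, as in the answer above.
prods = [list(product(df[df.Month == month].Location, df[df.Month == month].Product)) for month in df.Month.unique()]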

Note that I then applied itertools.chain to prods to flatten it, since it is a nested list:

prods = list(chain(*prods))

# Repeat each month so that months lines up with the flattened pairs in prods
# (this relies on rows of the same month being contiguous in df).
month_prod = [[m] * len(df[df.Month == m]) for m in df.Month]
months = list(chain(*month_prod))

df = pd.DataFrame({'Month': months, 'Location': [loc for loc, _ in prods], 'Product': [prod for _, prod in prods]})
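For comparison, here is a minimal sketch of how itertools.product could be applied per (Month, Location) group to build the Product and Destination pairs the question asks for. It is not the code from the answer above: the sample data is an assumption reconstructed from the earlier printed output, and `rows` and `out` are hypothetical names.

```python
from itertools import product

import pandas as pd

# Assumed sample data (the original data set was only posted as an image).
df = pd.DataFrame({
    "Month": [17, 17, 18, 18, 18],
    "Location": ["Banglore", "Banglore", "Banglore", "GOA", "GOA"],
    "Product": ["A", "B", "C", "D", "B"],
})

# For each (Month, Location) group, pair every Product with every Product.
rows = [
    (month, location, src, dst)
    for (month, location), group in df.groupby(["Month", "Location"], sort=False)
    for src, dst in product(group["Product"], repeat=2)
]

out = pd.DataFrame(rows, columns=["Month", "Location", "Product", "Destination"])
print(out)
```

Building the rows in a single comprehension keeps the month, location and product values together, so there is no separate list of months that has to be kept aligned with the pairs.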
