Drop rows based on group by of another column in pandas
I have a data set as below:
I need the Cartesian product of the products within each month and location. The output I need is as below:
I created a new dataframe with the unique values of product, then cross merged it with the data set. Now I need to drop rows based on the month, location and product.
You can try groupby on Month and Location, then a cross merge on the Product column within each group:
out = (df.groupby(['Month', 'Location'])
         .apply(lambda g: g[['Product']].merge(g[['Product']], how='cross'))
         .droplevel(2)
         .reset_index()
         .rename(columns={'Product_x': 'Product', 'Product_y': 'Destination'}))
print(out)
   Month  Location Product Destination
0     17  Banglore       A           A
1     17  Banglore       A           B
2     17  Banglore       B           A
3     17  Banglore       B           B
4     18  Banglore       C           C
5     18       GOA       D           D
6     18       GOA       D           B
7     18       GOA       B           D
8     18       GOA       B           B
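The same pairing can also be sketched without groupby.apply, by merging the frame with itself on Month and Location. The input frame below is an assumption: it is reconstructed from the printed output above, since the original data set is not shown. The "_dest" suffix is an arbitrary name chosen for illustration.

```python
import pandas as pd

# Hypothetical input, inferred from the printed output (not from the original post)
df = pd.DataFrame({
    "Month": [17, 17, 18, 18, 18],
    "Location": ["Banglore", "Banglore", "Banglore", "GOA", "GOA"],
    "Product": ["A", "B", "C", "D", "B"],
})

# An inner self-merge on the group keys pairs every row with every row
# that shares the same Month and Location, i.e. a per-group cross join.
out = (df.merge(df, on=["Month", "Location"], suffixes=("", "_dest"))
         .rename(columns={"Product_dest": "Destination"}))
print(out)
```

Because the join keys do the grouping, no rows need to be dropped afterwards: pairs from different months or locations are never generated.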
You can use itertools.product:
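from itertools import chain, product
import pandas as pd

# For each month, pair every location with every product seen in that month
prods = [list(product(df[df.Month == month].location, df[df.Month == month].Product)) for month in df.Month.unique()]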
Note that I then applied itertools.chain to prods to flatten it, since it is a nested list:
prods = list(chain(*prods))
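# Repeat each month once per generated pair (n rows in a month give n * n pairs);
# this lines up with prods only if rows of the same month are contiguous in df
month_prod = [[m for i in range(len(df[df.Month == m]))] for m in df.Month]
months = list(chain(*month_prod))
df = pd.DataFrame({'Month': months, 'location': [item[0] for item in prods], 'Product': [item[1] for item in prods]})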
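If the goal is product-by-product pairs within each month and location, as in the question, itertools.product can be applied per group instead. This is a minimal sketch under assumptions: the input frame and the column names Month, Location, Product and Destination are taken from the other answer's printed output, not from the original data set.

```python
from itertools import product

import pandas as pd

# Hypothetical input, inferred from the other answer's output
df = pd.DataFrame({
    "Month": [17, 17, 18, 18, 18],
    "Location": ["Banglore", "Banglore", "Banglore", "GOA", "GOA"],
    "Product": ["A", "B", "C", "D", "B"],
})

# For each (Month, Location) group, pair every product with every product
rows = [
    (month, location, src, dst)
    for (month, location), group in df.groupby(["Month", "Location"], sort=False)
    for src, dst in product(group["Product"], repeat=2)
]
pairs = pd.DataFrame(rows, columns=["Month", "Location", "Product", "Destination"])
print(pairs)
```

Building the rows as tuples keeps the month and location attached to each pair, so no separate list of repeated months has to be kept in sync.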