
Make a zip() pandas column to sum up other columns with a unique index

I have a DataFrame with 3 columns:

  • store
  • product
  • price

For each store we have multiple products, but each product has a unique price. The DataFrame is hence composed of multiple rows for the same store, each row corresponding to a product.
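To make the layout concrete, here is a minimal sketch of such a DataFrame. The store names, product names, and prices are invented for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data: one row per (store, product) pair.
# A given product keeps the same price wherever it is sold.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "product": ["apple", "bread", "apple", "milk", "soap"],
    "price": [1.0, 2.5, 1.0, 0.9, 3.2],
})

# Each store spans several rows, one per product it sells.
rows_per_store = df.groupby("store").size().to_dict()
```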

I would like to transform the dataset to get only one line per store, with a compound column that sums up the info about products and prices as follows:

[(product_1,price_1),(product_2,price_2), ...]

So far I have not been able to do it.

What I have done so far is group by store, aggregate on product, and apply the .unique() function. That gives me, for each store, a list of all its products, but not the prices. When I try to add price to the .agg() call followed by .unique(), it doesn't work, and I have no clue how to do this.
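The attempt described above might look roughly like the sketch below. The sample data is hypothetical, and the exact failing .agg() call is not shown in the question, so this only illustrates one plausible pitfall: .unique() de-duplicates each column on its own, so the product and price arrays are no longer guaranteed to line up row by row.

```python
import pandas as pd

# Hypothetical sample data; in store "B" two products share the same price.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "product": ["apple", "bread", "apple", "milk", "soap"],
    "price": [1.0, 2.5, 1.0, 0.9, 0.9],
})

# One array of distinct products per store: this part works as described.
products = df.groupby("store")["product"].unique()

# Doing the same for prices de-duplicates them independently of the products.
prices = df.groupby("store")["price"].unique()

# For store "B" the two arrays have different lengths,
# so they cannot simply be paired up afterwards.
n_products_b = len(products["B"])
n_prices_b = len(prices["B"])
```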

I guess I might have to apply some zipping at some point, something like zip(product, price), but I haven't gotten that far.
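The zip(product, price) idea can be sketched as follows, assuming the column names from the question and hypothetical sample data: pair the two columns row by row first, then collect the pairs per store. This is one possible approach, not necessarily the only or the fastest one.

```python
import pandas as pd

# Hypothetical sample data with the three columns from the question.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "product": ["apple", "bread", "apple", "milk", "soap"],
    "price": [1.0, 2.5, 1.0, 0.9, 3.2],
})

# Step 1: build one (product, price) tuple per row, keeping the original index.
pairs = pd.Series(list(zip(df["product"], df["price"])), index=df.index)

# Step 2: group the tuples by store and gather each group into a list.
result = pairs.groupby(df["store"]).agg(list).reset_index(name="result")
```

Here result has one row per store, with the columns store and result. Because the pairing happens before the grouping, each product stays attached to its own price.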

Any help is appreciated, thanks!任何帮助表示赞赏,谢谢!

# One row per store; "result" holds that store's list of (product, price) tuples (pd is pandas).
df.groupby("store", as_index=False).apply(
    lambda x: pd.Series({"store": x["store"].iloc[0],
                         "result": [(val["product"], val["price"]) for idx, val in x.iterrows()]}))
