Grouping a column with multiple values

Question

I have a dataframe that looks like this (one column holds multiple values, the other holds decimal numbers):

food number
apple,tomato,melon 897.0
apple,meat,banana 984.9
banana,tomato 340.8

I want to get the average number for every food. In the example that would be:

apple = (897.0 + 984.9)/2 = 940.95
banana = (984.9 + 340.8)/2 = 662.85

And so on, ending up with a new dataframe that contains just the foods and their average number.

food average
apple 940.95
banana 662.85

I tried my luck with groupby, but the result is all messed up:

#reshape data
df = pd.DataFrame({
    'food' : list(chain.from_iterable(df.food.tolist())), 
    'number' : df.number.repeat(df.food.str.len())
})
# groupby
df.groupby('food').number.apply(lambda x: x.unique().tolist())

I should mention that the original dataframe has over 100k rows. Thanks.

Answer 1

Use DataFrame.explode(<column-name>) to expand the individual items in the lists into separate rows. Each new row keeps the original index, so the corresponding number is carried along with it. From there, it's a simple group-by followed by a mean.

import pandas as pd

df = pd.DataFrame({'food': [['apple', 'tomato', 'melon'], 
                            ['apple','meat', 'banana'],
                            ['banana', 'tomato']], 
                   'number': [897, 984.9, 340.8]})

df.explode('food').groupby('food').mean()

results in

        number
food          
apple   940.95
banana  662.85
meat    984.90
melon   897.00
tomato  618.90
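To see why the numbers line up, it can help to look at the intermediate frame. The following is a minimal sketch using the same example data; the variable names `exploded` and `averages` are mine, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'food': [['apple', 'tomato', 'melon'],
                            ['apple', 'meat', 'banana'],
                            ['banana', 'tomato']],
                   'number': [897, 984.9, 340.8]})

# One row per list item; the original index label and 'number' are repeated.
exploded = df.explode('food')
print(exploded)

# Selecting 'number' explicitly returns a Series of per-food means.
averages = exploded.groupby('food')['number'].mean()
print(averages)
```

The repeated index labels in `exploded` show which original row each food came from.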

Answer 2

First you will have to convert the string column to a list in each cell. I've also included stripping of surrounding whitespace, if there is any. I modified the df created by @9769953.

import pandas as pd
df = pd.DataFrame({'food': ["apple,tomato, melon", 
                            "apple,meat,banana,melon",
                            "banana, tomato, melon"], 
                   'number': [897, 984.9, 340.8]})

df['food'] = df['food'].str.split(',').apply(lambda x: [e.strip() for e in x]).tolist()
df.explode('food').groupby('food').agg('mean')

If you would like more aggregations, you could use

df.explode('food').groupby('food').agg(['min', 'mean', 'max'])
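For the two-column food/average layout asked for in the question, one possible sketch chains the same steps on the question's original comma-separated strings and uses named aggregation. The column name `average` and the variable `result` are illustrative choices, and this assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# The question's data: 'food' holds comma-separated strings.
df = pd.DataFrame({'food': ['apple,tomato,melon',
                            'apple,meat,banana',
                            'banana,tomato'],
                   'number': [897.0, 984.9, 340.8]})

result = (
    df.assign(food=df['food'].str.split(','))       # strings -> lists
      .explode('food')                              # one row per food
      .assign(food=lambda d: d['food'].str.strip()) # drop stray spaces
      .groupby('food', as_index=False)
      .agg(average=('number', 'mean'))              # named aggregation
)
print(result)
```

With `as_index=False`, `food` stays an ordinary column rather than becoming the index, which matches the shape of the desired output.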

2 answers

Answer 1
1 2022-05-27 21:24:28

Answer 2
0 Accepted 2022-05-27 21:54:21