Group and sum multiple columns without Pandas

Question

I have a list that contains multiple columns, and I need to group and sum rows based on two columns. Can I do this without using a Pandas DataFrame?

I have a dataset in a list like this:

User   Days  Project
Dave   3     Red
Dave   4     Red
Dave   2     Blue
Sue    4     Red
Sue    1     Red
Sue    3     Yellow

Specifically: [['Dave', 3, 'Red'], ['Dave', 4, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]

What I want is to output some totals on the same line as each row, like this:

User   Days  Project   UserDays  ProjectDaysPerUser
Dave   3     Red       9              7
Dave   4     Red       9              7
Dave   2     Blue      9              2
Sue    4     Red       8              5
Sue    1     Red       8              5
Sue    3     Yellow    8              3

So I'm trying to group twice to get "ProjectDaysPerUser": first by user, then by project. It's this double grouping that's throwing me off.

Is there an easy way to do this without creating a Pandas DataFrame?

Answer 1

The script below uses itertools.groupby and appends the resulting sums to each row.

from itertools import groupby

data = [['Dave', 3, 'Red'], ['Dave', 4, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]

# groupby only merges consecutive rows with the same key, so sort by that key first
by_user = sorted(data, key=lambda x: x[0])
by_user_project = sorted(data, key=lambda x: (x[0], x[2]))

# total days per user, and per (user, project) pair
userDays = {k: sum(v[1] for v in g) for k, g in groupby(by_user, key=lambda x: x[0])}
projuserDays = {k: sum(v[1] for v in g) for k, g in groupby(by_user_project, key=lambda x: (x[0], x[2]))}

# add userDays and projuserDays to every row
final = []
for d in data:
    d.extend([userDays[d[0]], projuserDays[(d[0], d[2])]])
    final.append(d)
print(final)

Result:
[['Dave', 3, 'Red', 9, 7],
 ['Dave', 4, 'Red', 9, 7],
 ['Dave', 2, 'Blue', 9, 2],
 ['Sue', 4, 'Red', 8, 5],
 ['Sue', 1, 'Red', 8, 5],
 ['Sue', 3, 'Yellow', 8, 3]]
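
One thing worth checking with any groupby-based approach: itertools.groupby only merges adjacent rows that share a key, so the totals depend on the input order unless the rows are sorted by the grouping key first. A minimal sketch of the difference, using the question's rows in a shuffled order (the names `rows` and `user_key` are made up for this illustration):

```python
from itertools import groupby

# The question's rows, deliberately not sorted by user
rows = [['Dave', 3, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'],
        ['Dave', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]

def user_key(row):
    return row[0]

# Without sorting, each run of adjacent equal keys becomes its own group
unsorted_totals = [(k, sum(r[1] for r in g))
                   for k, g in groupby(rows, key=user_key)]

# Sorting by the same key first yields one group per user
sorted_totals = [(k, sum(r[1] for r in g))
                 for k, g in groupby(sorted(rows, key=user_key), key=user_key)]

print(unsorted_totals)  # [('Dave', 5), ('Sue', 4), ('Dave', 4), ('Sue', 4)]
print(sorted_totals)    # [('Dave', 9), ('Sue', 8)]
```

The data in the question happens to be ordered by user and project already, which is why groupby gives the expected totals there.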

Answer 2

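Use a dictionary for improved performance: the totals are built in a single pass and each lookup is constant time, and the data does not need to be sorted.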

data = [['Dave', 3, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'], ['Dave', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]
sum_dict = {}
for d in data:
    sum_dict[d[0]] = sum_dict.get(d[0], 0) + d[1]
    sum_dict[(d[0], d[2])] = sum_dict.get((d[0], d[2]), 0) + d[1]

for d in data:
    d.append(sum_dict[d[0]])
    d.append(sum_dict[(d[0], d[2])])
    print(d)
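
Note that the loop above appends to the original rows, so running it a second time would append the totals again. If that matters, the same idea can be wrapped in a function that returns new rows instead. This is only a sketch: the function name `with_totals` and the use of two separate defaultdicts are choices made for this example, not part of the answer.

```python
from collections import defaultdict

def with_totals(rows):
    """Return new rows with UserDays and ProjectDaysPerUser appended."""
    user_days = defaultdict(int)      # user -> total days
    project_days = defaultdict(int)   # (user, project) -> total days
    for user, days, project in rows:
        user_days[user] += days
        project_days[(user, project)] += days
    # Build fresh lists so the input rows are left untouched
    return [[user, days, project, user_days[user], project_days[(user, project)]]
            for user, days, project in rows]

data = [['Dave', 3, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'],
        ['Dave', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]
result = with_totals(data)
for row in result:
    print(row)
```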

Answer 3

Because you're doing sums, this can also be solved nicely with collections.Counter:

from collections import Counter

data = [['Dave', 3, 'Red'], ['Dave', 4, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]


user_days = Counter()
project_user_days = Counter()

for (name, num_days, project) in data:
    user_days[name] += num_days
    project_user_days[(name, project)] += num_days

derived_data = [
    [name, num_days, project, user_days[name], project_user_days[(name, project)]]
    for (name, num_days, project) in data
]

import pprint
pprint.pprint(derived_data)

# [['Dave', 3, 'Red', 9, 7],
#  ['Dave', 4, 'Red', 9, 7],
#  ['Dave', 2, 'Blue', 9, 2],
#  ['Sue', 4, 'Red', 8, 5],
#  ['Sue', 1, 'Red', 8, 5],
#  ['Sue', 3, 'Yellow', 8, 3]]
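
To print the derived rows as the aligned table shown in the question rather than as a nested list, something along these lines could work. It is a sketch only: `format_table` is a hypothetical helper, and the two-space column gap is an arbitrary choice.

```python
header = ['User', 'Days', 'Project', 'UserDays', 'ProjectDaysPerUser']
rows = [['Dave', 3, 'Red', 9, 7],
        ['Dave', 4, 'Red', 9, 7],
        ['Dave', 2, 'Blue', 9, 2],
        ['Sue', 4, 'Red', 8, 5],
        ['Sue', 1, 'Red', 8, 5],
        ['Sue', 3, 'Yellow', 8, 3]]

def format_table(header, rows):
    """Left-align every column to the width of its longest cell."""
    table = [header] + [[str(cell) for cell in row] for row in rows]
    widths = [max(len(line[i]) for line in table) for i in range(len(header))]
    return '\n'.join(
        '  '.join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in table
    )

text = format_table(header, rows)
print(text)
```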

Answer 4

A groupby-based version wrapped in a function, which sorts the data before grouping:

import itertools

def group_data(input1: list) -> list:
    # groupby needs its input sorted by the same key it groups on
    user_key = lambda x: x[0]
    user_project_key = lambda x: (x[0], x[2])
    name_dict = {k: sum(v[1] for v in g) for k, g in itertools.groupby(sorted(input1, key=user_key), key=user_key)}
    name_colour_dict = {k: sum(v[1] for v in g) for k, g in itertools.groupby(sorted(input1, key=user_project_key), key=user_project_key)}

    # append the per-user and per-(user, project) totals to each row in place
    for row in input1:
        name = row[0]
        name_colour = (row[0], row[2])
        row.append(name_dict[name])
        row.append(name_colour_dict[name_colour])

    return input1

print(group_data([['Dave', 3, 'Red'], ['Dave', 4, 'Red'], ['Dave', 2, 'Blue'], ['Sue', 4, 'Red'], ['Sue', 1, 'Red'], ['Sue', 3, 'Yellow']]))

4 answers

Answer 1
2 accepted 2019-06-17 21:01:33

Answer 2
1 2021-04-16 22:19:35

Answer 3
0 2019-06-17 22:52:35

Answer 4
0 2022-02-24 05:49:41
