How to count combinations within a certain group?

I have data on people logging time to certain projects on certain dates. My table looks something like this:

ProjectID Date   memberID hours
project1  01.05  a        2
project1  01.05  b        5
project2  05.05  a        1
project2  05.05  b        2
project2  05.05  c        3
project3  07.06  a        4
project3  07.06  b        1
project3  07.06  c        2

etc.

What I now want to do is count, for each project and for each combination of that project's members, how much time they have worked together on projects in the past. If both have worked on a project together, only the minimum of their hours should count. E.g. if member 1 worked 1 hour on the project and member 2 worked 2 hours, it should count only 1 hour, because they can't have worked together during the second hour.

E.g.:

ProjectID Date   memberID1 memberID2 hoursworkedtogether
project1  01.05  a         b         0
project2  05.05  a         b         2
project2  05.05  a         c         0
project2  05.05  b         c         0
project3  07.06  a         b         3
project3  07.06  b         c         2
project3  07.06  a         c         1

I've tried aggregating using pivot tables, but that did not work, as two project members will always be in different rows in the raw data, and the pivot, it seems, only counts combinations of values within the same row.

One approach would be to write a simple loop over all projects, but I feel there should be a more efficient option, as the table is quite large.

I am not sure whether this is the fastest solution, but pandas' groupby().apply() with a list comprehension should be reasonably fast... ;-)

Group your data by ProjectID and Date and use itertools.combinations() to create all combinations of users per project.
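
As a quick illustration of what itertools.combinations() yields, here is a minimal sketch; the member list is a hypothetical stand-in for the members of one project, not data taken from the table:

```python
from itertools import combinations

# Hypothetical member IDs of a single project.
members = ['a', 'b', 'c']

# Every unordered pair of members, each pair listed once.
pairs = list(combinations(members, 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```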

from itertools import combinations

import pandas as pd

df = pd.DataFrame([['project1', '01.05', 'a', 2],
                   ['project1', '01.05', 'b', 5],
                   ['project2', '05.05', 'a', 1],
                   ['project2', '05.05', 'b', 2],
                   ['project2', '05.05', 'c', 3],
                   ['project3', '07.06', 'a', 4],
                   ['project3', '07.06', 'b', 1],
                   ['project3', '07.06', 'c', 2]],
                  columns=['ProjectID', 'Date', 'memberID', 'hours'])


def calc_member_hours(project):
    # Hours logged per member in this group
    # (assumes one row per member per project and date).
    hours = project.set_index('memberID')['hours']
    # For each pair of members, the shared time is the smaller of the two values.
    data = [(m1, m2, min(hours[m1], hours[m2]))
            for m1, m2 in combinations(hours.index, 2)]
    return pd.DataFrame(data, columns=['memberID1', 'memberID2', 'hoursworkedtogether'])


result_df = df.groupby(['ProjectID', 'Date']).apply(calc_member_hours)
result_df
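
Note that the snippet above gives the overlap within each project, while the expected output in the question appears to accumulate the overlap from earlier projects only. A possible sketch of that running total follows. It swaps the groupby/apply for a self-merge, and it assumes the rows are already in chronological order and that each member has one row per project; the names pairs, overlap and result are my own.

```python
import pandas as pd

df = pd.DataFrame([['project1', '01.05', 'a', 2],
                   ['project1', '01.05', 'b', 5],
                   ['project2', '05.05', 'a', 1],
                   ['project2', '05.05', 'b', 2],
                   ['project2', '05.05', 'c', 3],
                   ['project3', '07.06', 'a', 4],
                   ['project3', '07.06', 'b', 1],
                   ['project3', '07.06', 'c', 2]],
                  columns=['ProjectID', 'Date', 'memberID', 'hours'])

# Pair every member with every member of the same project via a self-merge.
pairs = df.merge(df, on=['ProjectID', 'Date'], suffixes=('1', '2'))
# Keep each unordered pair once (a-b, but not b-a or a-a).
pairs = pairs[pairs['memberID1'] < pairs['memberID2']].copy()
# Overlap within one project: the smaller of the two logged values.
pairs['overlap'] = pairs[['hours1', 'hours2']].min(axis=1)
# Running total per pair, minus the current project, leaves only past overlap.
# Assumption: rows are in chronological order, so cumsum runs oldest first.
pairs['hoursworkedtogether'] = (
    pairs.groupby(['memberID1', 'memberID2'])['overlap'].cumsum()
    - pairs['overlap']
)
result = pairs[['ProjectID', 'Date', 'memberID1', 'memberID2',
                'hoursworkedtogether']].reset_index(drop=True)
print(result)
```

On the sample data this reproduces the values in the question's expected table, e.g. 3 hours for a and b at project3. The self-merge builds n² rows per project before filtering, so for projects with very many members the groupby version may use less memory.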

