I have data of people logging time to certain projects on certain dates. So my table will look something like this:
ProjectID  Date   memberID  hours
project1   01.05  a         2
project1   01.05  b         5
project2   05.05  a         1
project2   05.05  b         2
project2   05.05  c         3
project3   07.06  a         4
project3   07.06  b         1
project3   07.06  c         2
etc.
What I now want to do is count, for each project and for each pair of members on that project, how much time they have worked together on projects in the past. If both have worked on a project, only the minimum of their hours should count. E.g. if member 1 worked 1 hour on the project and member 2 worked 2 hours, it should count only 1 hour, because they can't have worked together during the second hour.
E.g.:
ProjectID  Date   memberID1  memberID2  hoursworkedtogether
project1   01.05  a          b          0
project2   05.05  a          b          2
project2   05.05  a          c          0
project2   05.05  b          c          0
project3   07.06  a          b          3
project3   07.06  b          c          2
project3   07.06  a          c          1
I've tried aggregating using pivot tables but that did not work as two project members will always be in different rows in the raw data and the pivot won't count combinations of values within the same row it seems.
One approach would be to write a simple loop and loop over all projects but I feel like there should be a more efficient option, as the table is quite large.
I am not sure if this is the fastest solution, but pandas apply() with list comprehensions should be reasonably fast ;-)
Group your data by ProjectID and Date and use itertools.combinations() to create all combinations of users per project.
import pandas as pd
from itertools import combinations

df = pd.DataFrame([['project1', '01.05', 'a', 2],
                   ['project1', '01.05', 'b', 5],
                   ['project2', '05.05', 'a', 1],
                   ['project2', '05.05', 'b', 2],
                   ['project2', '05.05', 'c', 3],
                   ['project3', '07.06', 'a', 4],
                   ['project3', '07.06', 'b', 1],
                   ['project3', '07.06', 'c', 2]],
                  columns=['ProjectID', 'Date', 'memberID', 'hours'])

def calc_member_hours(project):
    # Hours per member within this project (assumes one row per member).
    hours = project.set_index('memberID')['hours']
    # For every pair of members, the shared time is the smaller of the two entries.
    data = [(m1, m2, min(hours[m1], hours[m2]))
            for m1, m2 in combinations(hours.index, 2)]
    return pd.DataFrame(data, columns=['memberID1', 'memberID2', 'hoursworkedtogether'])

# Note: this gives the overlap within each project, not a running total over past projects.
result_df = df.groupby(['ProjectID', 'Date']).apply(calc_member_hours)
result_df
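If you want the running total from the question (hours a pair shared on earlier projects, excluding the current one) and would rather avoid apply(), here is a minimal sketch using a self-merge and a grouped cumulative sum. It rests on two assumptions that are not stated in the question: the rows are already in chronological order, and each member has at most one row per project and date. The column name `overlap` and the variables `pairs` and `result` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame([['project1', '01.05', 'a', 2],
                   ['project1', '01.05', 'b', 5],
                   ['project2', '05.05', 'a', 1],
                   ['project2', '05.05', 'b', 2],
                   ['project2', '05.05', 'c', 3],
                   ['project3', '07.06', 'a', 4],
                   ['project3', '07.06', 'b', 1],
                   ['project3', '07.06', 'c', 2]],
                  columns=['ProjectID', 'Date', 'memberID', 'hours'])

# Self-join on project and date so every member is paired with every
# other member of the same project (columns get the suffixes 1 and 2).
pairs = df.merge(df, on=['ProjectID', 'Date'], suffixes=('1', '2'))

# Keep each unordered pair once: a-b, but not b-a or a-a.
pairs = pairs[pairs['memberID1'] < pairs['memberID2']].copy()

# Shared time within one project is the smaller of the two entries.
pairs['overlap'] = pairs[['hours1', 'hours2']].min(axis=1)

# Running total per pair minus the current project's overlap gives the
# time shared on earlier projects only.
# Assumption: rows are in chronological order.
pairs['hoursworkedtogether'] = (
    pairs.groupby(['memberID1', 'memberID2'])['overlap'].cumsum()
    - pairs['overlap']
)

result = pairs[['ProjectID', 'Date', 'memberID1', 'memberID2',
                'hoursworkedtogether']].reset_index(drop=True)
print(result)
```

On the sample data this should reproduce the expected table from the question, e.g. 3 hours for a and b on project3 (2 from project1 plus 1 from project2). The self-merge grows quadratically with the number of members per project, so it is worth checking memory use if individual projects have very many members.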