Lets say I have the following data set, turned into a dataframe:
data = [
['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
['Job 2', datetime.date(2019,6,10), 'Tom', 'Tom']
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])
This yields a dataframe that looks like:
Job Date Employee Manager
0 Job 1 2019-06-09 Jim Tom
1 Job 1 2019-06-09 Bill Tom
2 Job 1 2019-06-09 Tom Tom
3 Job 1 2019-06-10 Bill None
4 Job 2 2019-06-10 Tom Tom
What I am trying to generate is a pivot on each unique Job/Date combo, with a column for Manager, and a column for a string with comma separated, non-manager employees. A couple of things to assume:
I'd like the resulting dataframe to look like:
Job Date Manager Employees
0 Job 1 2019-06-09 Tom Jim, Bill
1 Job 1 2019-06-10 None Bill
2 Job 2 2019-06-10 Tom None
Which leads to my questions:
I suspect 1) is possible, and 2) might be more difficult. If 2) is a no, I can get around it in other ways later in my code.
The tricky part here is removing the Manager from the Employee column.
u = df.melt(['Job', 'Date'])
f = u[~u.duplicated(['Job', 'Date', 'value'], keep='last')].astype(str)
f.pivot_table(
index=['Job', 'Date'],
columns='variable', values='value',
aggfunc=','.join
).rename_axis(None, axis=1)
Employee Manager
Job Date
Job 1 2019-06-09 Jim,Bill Tom
2019-06-10 Bill None
Job 2 2019-06-10 NaN Tom
Group to aggregate, then fix the Employees by removing the Manager and setting to None where appropriate. Since the employees are unique, sets will work nicely here to remove the Manager.
s = df.groupby(['Job', 'Date']).agg({'Manager': 'first', 'Employee': lambda x: set(x)})
s['Employee'] = [', '.join(x.difference({y})) for x,y in zip(s.Employee, s.Manager)]
s['Employee'] = s.Employee.replace({'': None})
Manager Employee
Job Date
Job 1 2019-06-09 Tom Jim, Bill
2019-06-10 None Bill
Job 2 2019-06-10 Tom None
I'm partial to building a dictionary up with the desired results and reconstructing the dataframe.
d = {}
for t in df.itertuples():
d_ = d.setdefault((t.Job, t.Date), {})
d_['Manager'] = t.Manager
d_.setdefault('Employees', set()).add(t.Employee)
for k, v in d.items():
v['Employees'] -= {v['Manager']}
v['Employees'] = ', '.join(v['Employees'])
pd.DataFrame(d.values(), d).rename_axis(['Job', 'Date']).reset_index()
Job Date Employees Manager
0 Job 1 2019-06-09 Bill, Jim Tom
1 Job 1 2019-06-10 Bill None
2 Job 2 2019-06-10 Tom
In your case try not using lambda transform
+ drop_duplicates
df['Employee']=df['Employee'].mask(df['Employee'].eq(df.Manager)).dropna().groupby([df['Job'], df['Date']]).transform('unique').str.join(',')
df=df.drop_duplicates(['Job','Date'])
df
Out[745]:
Job Date Employee Manager
0 Job 1 2019-06-09 Jim,Bill Tom
3 Job 1 2019-06-10 Bill None
4 Job 2 2019-06-10 NaN Tom
how about
df.groupby(["Job","Date","Manager"]).apply( lambda x: ",".join(x.Employee))
this will find all unique sets of Job Date and Manager and put the employees together with "," into one string
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.