Pandas: pivoting a column into a (conditional) aggregated string
Let's say I have the following data set, turned into a dataframe:
import datetime
import pandas as pd

data = [
    ['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
    ['Job 2', datetime.date(2019, 6, 10), 'Tom', 'Tom']
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])
This yields a dataframe that looks like:
     Job        Date Employee Manager
0  Job 1  2019-06-09      Jim     Tom
1  Job 1  2019-06-09     Bill     Tom
2  Job 1  2019-06-09      Tom     Tom
3  Job 1  2019-06-10     Bill    None
4  Job 2  2019-06-10      Tom     Tom
What I am trying to generate is a pivot on each unique Job/Date combination, with a column for the Manager and a column holding a comma-separated string of the non-manager employees. A couple of things to assume:
I'd like the resulting dataframe to look like:
     Job        Date Manager  Employees
0  Job 1  2019-06-09     Tom  Jim, Bill
1  Job 1  2019-06-10    None       Bill
2  Job 2  2019-06-10     Tom       None
Which leads to my questions:
I suspect 1) is possible, and 2) might be more difficult. If 2) is a no, I can get around it in other ways later in my code.
The tricky part here is removing the Manager from the Employee column.
u = df.melt(['Job', 'Date'])
f = u[~u.duplicated(['Job', 'Date', 'value'], keep='last')].astype(str)
f.pivot_table(
    index=['Job', 'Date'],
    columns='variable', values='value',
    aggfunc=','.join
).rename_axis(None, axis=1)
                 Employee Manager
Job   Date
Job 1 2019-06-09 Jim,Bill     Tom
      2019-06-10     Bill    None
Job 2 2019-06-10      NaN     Tom
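To see why the `duplicated(..., keep='last')` step strips the manager out of the employee list, it can help to look at the melted frame directly. The sketch below rebuilds the sample data and checks which rows survive; the names `dup` and `kept` are introduced here purely for illustration.

```python
import datetime
import pandas as pd

data = [
    ['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
    ['Job 2', datetime.date(2019, 6, 10), 'Tom', 'Tom'],
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])

# melt stacks all Employee rows first, then all Manager rows,
# so a Manager entry always comes after the matching Employee entry.
u = df.melt(id_vars=['Job', 'Date'])

# keep='last' flags every earlier repeat of a (Job, Date, value) triple.
# That removes an employee who is also the manager, and collapses the
# repeated Manager rows of a group down to a single row.
dup = u.duplicated(['Job', 'Date', 'value'], keep='last')
kept = u[~dup]
print(kept)
```

What remains is one row per non-manager employee plus one Manager row per Job/Date group, which is exactly what the pivot then joins.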
Group to aggregate, then fix the employees by removing the Manager and setting the value to None where appropriate. Since the employees are unique, sets will work nicely here to remove the Manager.
s = df.groupby(['Job', 'Date']).agg({'Manager': 'first', 'Employee': lambda x: set(x)})
s['Employee'] = [', '.join(x.difference({y})) for x,y in zip(s.Employee, s.Manager)]
s['Employee'] = s.Employee.replace({'': None})
                 Manager   Employee
Job   Date
Job 1 2019-06-09     Tom  Jim, Bill
      2019-06-10    None       Bill
Job 2 2019-06-10     Tom       None
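One caveat with the set-based version: Python sets are unordered, so the joined string can come out as `Bill, Jim` rather than `Jim, Bill`. If row order matters, a minimal sketch (not the answer's original code) is to filter out the manager rows first and join what is left. The names `keys`, `managers`, `employees` and `out` are chosen here for illustration.

```python
import datetime
import pandas as pd

data = [
    ['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
    ['Job 2', datetime.date(2019, 6, 10), 'Tom', 'Tom'],
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])

keys = ['Job', 'Date']

# One manager per Job/Date group; first() skips missing values,
# so a group with no manager ends up with a missing value.
managers = df.groupby(keys)['Manager'].first()

# Drop rows where the employee is the manager, then join in row order.
non_managers = df[df['Employee'] != df['Manager']]
employees = (
    non_managers.groupby(keys)['Employee']
    .agg(', '.join)
    .rename('Employees')
)

# A left join keeps groups that have no non-manager employees (missing there).
out = managers.to_frame().join(employees).reset_index()
print(out)
```

Groups without any non-manager employee show a missing value in `Employees`; replacing it with `None` or an empty string is left to the caller.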
I'm partial to building up a dictionary with the desired results and reconstructing the dataframe.
d = {}
for t in df.itertuples():
    d_ = d.setdefault((t.Job, t.Date), {})
    d_['Manager'] = t.Manager
    d_.setdefault('Employees', set()).add(t.Employee)
for k, v in d.items():
    v['Employees'] -= {v['Manager']}
    v['Employees'] = ', '.join(v['Employees'])
pd.DataFrame(d.values(), d).rename_axis(['Job', 'Date']).reset_index()
     Job        Date  Employees Manager
0  Job 1  2019-06-09  Bill, Jim     Tom
1  Job 1  2019-06-10       Bill    None
2  Job 2  2019-06-10                Tom
In your case, try not using a lambda: transform + drop_duplicates
df['Employee']=df['Employee'].mask(df['Employee'].eq(df.Manager)).dropna().groupby([df['Job'], df['Date']]).transform('unique').str.join(',')
df=df.drop_duplicates(['Job','Date'])
df
Out[745]:
     Job        Date  Employee Manager
0  Job 1  2019-06-09  Jim,Bill     Tom
3  Job 1  2019-06-10      Bill    None
4  Job 2  2019-06-10       NaN     Tom
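The key step in the one-liner is `mask`, which blanks out every employee who is also the manager before any grouping happens. A small sketch of the intermediate results follows. It swaps `transform('unique')` for a plain `agg(','.join)`; that substitution is an assumption made here to keep the example simple and to sidestep version differences in what `transform` accepts, so it is not the answer's exact code.

```python
import datetime
import pandas as pd

data = [
    ['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
    ['Job 2', datetime.date(2019, 6, 10), 'Tom', 'Tom'],
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])

# mask() replaces values with NaN wherever the condition is True,
# here wherever the employee's name equals the manager's name.
masked = df['Employee'].mask(df['Employee'].eq(df['Manager']))

# Drop the blanked rows, then join the remaining names per Job/Date.
# The grouping Series are aligned to the shortened index automatically.
joined = masked.dropna().groupby([df['Job'], df['Date']]).agg(','.join)
print(joined)
```

Unlike the `transform` version, `joined` has one entry per Job/Date group that still has employees, so it would need to be merged back onto the deduplicated frame to recover groups such as `Job 2`.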
How about:
df.groupby(["Job","Date","Manager"]).apply( lambda x: ",".join(x.Employee))
This finds every unique combination of Job, Date and Manager and joins the employees with "," into one string.
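It is worth checking this against the sample data. Because `Manager` is one of the grouping keys, groups whose manager is missing are dropped under `groupby`'s default handling of missing keys, and the manager's own name is still part of the joined string. The sketch below uses `agg` on the `Employee` column instead of `apply` with a lambda, a substitution made here for brevity; `res` is an illustrative name.

```python
import datetime
import pandas as pd

data = [
    ['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
    ['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
    ['Job 2', datetime.date(2019, 6, 10), 'Tom', 'Tom'],
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])

# Group on all three keys and join the employee names of each group.
res = df.groupby(['Job', 'Date', 'Manager'])['Employee'].agg(','.join)
print(res)

# The Job 1 / 2019-06-10 row (Manager is None) is absent from the result,
# and 'Tom' still appears among the employees of the groups he manages.
```

So this approach would still need a step that removes the manager from the string and another that keeps the manager-less groups.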