![](/img/trans.png)
[英]Pandas- Dividing a column by another column conditional on if values are greater than 0?
[英]Pandas- pivoting column into (conditional) aggregated string
假設我有以下數據集,變成了數據幀:
data = [
['Job 1', datetime.date(2019, 6, 9), 'Jim', 'Tom'],
['Job 1', datetime.date(2019, 6, 9), 'Bill', 'Tom'],
['Job 1', datetime.date(2019, 6, 9), 'Tom', 'Tom'],
['Job 1', datetime.date(2019, 6, 10), 'Bill', None],
['Job 2', datetime.date(2019,6,10), 'Tom', 'Tom']
]
df = pd.DataFrame(data, columns=['Job', 'Date', 'Employee', 'Manager'])
這會產生一個如下所示的數據框:
Job Date Employee Manager
0 Job 1 2019-06-09 Jim Tom
1 Job 1 2019-06-09 Bill Tom
2 Job 1 2019-06-09 Tom Tom
3 Job 1 2019-06-10 Bill None
4 Job 2 2019-06-10 Tom Tom
我想要生成的是每個唯一的作業/日期組合的一個軸,一個是Manager列,一個是逗號分隔的非經理員工的字符串列。 有幾件事要假設:
我希望結果數據框看起來像:
Job Date Manager Employees
0 Job 1 2019-06-09 Tom Jim, Bill
1 Job 1 2019-06-10 None Bill
2 Job 2 2019-06-10 Tom None
這引出了我的問題:
我懷疑1)是可能的,2)可能更難。 如果2)是no,我可以稍后在我的代碼中以其他方式繞過它。
這里棘手的部分是從Employee列中刪除Manager。
u = df.melt(['Job', 'Date'])
f = u[~u.duplicated(['Job', 'Date', 'value'], keep='last')].astype(str)
f.pivot_table(
index=['Job', 'Date'],
columns='variable', values='value',
aggfunc=','.join
).rename_axis(None, axis=1)
Employee Manager
Job Date
Job 1 2019-06-09 Jim,Bill Tom
2019-06-10 Bill None
Job 2 2019-06-10 NaN Tom
要聚合的組,然后通過刪除管理器並在適當的位置設置為“無”來修復“員工”。 由於員工是獨一無二的,因此集合可以很好地刪除管理器。
s = df.groupby(['Job', 'Date']).agg({'Manager': 'first', 'Employee': lambda x: set(x)})
s['Employee'] = [', '.join(x.difference({y})) for x,y in zip(s.Employee, s.Manager)]
s['Employee'] = s.Employee.replace({'': None})
Manager Employee
Job Date
Job 1 2019-06-09 Tom Jim, Bill
2019-06-10 None Bill
Job 2 2019-06-10 Tom None
我傾向於用期望的結果構建一個字典並重建數據幀。
d = {}
for t in df.itertuples():
d_ = d.setdefault((t.Job, t.Date), {})
d_['Manager'] = t.Manager
d_.setdefault('Employees', set()).add(t.Employee)
for k, v in d.items():
v['Employees'] -= {v['Manager']}
v['Employees'] = ', '.join(v['Employees'])
pd.DataFrame(d.values(), d).rename_axis(['Job', 'Date']).reset_index()
Job Date Employees Manager
0 Job 1 2019-06-09 Bill, Jim Tom
1 Job 1 2019-06-10 Bill None
2 Job 2 2019-06-10 Tom
在你的情況下,嘗試不使用lambda transform
+ drop_duplicates
df['Employee']=df['Employee'].mask(df['Employee'].eq(df.Manager)).dropna().groupby([df['Job'], df['Date']]).transform('unique').str.join(',')
df=df.drop_duplicates(['Job','Date'])
df
Out[745]:
Job Date Employee Manager
0 Job 1 2019-06-09 Jim,Bill Tom
3 Job 1 2019-06-10 Bill None
4 Job 2 2019-06-10 NaN Tom
怎么樣
df.groupby(["Job","Date","Manager"]).apply( lambda x: ",".join(x.Employee))
這將找到所有獨特的工作日期和經理,並將員工與“,”放在一個字符串中
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.