Make Source and Target columns based on consecutive rows
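I have the following problem.
Customer 1001 completed activity A and then completed activity C (after activity A). I need to move the activity from each consecutive row of the same customer into a Target column.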
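import numpy as np
import pandas as pd

# Input: one row per activity, in the order each customer performed them
df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
                  columns=['CustomerNr', 'Activity'])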
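# Desired output (Source = current activity, Target = next activity of the same customer)
expected = pd.DataFrame([[1001, 'A', 'C'], [1004, 'D', np.nan], [1005, 'C', 'D'],
                         [1010, 'A', 'D'], [1010, 'D', 'F']],
                        columns=['CustomerNr', 'Source', 'Target'])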
CustomerNr | Source | Target |
---|---|---|
1001 | A | C |
1004 | D | NaN |
1005 | C | D |
1010 | A | D |
1010 | D | F |
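A different technique from the answer that follows, offered as a minimal sketch: grouping by CustomerNr before shifting means the shift never crosses from one customer into the next, so no helper column is needed. It assumes each customer's rows are already in activity order; the names `out` and `n_rows` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
                  columns=['CustomerNr', 'Activity'])

# Pair each activity with the next activity of the *same* customer.
# groupby(...).shift(-1) stays inside each group, so the last activity
# of every customer gets NaN as its target.
out = df.rename(columns={'Activity': 'Source'})
out['Target'] = out.groupby('CustomerNr')['Source'].shift(-1)

# Number of rows per customer, aligned to each row.
n_rows = out['CustomerNr'].map(out['CustomerNr'].value_counts())

# Keep rows that have a target, plus customers with a single activity
# (e.g. 1004), whose only row is kept with Target = NaN.
out = out[out['Target'].notna() | (n_rows == 1)].reset_index(drop=True)
print(out)
```

Because `groupby().shift()` respects group boundaries, this version also works when one customer's rows are not contiguous in the frame, which the plain `shift(-1)` approach relies on.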
You can use:
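# Shift up by one so each row sees the activity of the following row.
df['Target'] = df['Activity'].shift(-1)
# Note: despite its name, prev_CustomerNr holds the CustomerNr of the *next* row.
df['prev_CustomerNr'] = df['CustomerNr'].shift(-1)
print(df)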
'''
CustomerNr Activity Target prev_CustomerNr
0 1001 A C 1001.0
1 1001 C D 1004.0
2 1004 D C 1005.0
3 1005 C D 1005.0
4 1005 D A 1010.0
5 1010 A D 1010.0
6 1010 D F 1010.0
7 1010 F None NaN
'''
# The most recent activity of a customer has no following activity, so drop the
# last row of every customer that has more than one row (single-row customers are kept).
m1 = df.duplicated(['CustomerNr'], keep="last")  # https://stackoverflow.com/a/70216388/15415267
m2 = ~df.duplicated(['CustomerNr'], keep=False)
df = df[m1 | m2].copy()  # .copy() avoids a SettingWithCopyWarning on the next assignment
# If CustomerNr and prev_CustomerNr differ, the shifted Target belongs to another customer, so replace it with NaN.
df['Target'] = np.where(df['CustomerNr'] == df['prev_CustomerNr'], df['Target'], np.nan)
df=df.drop(['prev_CustomerNr'],axis=1)
print(df)
'''
CustomerNr Activity Target
0 1001 A C
2 1004 D NaN
3 1005 C D
5 1010 A D
6 1010 D F
'''
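To see exactly which rows the `m1 | m2` filter keeps on the sample data, the sketch below prints both masks next to CustomerNr and checks them against an equivalent formulation built from `groupby().cumcount()`. The names `keep`, `keep_alt`, `is_last` and `has_many` are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
                  columns=['CustomerNr', 'Activity'])

# m1: True for every row except the last occurrence of its CustomerNr.
m1 = df.duplicated(['CustomerNr'], keep='last')
# m2: True only for customers that appear exactly once.
m2 = ~df.duplicated(['CustomerNr'], keep=False)
keep = m1 | m2

# Equivalent formulation: drop a row only if it is the last row of a
# customer that has more than one row.
is_last = df.groupby('CustomerNr').cumcount(ascending=False) == 0
has_many = df['CustomerNr'].map(df['CustomerNr'].value_counts()) > 1
keep_alt = ~(is_last & has_many)

print(pd.DataFrame({'CustomerNr': df['CustomerNr'], 'm1': m1, 'm2': m2, 'keep': keep}))
```

The dropped rows are precisely those whose shifted Target would come from a different customer (or from past the end of the frame).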