Make Source and Target columns based on consecutive rows
I have the following problem.
Person 1001 carries out activity A and then activity C (which follows activity A). I need to turn consecutive rows of the same customer into a Source column (the activity) and a Target column (the customer's next activity).
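import numpy as np
import pandas as pd

# Input: one row per activity, in chronological order within each customer
df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']], columns=['CustomerNr', 'Activity'])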
# Desired output (stored as `expected` so it does not overwrite the input df)
expected = pd.DataFrame([[1001, 'A', 'C'], [1004, 'D', np.nan], [1005, 'C', 'D'],
                         [1010, 'A', 'D'], [1010, 'D', 'F']], columns=['CustomerNr', 'Source', 'Target'])
CustomerNr | Source | Target |
---|---|---|
1001 | A | C |
1004 | D | NaN |
1005 | C | D |
1010 | A | D |
1010 | D | F |
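The pairing described above amounts to looking one row ahead, but only within the same customer. A minimal sketch of that single step, assuming pandas' `GroupBy.shift` (the column name `Next` is just for display): shifting inside `groupby('CustomerNr')` gives each customer's last row NaN instead of borrowing the following customer's activity.

```python
import pandas as pd

df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
                  columns=['CustomerNr', 'Activity'])

# Shift within each CustomerNr group only; the last row of every customer becomes NaN
next_activity = df.groupby('CustomerNr')['Activity'].shift(-1)

# Show each activity next to the same customer's following activity
print(pd.concat([df, next_activity.rename('Next')], axis=1))
```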
You can use `shift(-1)` and then discard targets that belong to a different customer:
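# Target = the next row's Activity (it may still belong to a different customer)
df['Target'] = df['Activity'].shift(-1)
# Despite its name, prev_CustomerNr holds the *next* row's CustomerNr
df['prev_CustomerNr'] = df['CustomerNr'].shift(-1)
print(df)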
'''
CustomerNr Activity Target prev_CustomerNr
0 1001 A C 1001.0
1 1001 C D 1004.0
2 1004 D C 1005.0
3 1005 C D 1005.0
4 1005 D A 1010.0
5 1010 A D 1010.0
6 1010 D F 1010.0
7 1010 F None NaN
'''
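# The most recent activity of each CustomerNr has no target, so drop the last row of every
# customer that has more than one row (customers with a single row are kept).
m1 = df.duplicated(['CustomerNr'], keep="last")  # True for every row except a customer's last one  #https://stackoverflow.com/a/70216388/15415267
m2 = ~df.duplicated(['CustomerNr'], keep=False)  # True for customers that appear only once
df = df[m1 | m2].copy()  # .copy() avoids a SettingWithCopyWarning on the assignment below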
# If CustomerNr and prev_CustomerNr differ, the shifted Target belongs to another customer, so replace it with NaN.
df['Target'] = np.where(df['CustomerNr'] == df['prev_CustomerNr'], df['Target'], np.nan)
df = df.drop(['prev_CustomerNr'], axis=1)
print(df)
'''
CustomerNr Activity Target
0 1001 A C
2 1004 D NaN
3 1005 C D
5 1010 A D
6 1010 D F
'''
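A variant that avoids the helper `prev_CustomerNr` column: shifting inside `groupby('CustomerNr')` never crosses customer boundaries, so no `np.where` cleanup is needed. This is a sketch, not the answer's own code; `make_source_target` is a hypothetical name, and it assumes rows are already in chronological order within each customer. It also renames the columns to Source/Target as in the question's desired output.

```python
import pandas as pd


def make_source_target(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pair each activity with the same customer's next activity.

    Assumes rows are already in chronological order within each CustomerNr.
    """
    grouped = df.groupby('CustomerNr')['Activity']
    out = df.rename(columns={'Activity': 'Source'})  # rename returns a new DataFrame
    out['Target'] = grouped.shift(-1)  # NaN on each customer's last row

    # cumcount(ascending=False) is 0 on a customer's last row, so this keeps all other rows
    not_last = grouped.cumcount(ascending=False) > 0
    # Customers with a single row are kept too; their Target stays NaN
    single = ~df.duplicated('CustomerNr', keep=False)
    return out[not_last | single].reset_index(drop=True)


df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
                  columns=['CustomerNr', 'Activity'])
result = make_source_target(df)
print(result)
```

Here `cumcount(ascending=False) > 0` plays the same role as the `duplicated(keep="last")` mask in the answer, and the single-customer mask is the answer's `m2` unchanged.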