Make Source and Target column based on consecutive rows

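I have the following problem.

Person 1001 completes activity A and then activity C (which follows activity A). For each row, I need to move the activity from the next consecutive row into a Target column.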

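import numpy as np
import pandas as pd
# Input: one row per activity, in order, for each customer
df = pd.DataFrame([[1001, 'A'], [1001,'C'], [1004, 'D'],[1005, 'C'], 
                   [1005,'D'], [1010, 'A'],[1010,'D'],[1010,'F']], columns=['CustomerNr','Activity'])
# Expected output: each activity (Source) paired with the activity that follows it (Target)
expected = pd.DataFrame([[1001, 'A','C'], [1004, 'D',np.nan],[1005, 'C','D'], 
                   [1010, 'A','D'],[1010,'D' ,'F']], columns=['CustomerNr','Source','Target'])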
CustomerNr  Source  Target
1001        A       C
1004        D       NaN
1005        C       D
1010        A       D
1010        D       F

You can use:

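# shift(-1) moves a column up by one row, so each row sees the NEXT row's value.
df['Target'] = df['Activity'].shift(-1)
# Despite its name, prev_CustomerNr holds the CustomerNr of the next row.
df['prev_CustomerNr'] = df['CustomerNr'].shift(-1)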
print(df)
'''
   CustomerNr Activity Target  prev_CustomerNr
0        1001        A      C           1001.0
1        1001        C      D           1004.0
2        1004        D      C           1005.0
3        1005        C      D           1005.0
4        1005        D      A           1010.0
5        1010        A      D           1010.0
6        1010        D      F           1010.0
7        1010        F   None              NaN
'''
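# The most recent activity of each customer has no following activity, so we
# drop the last row for each CustomerNr. Customers with a single row are kept.

m1 = df.duplicated(['CustomerNr'], keep="last")  # every row except each customer's last; see https://stackoverflow.com/a/70216388/15415267
m2 = ~df.duplicated(['CustomerNr'], keep=False)  # customers that appear only once
df = df[m1 | m2].copy()  # copy so the assignment below does not raise SettingWithCopyWarning

# If CustomerNr and prev_CustomerNr differ, the next row belongs to another customer, so Target becomes NaN.
df['Target'] = np.where(df['CustomerNr'] == df['prev_CustomerNr'], df['Target'], np.nan)
df = df.drop(['prev_CustomerNr'], axis=1)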

print(df)
'''
   CustomerNr Activity Target
0        1001        A      C
2        1004        D    NaN
3        1005        C      D
5        1010        A      D
6        1010        D      F
'''
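
As a possible alternative (not part of the answer above), the shift can be done per customer with `groupby`, which removes the need for a helper column and the `np.where` step. This is only a sketch: it assumes the rows are already in chronological order within each customer, as in the question's data, and the names `target`, `single`, and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
    columns=['CustomerNr', 'Activity'])

# Shift within each customer: the last activity of a customer gets NaN
# instead of the first activity of the next customer.
target = df.groupby('CustomerNr')['Activity'].shift(-1)

# Customers that appear only once keep their single row (with a NaN Target).
single = ~df.duplicated('CustomerNr', keep=False)

# Keep rows that have a following activity, plus the single-row customers.
result = (df.assign(Target=target)[target.notna() | single]
            .rename(columns={'Activity': 'Source'})
            .reset_index(drop=True))
print(result)
```

Renaming `Activity` to `Source` gives the column layout asked for in the question. Resetting the index is optional and is done here only for a tidy result.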
