Make Source and Target column based on consecutive rows

I have the following problem:

Person 1001 accomplishes activity A and then activity C (which follows activity A). I need to move consecutive rows into Source and Target columns, so that each activity is paired with the activity that follows it for the same customer.

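import numpy as np
import pandas as pd
# Input: one row per activity
df = pd.DataFrame([[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                   [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']], columns=['CustomerNr', 'Activity'])
# Desired output: each activity (Source) paired with the same customer's next activity (Target)
expected = pd.DataFrame([[1001, 'A', 'C'], [1004, 'D', np.nan], [1005, 'C', 'D'],
                         [1010, 'A', 'D'], [1010, 'D', 'F']], columns=['CustomerNr', 'Source', 'Target'])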
CustomerNr  Source  Target
1001        A       C
1004        D       NaN
1005        C       D
1010        A       D
1010        D       F

You can use:

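# shift(-1) pulls the value from the next row
df['Target'] = df['Activity'].shift(-1)
# Despite its name, prev_CustomerNr holds the CustomerNr of the next row
df['prev_CustomerNr'] = df['CustomerNr'].shift(-1)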
print(df)
'''
   CustomerNr Activity Target  prev_CustomerNr
0        1001        A      C           1001.0
1        1001        C      D           1004.0
2        1004        D      C           1005.0
3        1005        C      D           1005.0
4        1005        D      A           1010.0
5        1010        A      D           1010.0
6        1010        D      F           1010.0
7        1010        F   None              NaN
'''
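# The last activity of each customer has no following activity, so drop the last row
# of each CustomerNr, but keep customers that have only a single row.

m1 = df.duplicated(['CustomerNr'], keep="last")  # True for every row except a customer's last one; see https://stackoverflow.com/a/70216388/15415267
m2 = ~df.duplicated(['CustomerNr'], keep=False)  # True for customers with exactly one row
df = df[m1 | m2].copy()  # copy so the later column assignment avoids SettingWithCopyWarning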

# If CustomerNr and prev_CustomerNr differ, the shifted value belongs to another customer, so replace it with NaN.
df['Target'] = np.where(df['CustomerNr'] == df['prev_CustomerNr'], df['Target'], np.nan)
df = df.drop(['prev_CustomerNr'], axis=1)

print(df)
'''
   CustomerNr Activity Target
0        1001        A      C
2        1004        D    NaN
3        1005        C      D
5        1010        A      D
6        1010        D      F
'''
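
As an alternative to the helper column, the shift can be done per customer with groupby, so values never leak from one CustomerNr to the next. This is only a sketch, not the answer's own method: it assumes the rows are already in the right order within each customer, and the names grouped, target, group_size, keep and result are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
    columns=['CustomerNr', 'Activity'],
)

grouped = df.groupby('CustomerNr')['Activity']

# Next activity of the same customer; NaN on each customer's last row.
target = grouped.shift(-1)

# Number of rows per customer, aligned to the original index.
group_size = grouped.transform('size')

# Keep rows that have a following activity, plus customers with a single row.
keep = target.notna() | (group_size == 1)

result = (
    df.assign(Target=target)
      .loc[keep]
      .rename(columns={'Activity': 'Source'})
      .reset_index(drop=True)
)
print(result)
```

On the sample data this gives the same five rows as the desired output, with Activity renamed to Source. If the real data is not sorted, sort it by customer and by whatever column defines the activity order before grouping.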

