
Pandas assign value based on next row(s)

Consider this simple pandas DataFrame with columns 'record', 'start', and 'param'. There can be multiple rows with the same 'record' value, and all rows that share a 'record' value also share the same 'start' value. However, the 'param' value can differ within the same 'record' and 'start' combination:

import pandas as pd
df = pd.DataFrame({'record':[1,2,3,4,4,5,6,7,7,7,8], 'start':[0,5,7,13,13,19,27,38,38,38,54], 'param':['t','t','t','u','v','t','t','t','u','v','t']})

I'd like to make a column 'end' that takes the value of 'start' in the row with the next unique value of 'record'. The values of column 'end' should be:

[5,7,13,19,19,27,38,54,54,54,NaN]

I'm able to do this using a for loop, but I know this is not preferred when using pandas:

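max_end = 100  # fallback 'end' for rows that have no following record
for idx, row in df.iterrows():
    try:
        # look ahead until a row with a different 'start' is found
        n = 1
        next_row = df.iloc[idx+n]
        while next_row['start'] == row['start']:
            n = n+1
            next_row = df.iloc[idx+n]
        end = next_row['start']
    except IndexError:
        # ran past the last row: there is no next record
        end = max_end
    df.at[idx, 'end'] = end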

Is there an easy way to achieve this without a for loop?

I have no doubt there is a smarter solution, but here is mine.

df1['end'] = df1.drop_duplicates(subset = ['record', 'start'])['start'].shift(-1).reindex(index = df1.index, method = 'ffill')

-=EDIT=- Added subset to drop_duplicates to account for the question amendment.
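For readers who find the one-liner dense, here is a minimal sketch that splits the same chain into named steps. The intermediate names `firsts` and `next_start` are only illustrative, and the sketch assumes the default, monotonically increasing index, which `reindex(..., method='ffill')` requires.

```python
import pandas as pd

df1 = pd.DataFrame({
    "record": [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8],
    "start": [0, 5, 7, 13, 13, 19, 27, 38, 38, 38, 54],
    "param": ["t", "t", "t", "u", "v", "t", "t", "t", "u", "v", "t"],
})

# 1. Keep the first row of each (record, start) pair; the original index labels survive.
firsts = df1.drop_duplicates(subset=["record", "start"])["start"]

# 2. Shift up by one so each record sees the 'start' of the following record.
next_start = firsts.shift(-1)

# 3. Reindex onto the full index; labels dropped in step 1 take the value of the
#    nearest preceding label. NaN already present in next_start is not filled.
df1["end"] = next_start.reindex(df1.index, method="ffill")
```

Because `reindex` does not fill NaN values that already exist, the last record keeps NaN in 'end', matching the expected output in the question.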

This solution is equivalent to @Quixotic22's, although more explicit.

import pandas as pd

df = pd.DataFrame({
    'record':[1,2,3,4,4,5,6,7,7,7,8],
    'start':[0,5,7,13,13,19,27,38,38,38,54],
    'param':['t','t','t','u','v','t','t','t','u','v','t']
})
max_end = 100

df["end"] = float("nan")  # create new column with missing values
loc = df["record"].shift(1) != df["record"]  # True on the first row of each record

df.loc[loc, "end"] = df.loc[loc, "start"].shift(-1)  # start of the next record
df["end"] = df["end"].ffill()  # fill the remaining rows of each record
df.loc[df.index[-1], "end"] = max_end  # override last value

df
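Another possible variant, not taken from either answer above, masks the shifted column and back-fills it. This is only a sketch: the name `is_last_of_record` is illustrative, and it assumes that rows with the same 'record' value are adjacent, as in the example data.

```python
import pandas as pd

df = pd.DataFrame({
    "record": [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8],
    "start": [0, 5, 7, 13, 13, 19, 27, 38, 38, 38, 54],
    "param": ["t", "t", "t", "u", "v", "t", "t", "t", "u", "v", "t"],
})

# True on the last row of each run of identical 'record' values.
is_last_of_record = df["record"] != df["record"].shift(-1)

# Keep the next row's 'start' only on those boundary rows, then back-fill
# so the earlier rows of the same record receive the same value.
df["end"] = df["start"].shift(-1).where(is_last_of_record).bfill()
```

The final record has nothing after it, so its 'end' stays NaN; `df["end"].fillna(max_end)` could be applied afterwards if a sentinel such as the `max_end` used above is preferred.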
