I am trying to add a new column to my dataframe that depends on values that may or may not exist in previous rows. My dataframe looks like this:
index  id  timestamp  sequence_index  value  prev_seq_index
0      10  1          0               5      0
1      10  1          1               1      2
2      10  1          2               2      0
3      10  2          0               9      0
4      10  2          1               10     1
5      10  2          2               3      1
6      11  2          0               42     1
7      11  2          1               13     0
Note: there is no relation between index and sequence_index; index is just a counter.
What I want to do is add a column prev_value that holds the value of the most recent earlier row (smaller timestamp) with the same id and with sequence_index == prev_seq_index. If no such previous row exists, use a default value; for the purpose of this question I will use -1:
index  id  timestamp  sequence_index  value  prev_seq_index  prev_value
0      10  1          0               5      0               -1
1      10  1          1               1      2               -1
2      10  1          2               2      0               -1
3      10  2          0               9      0               5           # value from df[index == 0]
4      10  2          1               10     1               1           # value from df[index == 1]
5      10  2          2               3      1               1           # value from df[index == 1]
6      11  2          0               42     1               -1
7      11  2          1               13     0               -1
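To reproduce the example, the sample frame and the expected prev_value column can be built like this. The column values are copied from the tables above; the default RangeIndex stands in for the "index" counter, and the name expected_prev_value is only illustrative.

```python
import pandas as pd

# Sample data from the question; the default RangeIndex plays the role of "index"
df = pd.DataFrame({
    "id":             [10, 10, 10, 10, 10, 10, 11, 11],
    "timestamp":      [1, 1, 1, 2, 2, 2, 2, 2],
    "sequence_index": [0, 1, 2, 0, 1, 2, 0, 1],
    "value":          [5, 1, 2, 9, 10, 3, 42, 13],
    "prev_seq_index": [0, 2, 0, 0, 1, 1, 1, 0],
})

# Expected prev_value column from the second table (default value -1)
expected_prev_value = [-1, -1, -1, 5, 1, 1, -1, -1]
```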
My current solution is a brute-force loop, which is very slow. Is there a faster way?
import numpy as np

prev_values = np.full(len(df), -1.0)  # default value when no earlier row exists
for i, (_, row) in enumerate(df.iterrows()):
    # filter for previous rows with the same id and the desired sequence index
    tmp_df = df[(df.id == row.id) & (df.timestamp < row.timestamp)
                & (df.sequence_index == row.prev_seq_index)]
    if len(tmp_df) > 0:
        # get the value (a scalar, not a Series) from the most recent row
        prev_values[i] = tmp_df.loc[tmp_df.timestamp.idxmax(), 'value']
df['prev_value'] = prev_values
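One way to express the same lookup without iterrows is a self-merge followed by a filter. This is a minimal sketch, assuming df has an unnamed, unique index (as in the sample) and that ties on timestamp do not matter; the function name add_prev_value is hypothetical. It joins every row to all rows with the same id whose sequence_index equals its prev_seq_index, keeps only strictly earlier timestamps, then picks the latest candidate per row.

```python
import pandas as pd

def add_prev_value(df: pd.DataFrame, default=-1) -> pd.DataFrame:
    """Self-merge sketch of the iterrows loop (assumes an unnamed, unique index)."""
    # Keep each row's original label in a column called "index"
    left = df.reset_index()
    # Candidate source rows, renamed so they join on the current row's prev_seq_index
    right = df[["id", "sequence_index", "timestamp", "value"]].rename(
        columns={"sequence_index": "prev_seq_index",
                 "timestamp": "prev_timestamp",
                 "value": "prev_value"})
    cand = left.merge(right, on=["id", "prev_seq_index"], how="inner")
    # Same condition as df.timestamp < row.timestamp in the loop
    cand = cand[cand["prev_timestamp"] < cand["timestamp"]]
    # Per original row, keep the candidate with the latest timestamp
    latest = (cand.sort_values("prev_timestamp", kind="stable")
                  .drop_duplicates(subset="index", keep="last"))
    prev = latest.set_index("index")["prev_value"].reindex(df.index)
    out = df.copy()
    # Rows with no earlier match get the default; cast back to the value dtype
    out["prev_value"] = prev.fillna(default).astype(df["value"].dtype)
    return out

# Usage with the sample data from the question
df = pd.DataFrame({
    "id":             [10, 10, 10, 10, 10, 10, 11, 11],
    "timestamp":      [1, 1, 1, 2, 2, 2, 2, 2],
    "sequence_index": [0, 1, 2, 0, 1, 2, 0, 1],
    "value":          [5, 1, 2, 9, 10, 3, 42, 13],
    "prev_seq_index": [0, 2, 0, 0, 1, 1, 1, 0],
})
result = add_prev_value(df)
print(result["prev_value"].tolist())  # [-1, -1, -1, 5, 1, 1, -1, -1]
```

The trade-off is memory: the intermediate cand frame holds every (row, candidate) pair within an id, so it can grow large when many rows share the same id and sequence_index.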
I would suggest tackling this with a left join. First, make sure your data doesn't contain duplicates. Then create a DataFrame holding the most recent timestamp for each id and sequence_index, and grab the matching values:
import pandas as pd

agg = df.groupby(['id', 'sequence_index'], as_index=False).agg({'timestamp': 'max'})
agg = pd.merge(agg, df[['id', 'timestamp', 'sequence_index', 'value']], how='inner', on=['id', 'timestamp', 'sequence_index'])
agg.rename(columns={'sequence_index': 'prev_seq_index', 'timestamp': 'prev_timestamp', 'value': 'prev_value'}, inplace=True)
Now you can join the data back onto itself, matching on each row's id and prev_seq_index:
df = pd.merge(df, agg, how='left', on=['id', 'prev_seq_index'])
Finally, fill the NaN values left by rows that found no match:
df['prev_value'] = df['prev_value'].fillna(-1)
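The join above uses the latest timestamp over the whole frame rather than the latest one before the current row, so it can pick up a row's own value or a later one. To restrict the lookup to strictly earlier timestamps, as the loop in the question does, one option (a different technique from a plain left join) is pandas.merge_asof with allow_exact_matches=False. This is a minimal sketch, assuming an unnamed, unique index and a numeric timestamp column; the function name add_prev_value_asof is hypothetical.

```python
import pandas as pd

def add_prev_value_asof(df: pd.DataFrame, default=-1) -> pd.DataFrame:
    """merge_asof sketch: latest strictly earlier row per (id, prev_seq_index)."""
    # merge_asof requires both sides to be sorted by the "on" key
    left = df.reset_index().sort_values("timestamp", kind="stable")
    right = (df[["id", "sequence_index", "timestamp", "value"]]
             .rename(columns={"sequence_index": "prev_seq_index",
                              "value": "prev_value"})
             .sort_values("timestamp", kind="stable"))
    merged = pd.merge_asof(
        left, right,
        on="timestamp",
        by=["id", "prev_seq_index"],  # same id, sequence_index == prev_seq_index
        direction="backward",         # look only at earlier timestamps...
        allow_exact_matches=False,    # ...strictly earlier, like timestamp < row.timestamp
    )
    # Restore the original row order via the saved labels
    prev = merged.set_index("index")["prev_value"].reindex(df.index)
    out = df.copy()
    out["prev_value"] = prev.fillna(default).astype(df["value"].dtype)
    return out

# Usage with the sample data from the question
df = pd.DataFrame({
    "id":             [10, 10, 10, 10, 10, 10, 11, 11],
    "timestamp":      [1, 1, 1, 2, 2, 2, 2, 2],
    "sequence_index": [0, 1, 2, 0, 1, 2, 0, 1],
    "value":          [5, 1, 2, 9, 10, 3, 42, 13],
    "prev_seq_index": [0, 2, 0, 0, 1, 1, 1, 0],
})
result = add_prev_value_asof(df)
print(result["prev_value"].tolist())  # [-1, -1, -1, 5, 1, 1, -1, -1]
```

Unlike a self-merge, merge_asof never materializes every candidate pair: for each row it takes only the nearest earlier match within its (id, prev_seq_index) group, which keeps memory proportional to the input size.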