I am trying to get the max value of one row, according to the cumulative sum of a different row. My dataframe looks like this:
df = pd.DataFrame({'constant': ['a', 'b', 'b', 'c', 'c', 'd', 'a'], 'value': [1, 3, 1, 5, 1, 9, 2]})
indx constant value
0 a 1
1 b 3
2 b 1
3 c 5
4 c 1
5 d 9
6 a 2
I am trying to add a new field, with the constant
that has the highest cumulative sum of value
up to that point in the dataframe. the final dataframe would look like this:
indx constant value new_field
0 a 1 NaN
1 b 3 a
2 b 1 b
3 c 5 b
4 c 1 c
5 d 9 c
6 a 2 d
As you can see, at index 1, a
has the highest cumulative sum of value
for all prior rows. At index 2, b
has the highest cumulative sum of value
for all prior rows, and so on.
Anyone have a solution?
As presented, you just need a shift. However try the following for other scenarios.
Steps Find the cummulative maximum
Where the cummulative max is equal to df['value'], copy the 'constant', otherwise make it a NaN
The NaNs should leave chance to broadcast the constant corresponding to the max value
Outcome
df=df.assign(new_field=(np.where(df['value']==df['value'].cummax(), df['constant'], np.nan))).ffill()
df=df.assign(new_field=df['new_field'].shift())
constant value new_field
0 a 1 NaN
1 b 3 a
2 b 1 b
3 c 5 b
4 c 1 c
5 d 9 c
6 a 2 d
You should be a little more careful (since values can be negative value which decrease cumsum), here is what you probably need to do,
df["cumsum"] = df["value"].cumsum()
df["cummax"] = df["cumsum"].cummax()
df["new"] = np.where(df["cumsum"] == df["cummax"], df['constant'], np.nan)
df["new"] = df.ffill()["new"].shift()
df
I think you should try and approach this as a pivot table, which would allow you to use np.argmax
over the column axis.
# this will count cummulative occurences over the ix for each value of `constant`
X = df.pivot_table(
index=df.index,
columns=['constant'],
values='value'
).fillna(0.0).cumsum(axis=0)
# now you get a list of ixs that max the cummulative value over the column axis - i.e., the "winner"
colix = np.argmax(X.values, axis=1)
# you can fetch corresponding column names using this argmax index
df['winner'] = np.r_[[np.nan], X.columns[colix].values[:-1]]
# and there you go
df
constant value winner
0 a 1 NaN
1 b 3 a
2 b 1 b
3 c 5 b
4 c 1 c
5 d 9 c
6 a 2 d
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.