简体   繁体   中英

How to do a cumulative sum in a DataFrame analyzing all possible combination columns instead of analyzing only all columns together?

If we want to know if the cumulative sums are profitable in the ['Col 1','Col 2','Col 3'] columns for the long term, we do it this way:

import pandas as pd
import io

ex_csv = """
Col 1,Col 2,Col 3,return
a,b,c,1
d,e,f,1
a,e,c,-1
a,e,c,-1
d,b,c,-1
a,b,c,1
d,e,f,1
"""

df = pd.read_csv(io.StringIO(ex_csv), sep=",")
df['invest'] = df.groupby(['Col 1','Col 2','Col 3'])['return'].cumsum().gt(df['return'])
true_backs = df[(df['invest'] == True)]['return']
print(true_backs.sum())

But what if I want it to be TRUE for the cumulative as well when not only the combination of the 3 columns is positive, but if one or two are positive too?

Example:

Perhaps the value a of Col 1 the cumulative sum of it will be positive, but together with the values of Col 2 and Col 3 they will no longer be profitable, so in my current code it would appear as FALSE.

And I want it to be TRUE.

If I understood the question correctly, this should solve your problem:

import pandas as pd
import io
import itertools

ex_csv = """
Col 1,Col 2,Col 3,return
a,b,c,1
d,e,f,1
a,e,c,-1
a,e,c,-1
d,b,c,-1
a,b,c,1
d,e,f,1
"""
df = pd.read_csv(io.StringIO(ex_csv), sep=",")

cols = df.columns.values.tolist()
cols.remove('return')
cols_comb = []
for i in range(1, len(cols) + 1):
    cols_comb += (  a for a in itertools.combinations(cols, i) )
cols_comb  = [[i for i in a] for a in cols_comb]

for i,a in enumerate(cols_comb):
    df[f'invest_{i}'] = df.groupby(a)['return'].cumsum().gt(df['return']) 

df["invest"] = df.loc[:,(f'invest_{i}' for i in range(len(cols_comb)))].apply(lambda x: any(x), axis=1)

true_backs = df[(df['invest'] == True)]['return']
print(true_backs.sum())

So, I'm creating a combination of columns in cols_comb, and it looks like this:

[['Col 1'],
 ['Col 2'],
 ['Col 3'],
 ['Col 1', 'Col 2'],
 ['Col 1', 'Col 3'],
 ['Col 2', 'Col 3'],
 ['Col 1', 'Col 2', 'Col 3']]

Combination code from: https://stackoverflow.com/a/73620523/7323333

Then, for each element in this list, I calculate the cumulative sum and assign the results to a new column in Dataframe. Then I check if any of the elements in rows of the newly generated columns is True and assign the result to the "invest" column.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM