Sum of columns based on range of values of other columns in a Pandas dataframe

This is my dataframe:

import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150], 'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000], 'min': [18, 150], 'max': [23, 176]})

I want to add a column that sums vol_1, vol_2, and vol_3 for each row, counting a volume only if its paired price (the column right after it) lies between that row's min and max. For example, in the first row I want vol_2 and vol_3, because price_2 and price_3 fall within the range of min and max. My desired outcome looks like this:

  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
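The rule behind this table can be restated as a plain row-wise check. This is only a minimal sketch for pinning down the expected result, not an efficient solution. The helper name in_range_volume and its n_pairs parameter are hypothetical, and the inclusive bounds are an assumption based on row b, where price_1 equals min and vol_1 is still counted.

```python
import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150],
                   'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000],
                   'min': [18, 150], 'max': [23, 176]})


def in_range_volume(row, n_pairs=3):
    """Sum vol_k for every k whose price_k lies in [row['min'], row['max']].

    Hypothetical helper: assumes the columns come in vol_k / price_k pairs
    numbered 1..n_pairs, and that both bounds are inclusive.
    """
    total = 0
    for k in range(1, n_pairs + 1):
        if row["min"] <= row[f"price_{k}"] <= row["max"]:
            total += row[f"vol_{k}"]
    return total


# Apply the check to each row; the result lines up with df's index.
expected = df.apply(in_range_volume, axis=1)
print(expected.tolist())
```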

Reshape the data from wide to long so that vol, price, min and max each become a single column. Next, keep only the rows where price is between min and max, group by the sym column, sum vol, and assign the result back to df.

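# Reshape wide -> long: one row per (sym, number) with single vol and price columns;
# sym, min and max become index levels, which query() and groupby() can reference by name.
df["vol_sum"] = (pd.wide_to_long(df,
                                 stubnames=["vol", "price"],
                                 i=["sym", "min", "max"],
                                 j="number",
                                 sep="_")
                  # keep only the rows whose price lies within [min, max]
                  .query("min <= price <= max", engine="python")
                  .groupby("sym")
                  .vol
                  .sum()
                  # realign to df's row order; a sym with no in-range price gets 0
                  .reindex(df["sym"], fill_value=0)
                  .array
                 )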

  sym  vol_1  price_1  vol_2  price_2  vol_3  price_3  min  max  vol_sum
0   a    100        5   1500       20    123       22   18   23     1623
1   b     50      150   2000      175    500     1000  150  176     2050
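To see why the filter works, it can help to look at the intermediate long-format frame before grouping. The following is a sketch of the same idea split into steps, not the exact code from the answer: it moves the index levels back into columns with reset_index and uses Series.between instead of query. The names long_df, in_range and vol_by_sym are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'sym': ['a', 'b'], 'vol_1': [100, 50], 'price_1': [5, 150],
                   'vol_2': [1500, 2000], 'price_2': [20, 175],
                   'vol_3': [123, 500], 'price_3': [22, 1000],
                   'min': [18, 150], 'max': [23, 176]})

# One row per (sym, number): vol_1..vol_3 collapse into "vol",
# price_1..price_3 collapse into "price".
long_df = pd.wide_to_long(df,
                          stubnames=["vol", "price"],
                          i=["sym", "min", "max"],
                          j="number",
                          sep="_").reset_index()

# Boolean mask: True where price lies within [min, max], bounds inclusive.
in_range = long_df["price"].between(long_df["min"], long_df["max"])

# Sum the surviving volumes per sym.
vol_by_sym = long_df.loc[in_range].groupby("sym")["vol"].sum()

print(long_df.assign(in_range=in_range))
print(vol_by_sym)
```

A sym whose prices all fall outside the range would be missing from vol_by_sym, so mapping the result back by label (for example with reindex on df["sym"]) is safer than relying on position.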
