Filter, iterate, cumsum, add to dataframe

Question

I have the following dataframe:

a,b,c,d
x,3,4,8
x,4,4,7
x,8,8,8
y,6,6,2
y,5,1,3
y,6,2,1
y,6,8,6
z,4,6,3
z,2,8,6
z,9,9,3
z,2,8,6
z,9,9,3

I'm looking to:

Filter column a for each of its values (x, y, z) in a loop.
Then get the cumulative sum of the values in column b for the filtered rows (x being first).

So the cumulative sum of b for x would be:

3
7
15

Add that cumulative sum to another df, where I'll do additional computation.
Repeat the process for the next value in column a, which is y (again on column b).

Once x, y, and z have all been processed, repeat for columns c and d.

Currently I can run the loops, get the cumsum, and add it to the other df using the code below. The problem is that when I replace b with a variable (J) in the line that assigns Input["Tot"], which I need in order to iterate over columns b, c, and d, I get an error.

Input["WinStartTime"].unique()  
Starts = Input["WinStartTime"].unique().tolist() 
Cols = ["b", "c", "d"]

for I in Starts:
    InputLVPosU10 = (Input['WinStartTime'] == I) 

    for J in Cols:
        Input["Tot"] = Input[InputLVPosU10].b.cumsum()
        
        CS = pd.DataFrame(Input[InputLVPosU10].Tot) 
        print (CS)
        CS = CS.reset_index(drop=True)

error:

AttributeError                            Traceback (most recent call last)
     10 # print(Input[InputLVPosU10].FinalinTicks.cumsum())
     11
---> 12 Input["Tot"] = Input[InputLVPosU10].J.cumsum() #Inject Tot column w running total of FinalinTicks-Filtered and then totaled...
     13
     14 CS = pd.DataFrame(Input[InputLVPosU10].Tot) #CS,New dataframe - Running total of everything in 'InputLVPosU10' Based on the Time

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'J'

I'm not sure if I have to use some other method, but any help would be great as I seem to have run into a dead end here...

Answer 1

Avoid loops whenever you can in pandas and use the built-in functions:

# The cumsum() will remove column a from the result set.
# So we need to assign it back
result = df.groupby("a").cumsum().assign(a=df["a"])

# Get the cumsum for x only
result[result["a"] == "x"]
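As a minimal sketch of how this applies to the sample data in the question (the names `value_cols` and `per_group` are illustrative and not from the original post): the AttributeError most likely arises because `Input[mask].J` looks for a column literally named "J", whereas bracket indexing, `Input[mask][J]`, uses the column name stored in the variable. The grouped version below avoids the explicit filter altogether, assuming the goal is one running total per group and per column.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": list("xxxyyyyzzzzz"),
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

value_cols = ["b", "c", "d"]  # columns to accumulate (illustrative name)

# Running total of every value column, restarted for each value of "a".
result = df.groupby("a")[value_cols].cumsum().assign(a=df["a"])

# If an explicit loop is still needed, select the column with brackets:
# frame[col] uses the string held in `col`; frame.col would look for "col".
per_group = {}  # maps (group, column) -> DataFrame holding the running total
for key, frame in df.groupby("a"):
    for col in value_cols:
        running = frame[col].cumsum().reset_index(drop=True)
        per_group[(key, col)] = running.to_frame("Tot")

print(result[result["a"] == "x"])
print(per_group[("x", "b")])
```

Each entry of `per_group` is a small DataFrame with a fresh 0-based index, which plays the role of `CS` in the question's loop and can be passed on to further computation.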

Answer 2

I think the previous answer is closer to what you are looking for, but this is a simple SQL-esque way of getting the per-group totals, i.e. the final value of each cumulative sum:

from pandasql import sqldf
import pandas as pd
data = {'a':['x','x','x','y','y','y','y','z','z','z','z','z'],'b':[3,4,8,6,5,6,6,4,2,9,2,9],'c':[4,4,8,6,1,2,8,6,8,9,8,9],'d':[8,7,8,2,3,1,6,3,6,3,6,3]}
df = pd.DataFrame(data)
q = "select distinct a, sum(b) as b_sum, sum(c) as c_sum, sum(d) as d_sum from df group by a"
df_2 = sqldf(q,globals())
print(df_2)

Output:

   a  b_sum  c_sum  d_sum
0  x     15     16     23
1  y     23     17     12
2  z     26     40     21
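For comparison, here is a pandas-only sketch that should reproduce the same table without pandasql. It is not part of the original answer; the names `totals` and `last_running` are illustrative. It also checks that each total equals the last row of the corresponding group's running total.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    "a": list("xxxyyyyzzzzz"),
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

value_cols = ["b", "c", "d"]

# Per-group totals, the pandas counterpart of "sum(...) ... group by a".
totals = df.groupby("a")[value_cols].sum().add_suffix("_sum").reset_index()
print(totals)

# The last row of each group's running total carries the same numbers.
running = df.groupby("a")[value_cols].cumsum()
last_running = running.groupby(df["a"]).last()
print(last_running)
```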