
Filter, iterate, cumsum, add to dataframe

I have the following dataframe:

a,b,c,d
x,3,4,8
x,4,4,7
x,8,8,8
y,6,6,2
y,5,1,3
y,6,2,1
y,6,8,6
z,4,6,3
z,2,8,6
z,9,9,3
z,2,8,6
z,9,9,3

I'm looking to:

  1. Loop over each value in column a (x, y, z) and filter the rows for that value.
  2. Get the cumulative sum of the values in column b for the filtered rows (x being first).

     So the cumulative sum of x, b would be:

     3
     7
     15

  3. Add that cumulative sum to another df, where I'll do additional computation.
  4. Repeat the process for the next value in column a, which is y. For y, b that would be:

     6
     11
     17
     23

  5. Once all of x, y, and z have been processed, repeat for columns c and d.

Currently I can run the loops, get the cumsum, and add it to the other df using the code below. The problem is that if I replace b in the cumsum line with a variable (J), which I need in order to iterate over columns b, c and d, I get an error.

Starts = Input["WinStartTime"].unique().tolist()  # distinct values to filter on
Cols = ["b", "c", "d"]  # columns to accumulate

for I in Starts:
    InputLVPosU10 = (Input['WinStartTime'] == I)  # boolean mask for this value

    for J in Cols:
        # Works with the hard-coded column b; fails once .b is replaced by .J
        Input["Tot"] = Input[InputLVPosU10].b.cumsum()

        CS = pd.DataFrame(Input[InputLVPosU10].Tot)
        print(CS)
        CS = CS.reset_index(drop=True)

error:


AttributeError                            Traceback (most recent call last) in
     10 # print(Input[InputLVPosU10].FinalinTicks.cumsum())
     11 
---> 12 Input["Tot"] = Input[InputLVPosU10].J.cumsum() #Inject Tot column w running total of FinalinTicks-Filtered and then totaled...
     13 
     14 CS = pd.DataFrame(Input[InputLVPosU10].Tot) #CS,New dataframe - Running total of everything in 'InputLVPosU10' Based on the Time

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'J'


I'm not sure if I have to use some other method, but any help would be great as I seem to have run into a dead end here...
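
On the error itself: attribute access such as `Input[mask].J` looks for a column literally named "J" and never reads the loop variable. Indexing with brackets or `.loc` uses the variable's value instead. Below is a minimal sketch of the loop with that change, assuming the sample frame from the question, where column a stands in for WinStartTime. The names `running`, `mask`, `value` and `col` are made up for illustration and are not from the original code.

```python
import pandas as pd

# Sample frame from the question; "a" plays the role of WinStartTime.
df = pd.DataFrame({
    "a": list("xxxyyyyzzzzz"),
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

# Hypothetical container: (group value, column name) -> cumulative sums.
running = {}
for value in df["a"].unique():
    mask = df["a"] == value  # boolean mask for the current group
    for col in ["b", "c", "d"]:
        # .loc[mask, col] uses the *value* of col ("b", "c" or "d");
        # df[mask].col would look for a column literally named "col".
        running[(value, col)] = df.loc[mask, col].cumsum().reset_index(drop=True)

print(running[("x", "b")].tolist())  # [3, 7, 15]
print(running[("y", "b")].tolist())  # [6, 11, 17, 23]
```

Each entry in `running` is a Series with a fresh 0-based index, so it can be wrapped in `pd.DataFrame(...)` the same way `CS` is built in the question.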

Avoid loops whenever you can in pandas and use the built-in functions:

# The cumsum() will remove column a from the result set.
# So we need to assign it back
result = df.groupby("a").cumsum().assign(a=df["a"])

# Get the cumsum for x only
result[result["a"] == "x"]
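
To make this concrete, here is a small self-contained run on the sample data from the question. It is only a sketch: the frame is rebuilt by hand, and the columns b, c and d are selected explicitly, which is an optional variation on the call above rather than a requirement.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": list("xxxyyyyzzzzz"),
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

# Running totals of b, c and d within each group of "a".
# The grouping column is not part of the cumsum output, so add it back.
result = df.groupby("a")[["b", "c", "d"]].cumsum().assign(a=df["a"])

# Rows belonging to one group only.
x_rows = result[result["a"] == "x"]
print(x_rows["b"].tolist())  # [3, 7, 15]
print(x_rows["d"].tolist())  # [8, 15, 23]

y_rows = result[result["a"] == "y"]
print(y_rows["b"].tolist())  # [6, 11, 17, 23]
```

All three columns are handled in one call, so no loop over b, c and d is needed.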

I think the previous answer is more of what you are looking for, but this is a simple SQL-esque way of getting the per-group totals (plain sums, not running totals):

from pandasql import sqldf
import pandas as pd
data = {'a':['x','x','x','y','y','y','y','z','z','z','z','z'],'b':[3,4,8,6,5,6,6,4,2,9,2,9],'c':[4,4,8,6,1,2,8,6,8,9,8,9],'d':[8,7,8,2,3,1,6,3,6,3,6,3]}
df = pd.DataFrame(data)
q = "select distinct a, sum(b) as b_sum, sum(c) as c_sum, sum(d) as d_sum from df group by a"
df_2 = sqldf(q,globals())
print(df_2)
   a  b_sum  c_sum  d_sum
0  x     15     16     23
1  y     23     17     12
2  z     26     40     21
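
For comparison, the same totals can be produced in plain pandas without pandasql. This is a sketch under the same sample data; the `_sum` column names simply mirror the SQL aliases above. The last lines check that each total equals the final value of the corresponding per-group running sum.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": list("xxxyyyyzzzzz"),
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

# Per-group totals, equivalent to the GROUP BY query.
totals = (
    df.groupby("a", as_index=False)[["b", "c", "d"]]
      .sum()
      .rename(columns={"b": "b_sum", "c": "c_sum", "d": "d_sum"})
)
print(totals["b_sum"].tolist())  # [15, 23, 26]

# The last running total in each group is the group's total.
last_running = (
    df.groupby("a")[["b", "c", "d"]].cumsum().groupby(df["a"]).last()
)
print(last_running["b"].tolist())  # [15, 23, 26]
```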


 