I have the following dataframe:
a,b,c,d
x,3,4,8
x,4,4,7
x,8,8,8
y,6,6,2
y,5,1,3
y,6,2,1
y,6,8,6
z,4,6,3
z,2,8,6
z,9,9,3
z,2,8,6
z,9,9,3
I'm looking to get the cumulative sum of columns b, c and d within each group of column a.
So the cumulative sum of column b for groups x and y would be:
3
7
15
6
11
17
23
Currently I can do the proper loops, get the cumsum, and add it to the other df using the code below. The issue is that if I put a variable (J) in place of b on the cumsum line, which I need in order to iterate over columns b, c and d, I get an error.
Input["WinStartTime"].unique()
Starts = Input["WinStartTime"].unique().tolist()
Cols = ["b", "c", "d"]  # columns to accumulate
for I in Starts:
    InputLVPosU10 = (Input['WinStartTime'] == I)  # boolean mask for this start time
    for J in Cols:
        # Works with the literal .b; replacing .b with .J raises the error below
        Input["Tot"] = Input[InputLVPosU10].b.cumsum()
        CS = pd.DataFrame(Input[InputLVPosU10].Tot)
        print(CS)
        CS = CS.reset_index(drop=True)
error:
AttributeError Traceback (most recent call last) in
     10 # print(Input[InputLVPosU10].FinalinTicks.cumsum())
     11
---> 12 Input["Tot"] = Input[InputLVPosU10].J.cumsum() #Inject Tot column w running total of FinalinTicks-Filtered and then totaled...
     13
     14 CS = pd.DataFrame(Input[InputLVPosU10].Tot) #CS,New dataframe - Running total of everything in 'InputLVPosU10' Based on the Time

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
   5068
   5069 def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'J'
I'm not sure if I have to use some other method, but any help would be great as I seem to have run into a dead end here...
Avoid loops whenever you can in pandas and use the built-in functions:
# The cumsum() will remove column a from the result set.
# So we need to assign it back
result = df.groupby("a").cumsum().assign(a=df["a"])
# Get the cumsum for x only
result[result["a"] == "x"]
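On the error itself: `Input[InputLVPosU10].J` asks pandas for a column literally named "J", so attribute access cannot take a variable, whereas bracket indexing (`frame[J]`) can. Below is a minimal sketch on the sample data from the question. It assumes the grouping column is `a` (the question's real frame uses `WinStartTime`), and the `_cumsum` output column names are illustrative, not from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y", "y", "y", "z", "z", "z", "z", "z"],
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

cols = ["b", "c", "d"]

# Loop version: bracket indexing accepts a variable, attribute access does not.
looped = df.copy()
for col in cols:
    # "<col>_cumsum" is an illustrative name chosen for this sketch.
    looped[col + "_cumsum"] = looped.groupby("a")[col].cumsum()

# Vectorised version: running totals for all three columns in one call.
running = df.groupby("a")[cols].cumsum()

print(looped["b_cumsum"].tolist()[:7])  # [3, 7, 15, 6, 11, 17, 23]
```

The first seven values match the expected output listed in the question for groups x and y. Both versions give the same numbers; the vectorised call is just shorter once the column names are held in a list.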
I think the previous answer is more of what you are looking for, but here is a simple SQL-style way of getting the per-group totals:
from pandasql import sqldf
import pandas as pd
data = {'a':['x','x','x','y','y','y','y','z','z','z','z','z'],'b':[3,4,8,6,5,6,6,4,2,9,2,9],'c':[4,4,8,6,1,2,8,6,8,9,8,9],'d':[8,7,8,2,3,1,6,3,6,3,6,3]}
df = pd.DataFrame(data)
q = "select distinct a, sum(b) as b_sum, sum(c) as c_sum, sum(d) as d_sum from df group by a"
df_2 = sqldf(q,globals())
print(df_2)
   a  b_sum  c_sum  d_sum
0  x     15     16     23
1  y     23     17     12
2  z     26     40     21
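Note that these are per-group totals, i.e. the last value of each group's running sum, not the running sum itself. For comparison, the same table can be produced in plain pandas without pandasql. This is a sketch on the same sample data; the `_sum` suffix is chosen only to mirror the SQL aliases above.

```python
import pandas as pd

# Same sample data as in the SQL example.
df = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y", "y", "y", "z", "z", "z", "z", "z"],
    "b": [3, 4, 8, 6, 5, 6, 6, 4, 2, 9, 2, 9],
    "c": [4, 4, 8, 6, 1, 2, 8, 6, 8, 9, 8, 9],
    "d": [8, 7, 8, 2, 3, 1, 6, 3, 6, 3, 6, 3],
})

# Sum b, c and d within each group of a, then rename to match the SQL aliases.
totals = (
    df.groupby("a")[["b", "c", "d"]]
      .sum()
      .add_suffix("_sum")
      .reset_index()
)
print(totals)
```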