How to sum values in a column using conditional statements of other columns in a pandas dataframe?
I have a dataframe with 5 columns and 25552 rows. Its structure is as follows:
mydf.head(4)
station date Lat Lon prcp
USC00397992 1998-10-01 44.26 -99.44 0.5
USC00397993 1998-10-01 44.01 -100.35 1.2
USC00397994 1998-10-01 45.65 -97.12 1.1
USC00397995 1998-10-01 43.90 -99.52 0.7
The station column contains many distinct stations, and the date column ranges from 1998-10-01 to 1999-06-30. Each distinct station has its own Lat and Lon. The prcp column records the precipitation for each date. Now I want to find the sum of the prcp values for each station over the date range 1999-05-01 to 1999-05-07. I want output like this:
station Lat Lon sum_from_May1_to_May7
USC00397992 44.26 -99.44 2.5 (for instance)
. . . .
. . . .
.
First, filter your data frame:
df2 = df.loc[(df.date >= '1999-05-01') & (df.date <= '1999-05-07')]
Then simply group by station and sum:
df2.groupby('station').prcp.sum()
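The filter-then-group approach above can be sketched end to end as follows. This is a minimal sketch on made-up toy data; it assumes the date column may be stored as strings and converts it with pd.to_datetime so the comparison is chronological. The names mydf, in_week and weekly are illustrative only.

```python
import pandas as pd

# Toy data in the same shape as the question's frame (values are made up).
mydf = pd.DataFrame({
    "station": ["USC00397992", "USC00397992", "USC00397992",
                "USC00397993", "USC00397993"],
    "date": ["1999-04-30", "1999-05-01", "1999-05-07",
             "1999-05-03", "1999-05-08"],
    "Lat": [44.26, 44.26, 44.26, 44.01, 44.01],
    "Lon": [-99.44, -99.44, -99.44, -100.35, -100.35],
    "prcp": [0.5, 1.0, 1.5, 2.0, 4.0],
})

# Assumption: dates arrive as strings, so parse them into datetime64 first.
mydf["date"] = pd.to_datetime(mydf["date"])

# Series.between is inclusive on both ends by default,
# so May 1 and May 7 are both kept.
in_week = mydf["date"].between(pd.Timestamp("1999-05-01"),
                               pd.Timestamp("1999-05-07"))

# Sum precipitation per station over the selected week.
weekly = mydf.loc[in_week].groupby("station")["prcp"].sum()
print(weekly)
```

Rows dated 1999-04-30 and 1999-05-08 fall outside the window and do not contribute to the sums.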
If you don't want rows with different Lat and Lon values grouped together, group by all three columns:
df2.groupby(['station', 'Lat', 'Lon']).prcp.sum()
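Grouping by several columns returns a Series with a MultiIndex, not the flat table shown in the question. One way to get that layout, sketched here on made-up data that is assumed to be already filtered to the target week, is to call reset_index with a name for the summed column. The frame df2 and the variable result below are illustrative.

```python
import pandas as pd

# Toy frame standing in for the already-filtered df2 (values are made up).
df2 = pd.DataFrame({
    "station": ["USC00397992", "USC00397992", "USC00397993"],
    "Lat": [44.26, 44.26, 44.01],
    "Lon": [-99.44, -99.44, -100.35],
    "prcp": [1.0, 1.5, 2.0],
})

# Group on all three keys, sum prcp, then turn the MultiIndex back
# into ordinary columns and name the summed column.
result = (
    df2.groupby(["station", "Lat", "Lon"])["prcp"]
       .sum()
       .reset_index(name="sum_from_May1_to_May7")
)
print(result)
```

The result has one row per (station, Lat, Lon) combination, with the columns in the order the question asks for.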
If you don't want to group by Lat and Lon but still want them in the output:
df[(df['date'] >= pd.Timestamp(1999, 5, 1)) & (df['date'] <= pd.Timestamp(1999, 5, 7))]\
.groupby('station').agg({'prcp': 'sum', 'Lat': 'first', 'Lon': 'first'})
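The same idea can also be written with pandas named aggregation, which lets the output column be named directly. This is a hedged sketch on made-up data. Taking 'first' for Lat and Lon relies on the question's statement that each station has a single Lat and Lon, and the names df, mask and out are illustrative.

```python
import pandas as pd

# Toy data (values are made up); date is already datetime64 here.
df = pd.DataFrame({
    "station": ["USC00397992", "USC00397992", "USC00397993", "USC00397993"],
    "date": pd.to_datetime(["1999-05-01", "1999-05-07",
                            "1999-05-03", "1999-06-01"]),
    "Lat": [44.26, 44.26, 44.01, 44.01],
    "Lon": [-99.44, -99.44, -100.35, -100.35],
    "prcp": [1.0, 1.5, 2.0, 9.0],
})

# Inclusive window from May 1 to May 7, 1999.
mask = (df["date"] >= pd.Timestamp(1999, 5, 1)) & (df["date"] <= pd.Timestamp(1999, 5, 7))

# Named aggregation: output_column=(input_column, function).
out = (
    df[mask]
    .groupby("station", as_index=False)
    .agg(
        Lat=("Lat", "first"),   # assumes one Lat per station
        Lon=("Lon", "first"),   # assumes one Lon per station
        sum_from_May1_to_May7=("prcp", "sum"),
    )
)
print(out)
```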