I have a dataframe df that looks like this, with no index set:
df.head()
   year  month  inch       mm
0  1981      2  0.00    0.000
1  1981      3  4.82  122.428
2  1981      4  6.45  163.830
3  1981      5  5.03  127.762
4  1981      6  1.25   31.750
(1) First, I want to select only the years between 1987 and 2017.
(2) Then I want to group by year for selected months, MAM (3-5), JJAS (6-9), and OND (10-12), and sum the mm column for these months.
The result could look something like this:
year season mm
1981 MAM 360
1981 JJAS 167
...
I'm not sure how to do part 1, but I know that for part 2 I need to convert the month column into a datetime object.
And then I would define the months of interest by:
MAM = df.iloc[df.index.month.isin(np.r_[3:6])]
JJAS = df.iloc[df.index.month.isin(np.r_[6:10])]
OND = df.iloc[df.index.month.isin(np.r_[10:13])]
But for now I get the error AttributeError: 'RangeIndex' object has no attribute 'month'.
Thanks in advance!
The first part is pretty simple. Use pd.Series.between:
df = df[df.year.between(1987, 2017)]
If year is not sorted, I'd recommend sorting df first; use df.sort_values(by='year') to do that (the keyword is by, not subset).
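As a quick check of the filter, here is a minimal sketch on hypothetical, deliberately unsorted data (the values are made up for illustration). between builds a boolean mask row by row and is inclusive on both ends by default, so the row order affects only the order of the output, not which rows are kept.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data.
df = pd.DataFrame({
    "year": [2018, 1987, 1986, 2017, 2000],
    "mm":   [1.0, 2.0, 3.0, 4.0, 5.0],
})

# between() is inclusive on both ends by default,
# so 1987 and 2017 are kept while 1986 and 2018 are dropped.
subset = df[df["year"].between(1987, 2017)]
print(subset["year"].tolist())
```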
For the next part, one solution is to build a dict mapping, use map to convert month to its season string, and group on that.
mapping = {3 : 'MAM', 4 : 'MAM', 5 : 'MAM', 6 : 'JJAS' ,... } # complete this
r = df.groupby(['year', df.month.map(mapping)]).sum()
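For reference, a minimal end-to-end sketch of this approach, assuming the season definitions from the question (MAM = 3-5, JJAS = 6-9, OND = 10-12). The sample data and the names season and result are hypothetical. Months 1 and 2 are not in the mapping, so map turns them into NaN and groupby leaves them out.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's dataframe.
df = pd.DataFrame({
    "year":  [1986, 1987, 1987, 1987, 1987, 1987, 1988],
    "month": [4, 2, 3, 4, 6, 10, 5],
    "mm":    [10.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Part 1: keep only 1987-2017 (inclusive).
df = df[df["year"].between(1987, 2017)]

# Part 2: map each month number to its season label.
mapping = {m: "MAM" for m in range(3, 6)}
mapping.update({m: "JJAS" for m in range(6, 10)})
mapping.update({m: "OND" for m in range(10, 13)})

# Unmapped months (1 and 2) become NaN and are excluded by groupby.
season = df["month"].map(mapping).rename("season")
result = df.groupby(["year", season])["mm"].sum().reset_index()
print(result)
```

Selecting the mm column before calling sum keeps the other numeric columns (such as month and inch) out of the result.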
Here's a slightly different approach: use year and month to build a datetime index, then groupby() with a user-defined function (UDF).
Example data:
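import numpy as np
import pandas as pd

N = 10
# freq="A" means year end; newer pandas releases deprecate it in favor of "YE".
years = pd.date_range("1981", "2017", freq="A").year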
dates = np.random.choice(years, size=N, replace=True)
months = np.random.choice(range(1,13), size=N, replace=True)
inches = np.random.randint(1,20, size=N)
mm = np.random.randint(1,100, size=N)
data = {"year":dates, "month":months, "inch":inches, "mm":mm}
df = pd.DataFrame(data)
df
inch mm month year
0 19 31 12 1990
1 8 71 9 1986
2 5 85 2 2009
3 17 8 12 2005
4 10 14 12 1987
5 7 87 2 1982
6 8 59 2 2004
7 8 74 8 2016
8 5 6 6 1993
9 3 7 12 1982
Now subset based on year and build the index:
mask = df.year.between(1987, 2017)
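# Build the index from the year and month columns (day fixed to 1).
# Unlike a row-wise apply with string formatting, this also works when other columns are floats.
df.index = pd.to_datetime(df[["year", "month"]].assign(day=1))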
Then groupby with year and a month-separating function:
def month_gb(x):
    # x is an index label (a Timestamp); January and February fall through and return None.
    if x.month in range(3,6):
        return 'MAM'
    elif x.month in range(6,10):
        return 'JJAS'
    elif x.month in range(10,13):
        return 'OND'
df.loc[mask].groupby(["year", month_gb]).mm.sum()
year
1987 OND 14
1990 OND 31
1993 JJAS 6
2005 OND 8
2016 JJAS 74
Name: mm, dtype: int64
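This also explains the error in the question: .month exists on a DatetimeIndex, not on the default RangeIndex. Below is a minimal sketch, on hypothetical sample data, of the question's per-season selection once a datetime index has been built from the year and month columns.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's dataframe.
df = pd.DataFrame({
    "year":  [1987, 1987, 1987, 1990, 1990],
    "month": [3, 5, 7, 11, 1],
    "mm":    [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Build a DatetimeIndex from the year and month columns (day fixed to 1).
df.index = pd.to_datetime(df[["year", "month"]].assign(day=1))

# With a DatetimeIndex, .month is available and boolean masks select each season.
MAM = df[df.index.month.isin([3, 4, 5])]
JJAS = df[df.index.month.isin([6, 7, 8, 9])]
OND = df[df.index.month.isin([10, 11, 12])]

# Per-year totals for one season.
print(MAM.groupby("year")["mm"].sum())
```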