
Pandas Date Time Data Frame

I'm relatively new to Python. I don't think this is a duplicate because I couldn't find the answer I was looking for.

I have the following dataframe, consisting of a 'Date' column in datetime64 format and the average daily temperature in Celsius as float64. I have 29 years (1990 to 2018) of daily recordings, and I am supposed to find the highest temperature for each of the 29 years.

Date    Average Daily Value
0   1990-01-01  8.88330
1   1990-01-02  9.11045
2   1990-01-03  10.93545
3   1990-01-04  3.69165
4   1990-01-05  6.03955
... ... ...
10567   2018-12-27  6.20830
10568   2018-12-28  7.05420
10569   2018-12-29  2.68330
10570   2018-12-30  14.49580
10571   2018-12-31  4.74170

years = sorted(set(df['Date'].dt.year.to_list()))

years = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]

I have managed to make a list of the years, and I am hoping to use it to iterate through the data, but I am not sure how. I tried using a for loop, but it just returns the highest value for the whole data set, not for each year.

Any help would be great. Thanks.

You need to first group by year and then fetch the maximum:

Example:

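import pandas as pd

# Parse the 'date' column as datetimes while reading the CSV
df = pd.read_csv('test.csv', parse_dates=['date'])

# Add a helper column holding the year of each row
df['years'] = df['date'].dt.year

grouped_df = df.groupby('years')

# Select the 'temp' column before aggregating; max() does not take a column name
max_temp = grouped_df[['temp']].max()
max_temp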

Output with my test set:

       temp
years   
2018     14
2019     12
2020     11
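
The same idea can be applied to the column names from the question ('Date' and 'Average Daily Value'). The following is only a minimal sketch: the small frame is made-up stand-in data, and `yearly_max` and `hottest_days` are names chosen here for illustration. It groups directly on the year taken from 'Date', so no helper column is needed, and it also uses `idxmax` to recover the date on which each yearly maximum occurred.

```python
import pandas as pd

# Stand-in for the question's frame; the values are invented for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "1990-01-01", "1990-01-02", "1990-01-03",
        "1991-06-30", "1991-07-01",
    ]),
    "Average Daily Value": [8.8833, 9.11045, 10.93545, 21.5, 19.25],
})

# Group on the year extracted from 'Date' and take the maximum per group.
by_year = df.groupby(df["Date"].dt.year)["Average Daily Value"]
yearly_max = by_year.max()

# idxmax gives the row label of each year's maximum, so the full rows
# (including the date of the maximum) can be looked up with .loc.
hottest_days = df.loc[by_year.idxmax()]

print(yearly_max)
print(hottest_days)
```

Because the grouping does the per-year split, no explicit loop over a list of years is required.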
