Pandas Date Time Data Frame

Question

I'm relatively new to Python. I don't think this is a duplicate, because I didn't find the answer I was looking for.

I have the following DataFrame consisting of 'Date' in datetime64 format and the average temperature in Celsius as float64. I have 29 years (1990 to 2018) of daily recordings, and I need to find the highest temperature for each of those years.

             Date  Average Daily Value
0      1990-01-01              8.88330
1      1990-01-02              9.11045
2      1990-01-03             10.93545
3      1990-01-04              3.69165
4      1990-01-05              6.03955
...           ...                  ...
10567  2018-12-27              6.20830
10568  2018-12-28              7.05420
10569  2018-12-29              2.68330
10570  2018-12-30             14.49580
10571  2018-12-31              4.74170

years = sorted(set(df['Date'].dt.year.to_list()))

years = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]

I have managed to make a list of the years, and I am hoping to use it to iterate through the data, but I am not sure how. I tried using a for loop, but it just returns the highest value for the whole dataset, not for each year.
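For comparison, a for loop over the years can work if each iteration first filters the rows down to a single year and only then takes the maximum; taking the max of the unfiltered column is what returns the whole-dataset value. The sketch below is a minimal illustration only: the column names 'Date' and 'Average Daily Value' come from the question, but the temperatures are randomly generated, hypothetical stand-in values, not the asker's recordings.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: one row per day from 1990-01-01 to 2018-12-31
dates = pd.date_range("1990-01-01", "2018-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": dates,
    "Average Daily Value": rng.normal(10, 5, len(dates)),
})

year_col = df["Date"].dt.year          # year of every row, computed once
years = sorted(year_col.unique())      # 1990 ... 2018

yearly_max = {}
for year in years:
    in_year = year_col == year         # boolean mask selecting only this year's rows
    # Max over the filtered rows, not over the whole column
    yearly_max[year] = df.loc[in_year, "Average Daily Value"].max()

for year, value in yearly_max.items():
    print(year, round(value, 2))
```

This rescans the whole column once per year, so a single groupby is usually shorter and faster for the same result.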

Any help would be great. Thanks.

Answer 1

You need to first group by year and then fetch the maximum:

Example:

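import pandas as pd

# Parse the 'date' column as datetime64 while reading the file
df = pd.read_csv('test.csv', parse_dates=['date'])

# Extract the year of each date into its own column
df['years'] = df['date'].dt.year

# Group the rows by year
grouped_df = df.groupby('years')

# Select the 'temp' column and take its maximum within each year;
# the double brackets keep the result a DataFrame
max_temp = grouped_df[['temp']].max()
max_temp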

Output with my test set:

       temp
years   
2018     14
2019     12
2020     11
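The same grouping can also be applied directly to the question's columns without adding a helper column, by passing the extracted years straight to groupby. A minimal sketch, assuming the question's column names 'Date' and 'Average Daily Value'; the data is randomly generated (hypothetical), and the `idxmax` step is an optional extra that recovers which day held each year's maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data using the question's column names
dates = pd.date_range("1990-01-01", "2018-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": dates,
    "Average Daily Value": rng.normal(10, 5, len(dates)),
})

# Group by the year of each date; rename so the result index is called "Year"
by_year = df.groupby(df["Date"].dt.year.rename("Year"))["Average Daily Value"]

# Highest value per year, as a Series indexed by year
yearly_max = by_year.max()

# Optional: idxmax gives the row label of each year's maximum,
# so .loc returns the full rows, including the date it occurred on
hottest_days = df.loc[by_year.idxmax()]

print(yearly_max.head())
print(hottest_days.head())
```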

1 answers

solution1
1 2021-10-09 01:17:49