
How do you separate a pandas dataframe by year in python?

I am trying to make a graph that shows the average temperature for each day of the year by averaging 19 years of NOAA data. (Side note: is there a better source of historical weather data? NOAA's seems very inconsistent.) What is the best way to set up the data? The relevant columns look like this:

              DATE  PRCP    TAVG    TMAX    TMIN    TOBS
  0     1990-01-01  17.0    NaN     13.3    8.3     10.0
  1     1990-01-02  0.0     NaN     NaN     NaN     NaN
  2     1990-01-03  0.0     NaN     13.3    2.8     10.0
  3     1990-01-04  0.0     NaN     14.4    2.8     10.0
  4     1990-01-05  0.0     NaN     14.4    2.8     11.1
...     ...     ...     ...     ...     ...     ...
10838   2019-12-27  0.0     NaN     15.0    4.4     13.3
10839   2019-12-28  0.0     NaN     14.4    5.0     13.9
10840   2019-12-29  3.6     NaN     15.0    5.6     14.4
10841   2019-12-30  0.0     NaN     14.4    6.7     12.2
10842   2019-12-31  0.0     NaN     15.0    6.7     13.9

10843 rows × 6 columns

The DATE column is the datetime64[ns] type

Here's my code:

import pandas as pd
from matplotlib import pyplot as plt

data = pd.read_csv('1990-2019.csv')

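# Separate the data by station; .copy() avoids SettingWithCopyWarning on later assignment
oceanside = data[data.STATION == 'USC00047767'].copy()
downtown = data[data.STATION == 'USW00023272'].copy()
# Plain column assignment replaces the column, so the dtype becomes datetime64[ns]
oceanside['DATE'] = pd.to_datetime(oceanside['DATE'], format='%Y-%m-%d')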

#This is the area I need help with:
oceanside['DATE'].dt.year

I've been trying to separate the data by year so I can then average it. I would like to do this without a for loop, because I plan to do this with much larger data sets and a loop would be very inefficient. I looked in the pandas documentation but couldn't find a function that seemed to do this. Am I missing something? Is that even the right way to do it?

I am new to pandas/python data analysis so it is very possible the answer is staring me in the face.

Any help would be greatly appreciated!

Create a dict of dataframes where each key is a year

df_by_year = dict()
for year in oceanside.DATE.dt.year.unique():
    # Keep only the rows whose DATE falls in this year
    year_df = oceanside[oceanside.DATE.dt.year == year]
    df_by_year[year] = year_df
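The same dict can also be built from a single groupby, which avoids filtering the frame once per year. This is a minimal sketch, not part of the original answer; the small `oceanside` frame here is a hypothetical stand-in for the real station data.

```python
import pandas as pd

# Hypothetical sample standing in for the oceanside frame.
oceanside = pd.DataFrame({
    "DATE": pd.to_datetime(["2018-01-01", "2018-01-02", "2019-01-01"]),
    "TMAX": [14.4, 15.0, 16.1],
})

# Iterating over a groupby yields (key, sub-frame) pairs, one per year.
df_by_year = {
    year: frame
    for year, frame in oceanside.groupby(oceanside["DATE"].dt.year)
}
```

Each value is an ordinary DataFrame, so `df_by_year[2018]` can be used just like the result of the boolean filter.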

Get data by a single year

oceanside[oceanside.DATE.dt.year == 2019]

Get average for each year

oceanside.groupby(oceanside.DATE.dt.year).mean(numeric_only=True)
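The question actually asks for the average on each calendar day across all years, which may be easier to get by grouping on month and day rather than on year. The following is only a sketch under that reading of the question; the synthetic `oceanside` frame and its constant temperatures are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the oceanside frame: two years of daily data.
dates = pd.date_range("2018-01-01", "2019-12-31", freq="D")
oceanside = pd.DataFrame({
    "DATE": dates,
    "TMAX": [15.0 if d.year == 2018 else 17.0 for d in dates],
    "TMIN": [5.0 if d.year == 2018 else 7.0 for d in dates],
})

# Group by (month, day) so each calendar day is averaged across all years.
# Using month/day instead of dt.dayofyear keeps leap years aligned.
month = oceanside["DATE"].dt.month.rename("month")
day = oceanside["DATE"].dt.day.rename("day")
daily_avg = oceanside.groupby([month, day])[["TMAX", "TMIN"]].mean()

print(daily_avg.head())
```

The result has one row per calendar day, indexed by (month, day), so something like `daily_avg["TMAX"].plot()` should give the year-long curve. Missing readings (NaN) are skipped by `mean()` by default.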
