
Pandas, Python: rotate some (31-day) columns of a dataframe and match them to the existing (year, month) rows (NOAA data)

I have NOAA weather data. In its raw state, each row is a (year, month) pair and the days are columns. I want to expand the rows so that each row holds a year, month, and day, with the matching data in that row.

There is also a weather-variable column, where each row represents a different weather variable collected that month. The number of variables collected in a month can change. (In January there are two (tmax, tmin), in February there are three (tmax, tmin, prcp), and in March there is one (tmax).)

Here is an example df.

import numpy as np
import pandas as pd

example_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
           'year': [1993, 1993, 1993, 1993,1993, 1993],
           'month': [1, 1,  2, 2, 2, 3],
           'attribute':['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
           'day1': range(1, 7, 1),
           'day2': range(1, 7, 1),
           'day3': range(1, 7, 1),
           'day4': range(1, 7, 1),
                  })
example_df = example_df[['station', 'year', 'month', 'attribute', 'day1', 'day2', 'day3', 'day4']]

This is the result I want:

solution_df = pd.DataFrame({'station': ['USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1','USC1', 'USC1', 'USC1', 'USC1', 'USC1', 'USC1'],
           'year': [1993, 1993, 1993, 1993,1993, 1993, 1993, 1993, 1993, 1993,1993, 1993],
           'month': [1, 1,1, 1, 2, 2,  2, 2, 3, 3, 3, 3],
           'day':[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
           'tmax': [1, 1, 1, 1, 3, 3, 3, 3, 6, 6, 6, 6],
           'tmin': [2, 2, 2, 2, 4, 4, 4, 4, np.nan, np.nan, np.nan, np.nan],
           'prcp': [np.nan, np.nan, np.nan, np.nan, 5, 5, 5, 5, np.nan, np.nan, np.nan, np.nan]

                  })
solution_df = solution_df[['station', 'year', 'month', 'day', 'tmax', 'tmin', 'prcp']]

I have tried .T, pivot, melt, stack, and unstack to turn the day columns into rows matched to the correct months.

This is the closest I have come with the example dataset:

record_arr = example_df.to_records()

new_df = pd.DataFrame({'station': np.nan,
                  'year': np.nan,
                  'month':np.nan, 
                  'day': np.nan,
                  'tmax':np.nan,
                  'tmin': np.nan,
                  'prcp':np.nan},
                   index = [1]
                 )
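# DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
new_row = pd.DataFrame([{'station': record_arr[0][1], 'year': record_arr[0][2], 'month': record_arr[0][3], 'tmax': record_arr[0][5], 'tmin': record_arr[1][5]}])
new_df = pd.concat([new_df, new_row], ignore_index=True)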

This requires a pivot as well as a melt (or, equivalently, unstack and stack). This is how I got it in two steps:

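# Step 1: stack the day columns into rows, one row per (station, year, month, attribute, day)
df1 = example_df.set_index(['station', 'year', 'month', 'attribute']).stack().reset_index()
# Step 2: unstack the attribute level so each weather variable becomes its own column
df1.set_index(['station', 'year', 'month', 'level_4','attribute'])[0].unstack().reset_index()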


attribute   station year    month   level_4 prcp    tmax    tmin
0           USC1    1993    1       day1    NaN     1.0     2.0
1           USC1    1993    1       day2    NaN     1.0     2.0
2           USC1    1993    1       day3    NaN     1.0     2.0
3           USC1    1993    1       day4    NaN     1.0     2.0
4           USC1    1993    2       day1    5.0     3.0     4.0
5           USC1    1993    2       day2    5.0     3.0     4.0
6           USC1    1993    2       day3    5.0     3.0     4.0
7           USC1    1993    2       day4    5.0     3.0     4.0
8           USC1    1993    3       day1    NaN     6.0     NaN
9           USC1    1993    3       day2    NaN     6.0     NaN
10          USC1    1993    3       day3    NaN     6.0     NaN
11          USC1    1993    3       day4    NaN     6.0     NaN
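
The result above still labels days as 'day1', 'day2', ... in a column called level_4, and the columns are not in the order asked for in the question. As a rough sketch of the melt-plus-pivot route mentioned above, the following reshapes the example data and then cleans up the day column so the output matches the layout of solution_df. The names id_cols, long_df and tidy_df are made up for this illustration, and the code assumes each (station, year, month, day, attribute) combination occurs only once, since pivot raises an error on duplicates.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
example_df = pd.DataFrame({
    'station': ['USC1'] * 6,
    'year': [1993] * 6,
    'month': [1, 1, 2, 2, 2, 3],
    'attribute': ['tmax', 'tmin', 'tmax', 'tmin', 'prcp', 'tmax'],
    'day1': range(1, 7),
    'day2': range(1, 7),
    'day3': range(1, 7),
    'day4': range(1, 7),
})

id_cols = ['station', 'year', 'month']  # hypothetical helper name

# Wide to long: one row per (station, year, month, attribute, day column).
long_df = example_df.melt(id_vars=id_cols + ['attribute'],
                          var_name='day', value_name='value')

# Turn the labels 'day1', 'day2', ... into the integers 1, 2, ...
long_df['day'] = long_df['day'].str.replace('day', '', regex=False).astype(int)

# Long to wide: one column per weather variable; combinations that were
# never recorded (e.g. prcp in January) become NaN.
tidy_df = (long_df
           .pivot(index=id_cols + ['day'], columns='attribute', values='value')
           .reset_index()
           .rename_axis(columns=None))

# Put the columns in the order requested in the question.
tidy_df = tidy_df[id_cols + ['day', 'tmax', 'tmin', 'prcp']]
print(tidy_df)
```

Passing a list as the index argument of pivot needs pandas 1.1 or later. If duplicate rows are possible in the real data, pivot_table with an explicit aggregation function may be a safer choice, at the cost of silently combining the duplicates.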

