
Pandas: reshape Dataframe to condense multiple columns into single row value

I have the following DataFrame:

      pointId    april august december  february    \
0     307      None   None     None       NaN    
1     307      None   None     None       NaN   
2     307      None   None     None       NaN    
3     307      None   None     None      0.88     
4     307      None   None     None      0.60     
   january  july  june  march   may november october september  year  
0      NaN  None  None    NaN  None     None    None      None  2014  
1      NaN  None  None    NaN  None     None    None      None  2015  
2      NaN  None  None    NaN  None     None    None      None  2016  
3      0.7  None  None    1.1  None     None    None      None  2017  
4      0.5  None  None    NaN  None     None    None      None  2018

Each row holds the monthly values for one pointId in one year, spread across 12 month columns (most of them empty). I need to reshape it so that the 12 month columns collapse into a single Date column holding the last day of the month, with one row per non-missing value. The resulting DataFrame should look like this:

      pointId     Date         Value
0     307        01/31/2017     0.7
1     307        02/28/2017     0.88
2     307        03/31/2017     1.1
3     307        01/31/2018     0.5
4     307        02/28/2018     0.6

As usual, thanks for all your help. I wouldn't get by at work without SO :)
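The reshape described above is a wide-to-long conversion. As a hedged alternative to the stack-based answer, here is a minimal sketch using DataFrame.melt. It rebuilds only part of the sample data and assumes the empty cells are either NaN or the literal string 'None'; pd.to_numeric with errors="coerce" turns both into NaN (and would also silently turn any other non-numeric text into NaN). MONTHS and to_month_end_long are hypothetical names chosen for this sketch.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Part of the question's sample data; empty month columns are represented by 'april'
df = pd.DataFrame({
    "pointId": [307] * 5,
    "year": [2014, 2015, 2016, 2017, 2018],
    "january": [np.nan, np.nan, np.nan, 0.7, 0.5],
    "february": [np.nan, np.nan, np.nan, 0.88, 0.60],
    "march": [np.nan, np.nan, np.nan, 1.1, np.nan],
    "april": ["None"] * 5,
})

# Hypothetical lookup from month-column name to month number
MONTHS = {"january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
          "june": 6, "july": 7, "august": 8, "september": 9,
          "october": 10, "november": 11, "december": 12}


def to_month_end_long(frame):
    month_cols = [c for c in frame.columns if c in MONTHS]
    # One row per (pointId, year, month) cell
    tidy = frame.melt(id_vars=["pointId", "year"], value_vars=month_cols,
                      var_name="month", value_name="Value")
    # 'None' strings (and None/NaN) become NaN, then only real values are kept
    tidy["Value"] = pd.to_numeric(tidy["Value"], errors="coerce")
    tidy = tidy.dropna(subset=["Value"])
    # First day of each (year, month), rolled forward to that month's last day
    first_day = pd.to_datetime(pd.DataFrame({"year": tidy["year"],
                                             "month": tidy["month"].map(MONTHS),
                                             "day": 1}))
    return (tidy.assign(Date=first_day + MonthEnd(1))
                .sort_values(["pointId", "Date"])
                .loc[:, ["pointId", "Date", "Value"]]
                .reset_index(drop=True))


print(to_month_end_long(df))
```

Sorting by Date puts the rows in calendar order; a plain stack follows the column order, which in the question's frame is alphabetical (february before january).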

Use stack; after that, the only remaining step is converting the year and month into the month-end date.

import numpy as np
df.set_index(['pointId','year']).replace('None',np.nan).stack()
Out[1127]: 
pointId  year          
307      2017  february    0.88
               january     0.70
               march       1.10
         2018  february    0.60
               january     0.50
dtype: float64
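The result above relies on stack discarding the missing cells. That was the long-standing default, but the newer stack implementation (opt-in with future_stack=True since pandas 2.1) keeps them, so an explicit dropna() makes the long result the same across versions. A minimal sketch on a cut-down copy of the question's data; it also swaps replace('None', np.nan) for pd.to_numeric(..., errors="coerce") so the stacked values come out as float64, which assumes every month column is meant to be numeric.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's frame; 'april' stands in for the all-'None' months
df = pd.DataFrame({
    "pointId": [307, 307],
    "year": [2017, 2018],
    "february": [0.88, 0.60],
    "january": [0.7, 0.5],
    "march": [1.1, np.nan],
    "april": ["None", "None"],
})

stacked = (df.set_index(["pointId", "year"])
             .apply(pd.to_numeric, errors="coerce")  # 'None' strings -> NaN, columns -> float
             .stack()
             .dropna())                              # explicit, so old and new stack agree
print(stacked)
```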

Update

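import calendar
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

s = df.set_index(['pointId', 'year']).replace('None', np.nan).stack().dropna().reset_index()

# 'january' -> 1 ... 'december' -> 12, so all twelve month columns are covered
s['level_2'] = s['level_2'].map({calendar.month_name[i].lower(): i for i in range(1, 13)})
# year*100 + month builds YYYYMM (year*10 breaks for months 10-12); MonthEnd(1) moves to the month's last day
s['Date'] = pd.to_datetime((s.year * 100 + s.level_2).astype(str), format='%Y%m') + MonthEnd(1)

s.drop(columns=['year', 'level_2']).rename(columns={0: 'Value'})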
Out[1143]: 
   pointId  Value       Date
0      307   0.88 2017-02-28
1      307   0.70 2017-01-31
2      307   1.10 2017-03-31
3      307   0.60 2018-02-28
4      307   0.50 2018-01-31
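This output is close to what the question asked for, but the requested layout has rows in date order, columns in the order pointId, Date, Value, and dates written as MM/DD/YYYY. A minimal sketch of that final tidy-up, assuming result holds the frame shown above (result and tidy are hypothetical names, and the values are copied from the output). Note that strftime turns the dates into strings, so it is best kept for display or export.

```python
import pandas as pd

# Hypothetical stand-in for the answer's final frame (values copied from the output above)
result = pd.DataFrame({
    "pointId": [307] * 5,
    "Value": [0.88, 0.70, 1.10, 0.60, 0.50],
    "Date": pd.to_datetime(["2017-02-28", "2017-01-31", "2017-03-31",
                            "2018-02-28", "2018-01-31"]),
})


def tidy(frame):
    # Sort while Date is still a datetime, then format it as MM/DD/YYYY text
    return (frame.sort_values(["pointId", "Date"])
                 .reset_index(drop=True)
                 .assign(Date=lambda d: d["Date"].dt.strftime("%m/%d/%Y"))
                 .loc[:, ["pointId", "Date", "Value"]])


print(tidy(result))
```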
