Pandas: Convert a DataFrame into a Series when index is Year-Month and columns are Day

I have a dataframe that looks similar to the following:

df = pd.DataFrame({'Y_M':['201710','201711','201712'],'1':[1,5,9],'2':[2,6,10],'3':[3,7,11],'4':[4,8,12]})
df = df.set_index('Y_M')

Which creates a dataframe looking like this:

        1   2   3   4
Y_M                  
201710  1   2   3   4
201711  5   6   7   8
201712  9  10  11  12

The columns are the days of the month and continue to the right, all the way up to 31. (In a non-leap year, February has columns 29, 30, and 31 filled with NaN.) The index contains the year and the month (e.g. 201711 refers to Nov 2017).

My question is: How can I make this a single series, with the year/month/day combined? My output would be the following:

Y_M                
20171001    1
20171002    2
20171003    3
20171004    4
20171101    5  
20171102    6
20171103    7
20171104    8
20171201    9
20171202   10
20171203   11
20171204   12

The index can be converted to a datetime. In fact I think it would make it easier.

Use stack to reshape the DataFrame into a Series, then build the dates by adding the day offsets from to_timedelta to the month starts from to_datetime:

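# stack() moves the day columns into a second index level, giving a Series
df = df.stack()
# level 0 gives the first day of each month; level 1 adds (day - 1) days to it
df.index = pd.to_datetime(df.index.get_level_values(0), format='%Y%m') + \
           pd.to_timedelta(df.index.get_level_values(1).astype(int) - 1, unit='D')
print(df)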
2017-10-01     1
2017-10-02     2
2017-10-03     3
2017-10-04     4
2017-11-01     5
2017-11-02     6
2017-11-03     7
2017-11-04     8
2017-12-01     9
2017-12-02    10
2017-12-03    11
2017-12-04    12
dtype: int64

print (df.index)
DatetimeIndex(['2017-10-01', '2017-10-02', '2017-10-03', '2017-10-04',
               '2017-11-01', '2017-11-02', '2017-11-03', '2017-11-04',
               '2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04'],
              dtype='datetime64[ns]', freq=None)
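
A minimal sketch of the same approach wrapped in a function, assuming the day columns are labelled with strings and unused days hold NaN. The name wide_to_daily_series and the sample data are made up for illustration. Missing values are dropped explicitly, since whether stack discards them by default may depend on the pandas version; if they were kept, day 29 of February 2017 would be shifted into March 1.

```python
import numpy as np
import pandas as pd


def wide_to_daily_series(wide):
    """Flatten a Year-Month x Day frame into a Series indexed by date.

    Hypothetical helper: assumes the index holds 'YYYYMM' strings and the
    column labels are day numbers stored as strings.
    """
    # Drop NaN explicitly so non-existent days never reach the date arithmetic.
    s = wide.stack().dropna()
    months = pd.to_datetime(s.index.get_level_values(0), format='%Y%m')
    offsets = pd.to_timedelta(s.index.get_level_values(1).astype(int) - 1, unit='D')
    s.index = months + offsets
    return s


# Illustrative data: only the last few day columns, with February padded by NaN.
wide = pd.DataFrame(
    {'28': [1.0, 4.0], '29': [np.nan, 5.0], '30': [np.nan, 6.0]},
    index=pd.Index(['201702', '201703'], name='Y_M'),
)
daily = wide_to_daily_series(wide)
print(daily)
```

Because the values contain NaN before stacking, the resulting Series has a float dtype rather than int64.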

Finally, if the index needs to hold strings rather than a DatetimeIndex, convert it with DatetimeIndex.strftime:

df.index = df.index.strftime('%Y%m%d')
print (df)
20171001     1
20171002     2
20171003     3
20171004     4
20171101     5
20171102     6
20171103     7
20171104     8
20171201     9
20171202    10
20171203    11
20171204    12
dtype: int64

print (df.index)
Index(['20171001', '20171002', '20171003', '20171004', '20171101', '20171102',
       '20171103', '20171104', '20171201', '20171202', '20171203', '20171204'],
      dtype='object')
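
If integer keys such as 20171001 are wanted instead of strings, one possible follow-up is to cast the formatted labels. This is a sketch on a small made-up Series, not part of the answer above:

```python
import pandas as pd

# Illustrative Series with a DatetimeIndex, like the stacked result above.
dates = pd.date_range('2017-10-01', periods=4, freq='D')
by_day = pd.Series([1, 2, 3, 4], index=dates)

# Format as 'YYYYMMDD' strings, then cast the labels to integers.
by_day.index = by_day.index.strftime('%Y%m%d').astype(int)
print(by_day)
```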

Alternatively, starting from the original DataFrame, build the string labels directly without converting to dates:

s = df.stack()
# each index entry is a (Y_M, day) tuple; pad the day to two digits and join
s.index = s.index.map('{0[0]}{0[1]:>02s}'.format)
s

20171001     1
20171002     2
20171003     3
20171004     4
20171101     5
20171102     6
20171103     7
20171104     8
20171201     9
20171202    10
20171203    11
20171204    12
dtype: int64
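
The same string-only idea can also be sketched with vectorised string methods instead of a format call. This variant is an assumption-laden alternative, not the answer's own code: it casts both index levels to str and pads the day with str.zfill, so it should also cope with integer column labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'1': [1, 5, 9], '2': [2, 6, 10], '3': [3, 7, 11], '4': [4, 8, 12]},
    index=pd.Index(['201710', '201711', '201712'], name='Y_M'),
)

s = df.stack()
# Level 0 holds the year-month labels, level 1 the day labels.
year_month = s.index.get_level_values(0).astype(str)
day = s.index.get_level_values(1).astype(str).str.zfill(2)
# Adding two string indexes concatenates them element by element.
s.index = year_month + day
print(s)
```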
