I have a dataframe that looks similar to the following:
import pandas as pd

df = pd.DataFrame({'Y_M':['201710','201711','201712'],'1':[1,5,9],'2':[2,6,10],'3':[3,7,11],'4':[4,8,12]})
df = df.set_index('Y_M')
This creates a dataframe that looks like this:
        1   2   3   4
Y_M
201710  1   2   3   4
201711  5   6   7   8
201712  9  10  11  12
The columns are the days of the month. They continue to the right, all the way up to 31 (February will have columns 29, 30, and 31 filled with NaN). The index contains the year and the month (e.g. 201711 refers to Nov 2017).
My question is: how can I turn this into a single Series, with the year, month, and day combined in the index? My desired output is the following:
Y_M
20171001 1
20171002 2
20171003 3
20171004 4
20171101 5
20171102 6
20171103 7
20171104 8
20171201 9
20171202 10
20171203 11
20171204 12
The index can be converted to a datetime. In fact I think it would make it easier.
Use stack to get a Series, and then build the datetimes by combining to_datetime with timedeltas from to_timedelta:
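# Stack the day columns into a Series with a (Y_M, day) MultiIndex
df = df.stack()
# First day of each month plus (day - 1) days gives the actual date
df.index = pd.to_datetime(df.index.get_level_values(0), format='%Y%m') + \
    pd.to_timedelta(df.index.get_level_values(1).astype(int) - 1, unit='D')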
print (df)
2017-10-01 1
2017-10-02 2
2017-10-03 3
2017-10-04 4
2017-11-01 5
2017-11-02 6
2017-11-03 7
2017-11-04 8
2017-12-01 9
2017-12-02 10
2017-12-03 11
2017-12-04 12
dtype: int64
print (df.index)
DatetimeIndex(['2017-10-01', '2017-10-02', '2017-10-03', '2017-10-04',
'2017-11-01', '2017-11-02', '2017-11-03', '2017-11-04',
'2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04'],
dtype='datetime64[ns]', freq=None)
Finally, if you need strings in the index (not a DatetimeIndex), use DatetimeIndex.strftime:
df.index = df.index.strftime('%Y%m%d')
print (df)
20171001 1
20171002 2
20171003 3
20171004 4
20171101 5
20171102 6
20171103 7
20171104 8
20171201 9
20171202 10
20171203 11
20171204 12
dtype: int64
print (df.index)
Index(['20171001', '20171002', '20171003', '20171004', '20171101', '20171102',
'20171103', '20171104', '20171201', '20171202', '20171203', '20171204'],
dtype='object')
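The question mentions that days which do not exist in a month (such as February 30) are stored as NaN. The sketch below shows one way the same approach might handle that, assuming a small hypothetical frame with only the columns '28', '29', and '30'. Whether stack drops NaN on its own depends on the pandas version and arguments, so the sketch calls dropna() explicitly. The names feb and s are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: February 2018 has no day 29 or 30, so those cells are NaN.
feb = pd.DataFrame(
    {'Y_M': ['201801', '201802'],
     '28': [1.0, 4.0],
     '29': [2.0, np.nan],
     '30': [3.0, np.nan]}
).set_index('Y_M')

# Stack to long form, then drop the missing days explicitly.
s = feb.stack().dropna()

# First day of each month plus (day - 1) days, as in the answer above.
months = pd.to_datetime(s.index.get_level_values(0), format='%Y%m')
offsets = pd.to_timedelta(s.index.get_level_values(1).astype(int) - 1, unit='D')
s.index = months + offsets

print(s)
```

Dropping the NaN rows before building the index matters here: adding 29 days to 2018-02-01 would not fail, it would silently produce 2018-03-02.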
Without bringing dates into it, format the stacked index labels directly as strings:
s = df.stack()
s.index = s.index.map('{0[0]}{0[1]:>02s}'.format)
s
20171001 1
20171002 2
20171003 3
20171004 4
20171101 5
20171102 6
20171103 7
20171104 8
20171201 9
20171202 10
20171203 11
20171204 12
dtype: int64
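To make the format string above easier to read, here is a small sketch that applies the same formatter to individual (Y_M, day) tuples, which is what Index.map passes in for each entry of the stacked MultiIndex. The names combine and combine_zfill are hypothetical, and the zfill variant is only shown for comparison; it is not part of the original answer.

```python
# The same format call used with Index.map above, applied to single tuples.
# {0[0]} is the Y_M label; {0[1]:>02s} right-aligns the day string in a
# field of width 2, padding with '0'.
combine = '{0[0]}{0[1]:>02s}'.format

single_digit = combine(('201710', '1'))   # day '1' is padded to two characters
double_digit = combine(('201712', '12'))  # day '12' already fills the field

# A comparable spelling using str.zfill (an alternative, for illustration).
def combine_zfill(label):
    year_month, day = label
    return year_month + day.zfill(2)

print(single_digit, double_digit, combine_zfill(('201711', '4')))
```

This relies on the day labels being strings, as they are in the question's frame. With integer column labels the `s` format code would raise an error, and a numeric code such as `02d` would be needed instead.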