I would like to modify the data frame I am creating below:
import datetime  # the code below calls datetime.date(...), so import the module itself
from dateutil.rrule import rrule, DAILY, YEARLY
from dateutil.relativedelta import *
import numpy as np  # needed for np.random.randn
import pandas
START_YR = 2010
END_YR = 2013
strt_date = datetime.date(START_YR, 1, 1)
end_date = datetime.date(END_YR, 12, 31)
dt = rrule(DAILY, dtstart=strt_date, until=end_date)
# One random value per day, indexed by date
serie_1 = pandas.Series(np.random.randn(dt.count()),
                        index=pandas.date_range(strt_date, end_date))
How can I create a DataFrame with the year, month, and date as separate columns?
Convert the series to a DataFrame and then add the new columns as Pandas periods. If you just want the month as an integer, see the 'month_int' example.
df = pandas.DataFrame(serie_1)  # the question imports pandas without the pd alias
df['month'] = [ts.to_period('M') for ts in df.index]
df['year'] = [ts.to_period('Y') for ts in df.index]
df['month_int'] = [ts.month for ts in df.index]
>>> df
Out[16]:
                   0    month  year  month_int
2010-01-01  0.332370  2010-01  2010          1
2010-01-02 -0.036814  2010-01  2010          1
2010-01-03  1.751511  2010-01  2010          1
...              ...      ...   ...        ...
2013-12-29  0.345707  2013-12  2013         12
2013-12-30 -0.395924  2013-12  2013         12
2013-12-31 -0.614565  2013-12  2013         12
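If you want the period columns without looping over every timestamp, a minimal sketch along these lines should work. It assumes the index is a DatetimeIndex and uses `DatetimeIndex.to_period`; the frame and the column name `value` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one random value per day.
idx = pd.date_range("2010-01-01", "2013-12-31", freq="D")
df = pd.DataFrame({"value": np.random.randn(len(idx))}, index=idx)

# Convert the whole index at once instead of calling to_period per timestamp.
df["month"] = df.index.to_period("M")
df["year"] = df.index.to_period("Y")
```

The resulting columns hold pandas Period values, the same kind the list comprehensions produce.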
It will be significantly faster to just access the datetime attributes:
df['date'] = df.index.date
df['year'] = df.index.year
df['month'] = df.index.month
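As a self-contained sketch of the same idea (the frame and the column names `value` and `day` are illustrative assumptions, not from the question), including the day of the month as its own integer column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's series: one random value per day.
idx = pd.date_range("2010-01-01", "2013-12-31", freq="D")
df = pd.DataFrame({"value": np.random.randn(len(idx))}, index=idx)

# Vectorized attribute access on the DatetimeIndex, no Python-level loop.
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day
```

Here `year`, `month`, and `day` are plain integer columns, which is usually what you want for grouping or filtering.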
Compare the timings with the list comprehension method:
In [25]:
%%timeit
df['month'] = [ts.to_period('M') for ts in df.index]
df['year'] = [ts.to_period('Y') for ts in df.index]
df['month_int'] = [ts.month for ts in df.index]
1 loops, best of 3: 664 ms per loop
In [26]:
%%timeit
df['date'] = df.index.date
df['year'] = df.index.year
df['month'] = df.index.month
100 loops, best of 3: 5.96 ms per loop
So using the datetime properties is over 100x faster here (664 ms versus 5.96 ms).