Slice Pandas series (based on index) into multiple columns of dataframe

Question

As an example, I'm trying to slice a field, 'date', containing dates in the format YYYYMMDD into 3 individual fields ('year', 'month', 'day').

I have an approach that assigns each new column one at a time, but I assume there is a more efficient way to produce the desired result.

Current solution:

df['year'] = df['date'].astype(str).apply(lambda x: x[:4])
df['month'] = df['date'].astype(str).apply(lambda x: x[4:6])
df['day'] = df['date'].astype(str).apply(lambda x: x[6:8])

The following is an example of one of my attempts to simplify the code:

df['year'], df['month'], df['day'] = df['date'].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])
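Why the one-liner above doesn't work: unpacking a Series iterates over its rows, not over the three positions inside each row's list. With exactly three rows it silently puts the first row's pieces into 'year' (and so on), and with any other row count it raises a ValueError. A minimal sketch of a working variant, assuming the sample dates used in the answers below, transposes the per-row lists with zip(*...) before unpacking:

```python
import pandas as pd

# Sample data, assumed for illustration (same values the answers use).
df = pd.DataFrame({'date': [20180131, 20180611, 20180513]})

# One Series whose elements are 3-item lists: ['2018', '01', '31'], ...
parts = df['date'].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])

# zip(*parts) transposes rows -> fields, giving one tuple per new column;
# map(list, ...) turns each tuple into a plain list before assignment.
df['year'], df['month'], df['day'] = map(list, zip(*parts))

print(df)
```

Like the original approach, the new columns hold strings (e.g. '01' for January), not integers.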

Answer 1

You should always convert date columns to pandas datetime objects. You can do this in one of two ways. First, when you import the data:

 df = pd.read_csv('your-raw_date.csv', parse_dates=[5,7])

NOTE: 5 and 7 are the 0-based positions of the columns that contain dates.
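A small sketch of the positional parse_dates behaviour, assuming a hypothetical in-memory CSV (via io.StringIO) whose dates are stored in ISO format in column position 1; the column names and values are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV contents; 'date' is at position 1 (counting from 0).
csv_text = """id,date,amount
1,2018-01-31,10
2,2018-06-11,20
3,2018-05-13,30
"""

# parse_dates=[1] asks read_csv to parse the column at position 1 as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[1])

print(df.dtypes)
```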

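Second, you can apply pd.to_datetime to an individual column after loading; see this post for details.

Once the column has a datetime dtype, you can use its .dt accessor. Despite the name, this does not require importing Python's datetime module; the accessor is built into pandas. You can then do things like: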

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

You can even do other useful things, such as:

df['signup_date_dayofweek'] = df['date'].dt.dayofweek

See the full documentation for the datetime properties available through .dt.
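An end-to-end sketch of the to_datetime route, assuming the integer YYYYMMDD sample values from the question. The integers are converted via their string form with an explicit format so nothing has to be guessed; 'dayofweek' is an illustrative column name:

```python
import pandas as pd

df = pd.DataFrame({'date': [20180131, 20180611, 20180513]})

# Convert the YYYYMMDD integers to datetime64 via their string form.
df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
# dayofweek counts Monday as 0 and Sunday as 6.
df['dayofweek'] = df['date'].dt.dayofweek

print(df)
```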

Answer 2

Convert the string to datetime and use the datetime column's .dt attributes:

df.date = pd.to_datetime(df.date, format='%Y%m%d')
df['year'], df['month'], df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day

After you convert the column from string to datetime, you might find that extracting the year, month, and day into separate columns is not even necessary, since they are readily available through the column's .dt attributes.
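To illustrate using the .dt attributes directly instead of materializing new columns, here is a sketch with a hypothetical 'value' column added for the aggregation:

```python
import pandas as pd

df = pd.DataFrame({'date': ['20180131', '20180611', '20180513'],
                   'value': [10, 20, 30]})  # 'value' is hypothetical
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

# Filter on a date component without creating a 'month' column.
june_rows = df[df['date'].dt.month == 6]

# Group by a date component computed on the fly.
totals_by_year = df.groupby(df['date'].dt.year)['value'].sum()

print(june_rows)
print(totals_by_year)
```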

Answer 3

Since the dates appear to already be integers, you can skip the string formatting entirely and use integer arithmetic:

df['year'] = df.date//10**4
df['month'] = df.date//100%100
df['day'] = df.date%100
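A worked version of the arithmetic, assuming an integer column with the question's sample values. Python's // and % have the same precedence and group left to right, so df.date//100%100 means (df.date//100)%100:

```python
import pandas as pd

df = pd.DataFrame({'date': [20180131, 20180611, 20180513]})

df['year'] = df.date // 10**4        # 20180131 // 10000 -> 2018
df['month'] = df.date // 100 % 100   # 20180131 // 100 -> 201801, then % 100 -> 1
df['day'] = df.date % 100            # 20180131 % 100 -> 31

print(df)
```

This only works on a numeric dtype; on string (object) values these operators fail.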

Answer 4

Assuming the date column has a string (object) dtype:

In [18]: df
Out[18]:
       date
0  20180131
1  20180611
2  20180513

In [19]: df.dtypes
Out[19]:
date    object
dtype: object

solution:

In [22]: df[['year','month','day']] = df.date.str.extract(r'(\d{4})(\d{2})(\d{2})').astype(int)

In [23]: df
Out[23]:
       date  year  month  day
0  20180131  2018      1   31
1  20180611  2018      6   11
2  20180513  2018      5   13

In [24]: df.dtypes
Out[24]:
date     object
year      int32
month     int32
day       int32
dtype: object
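A variant of the str.extract call above, sketched on the same sample strings: named groups of the form (?P<name>...) become the column names of the extracted frame, so the target column list does not have to be repeated on the left-hand side:

```python
import pandas as pd

df = pd.DataFrame({'date': ['20180131', '20180611', '20180513']})

# Named groups become the column names of the returned DataFrame.
parts = df['date'].str.extract(r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})')

# Extracted values are strings; cast to int and attach next to 'date'.
df = df.join(parts.astype(int))

print(df)
```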

PS: if date has a numeric dtype, I'd choose @ALollz's solution instead.

Answer 5

Here's a functional solution using operator.attrgetter:

from operator import attrgetter
import pandas as pd

df = pd.DataFrame({'Date': [20180131, 20180611, 20180513]})

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')

parts = ('year', 'month', 'day')

df['Year'], df['Month'], df['Day'] = zip(*df['Date'].apply(attrgetter(*parts)).values)

print(df)

        Date  Year  Month  Day
0 2018-01-31  2018      1   31
1 2018-06-11  2018      6   11
2 2018-05-13  2018      5   13
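The solution above relies on attrgetter returning a tuple when given several attribute names. A minimal sketch on a single Timestamp shows what each row produces before zip(*...) transposes the rows into columns:

```python
from operator import attrgetter

import pandas as pd

ts = pd.Timestamp('2018-01-31')

# With several names, attrgetter builds a callable that returns a tuple.
get_parts = attrgetter('year', 'month', 'day')

print(get_parts(ts))  # (2018, 1, 31)
```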

5 answers

solution1 2 2018-06-11 20:54:28
solution2 1 2018-06-11 20:41:39
solution3 1 2018-06-11 20:45:59
solution4 1 ACCEPTED 2018-06-11 20:48:42
solution5 1 2018-06-11 21:54:28
