
Slice Pandas series (based on index) into multiple columns of dataframe

As an example, I'm trying to slice a field, 'date', containing dates in the format YYYYMMDD into 3 individual fields ('year', 'month', 'day').

I have an approach that assigns each value one at a time, but I assume there is a more efficient way to produce the desired result.

Current solution:

df['year'] = df['date'].astype(str).apply(lambda x: x[:4])
df['month'] = df['date'].astype(str).apply(lambda x: x[4:6])
df['day'] = df['date'].astype(str).apply(lambda x: x[6:8])

The following is an example of one of my attempts to simplify the code:

df['year'], df['month'], df['day'] = df['date'].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])
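The one-line attempt above fails because `apply` returns a single Series whose elements are three-item lists, so Python tries to unpack one item per row rather than one per field. A minimal sketch of one possible repair, using a made-up four-row frame, is to transpose the rows with `zip(*...)`:

```python
import pandas as pd

# Hypothetical sample data (four rows, so a plain 3-way unpack would fail)
df = pd.DataFrame({"date": [20180131, 20180611, 20180513, 20191224]})

# One three-item list per row: ['2018', '01', '31'], ...
parts = df["date"].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])

# zip(*parts) transposes the row-wise lists into three column-wise tuples
year, month, day = zip(*parts)
df["year"], df["month"], df["day"] = list(year), list(month), list(day)
```

Slicing through the `.str` accessor (for example `df['date'].astype(str).str[:4]`) gives the same strings without a Python-level lambda.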

You should always convert date columns to a pandas datetime dtype. You can do this in one of two ways: 1. When you import the data:

 df = pd.read_csv('your-raw_date.csv', parse_dates=[5,7])

NOTE: 5 and 7 are the (zero-based) positions of the columns that hold the dates.

  2. You can apply pd.to_datetime to an individual column. See this post for details.
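As a rough illustration of the second option, the sketch below converts a YYYYMMDD string column with `pd.to_datetime`. The sample values are made up, and the last one is deliberately invalid to show what `errors="coerce"` does:

```python
import pandas as pd

# Hypothetical sample data; the last value is deliberately not a real date
df = pd.DataFrame({"date": ["20180131", "20180611", "20181340"]})

# An explicit format avoids guessing; errors="coerce" turns bad values into NaT
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")
```

Without `errors="coerce"`, the invalid value would raise an exception instead, which may be preferable when bad input should not pass silently.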

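Second, note that the .dt accessor used below is part of pandas, so it does not require the standard datetime module. You can still import that module if you need it elsewhere:

import datetime as dt

Once the column has a datetime dtype, you can do things like: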

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

You can even do other useful things like:

df['signup_date_dayofweek'] = df['date'].dt.dayofweek

See the full documentation for the datetime (.dt) properties.
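Put together, the steps in this answer might look like the following sketch. The three sample dates are borrowed from another answer in this thread, and the column name `dayofweek` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"date": [20180131, 20180611, 20180513]})
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# .dt is a pandas accessor available on datetime-typed Series
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["dayofweek"] = df["date"].dt.dayofweek  # Monday=0 ... Sunday=6
```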

Convert the strings to datetime and use the .dt attributes of the datetime column:

df.date = pd.to_datetime(df.date, format='%Y%m%d')
df['year'], df['month'], df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day

After you convert the column to datetime, you might find that extracting the year, month and day into separate columns is not even necessary, since they are readily available through the date column's .dt attributes.
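For instance, filtering or grouping can use the date parts on the fly. This is only a sketch, and the `value` column is a hypothetical addition for illustration:

```python
import pandas as pd

# "value" is a made-up column so there is something to aggregate
df = pd.DataFrame({"date": [20180131, 20180611, 20180513], "value": [10, 20, 30]})
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# Filter on a date part without storing it as a column
june = df[df["date"].dt.month == 6]

# Group by a date part computed on the fly
by_month = df.groupby(df["date"].dt.month)["value"].sum()
```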

Since it looks like they are already integers, forget about all of the string formatting.

df['year'] = df.date//10**4
df['month'] = df.date//100%100
df['day'] = df.date%100
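To make the arithmetic concrete, here is the same idea as a self-contained sketch with one value traced through in the comments, assuming the column holds integers in YYYYMMDD form:

```python
import pandas as pd

df = pd.DataFrame({"date": [20180131, 20180611, 20180513]})

df["year"] = df["date"] // 10**4       # 20180131 // 10000 -> 2018
df["month"] = df["date"] // 100 % 100  # 20180131 // 100 -> 201801, then % 100 -> 1
df["day"] = df["date"] % 100           # 20180131 % 100 -> 31
```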

Assuming the date column is of a string (object) dtype:

In [18]: df
Out[18]:
       date
0  20180131
1  20180611
2  20180513

In [19]: df.dtypes
Out[19]:
date    object
dtype: object

Solution:

In [22]: df[['year','month','day']] = df.date.str.extract(r'(\d{4})(\d{2})(\d{2})').astype(int)

In [23]: df
Out[23]:
       date  year  month  day
0  20180131  2018      1   31
1  20180611  2018      6   11
2  20180513  2018      5   13

In [24]: df.dtypes
Out[24]:
date     object
year      int32
month     int32
day       int32
dtype: object

PS if date is of a numeric dtype, then I'd choose @ALollz's solution...
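A possible variation on the regex approach, offered as a sketch rather than as part of the original answer: named groups let `str.extract` label the resulting columns itself, so the result can be joined back without listing the column names separately.

```python
import pandas as pd

df = pd.DataFrame({"date": ["20180131", "20180611", "20180513"]})

# Named groups become the column names of the extracted DataFrame
pattern = r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
parts = df["date"].str.extract(pattern).astype(int)

# Join the new columns onto the original frame by index
df = df.join(parts)
```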

Here's a functional solution using operator.attrgetter:

from operator import attrgetter

df = pd.DataFrame({'Date': [20180131, 20180611, 20180513]})

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')

parts = ('year', 'month', 'day')

df['Year'], df['Month'], df['Day'] = zip(*df['Date'].apply(attrgetter(*parts)).values)

print(df)

        Date  Year  Month  Day
0 2018-01-31  2018      1   31
1 2018-06-11  2018      6   11
2 2018-05-13  2018      5   13
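If the row-by-row `apply` is a concern on large frames, one alternative (a different technique from the `attrgetter` approach above) is to call `getattr` on the `.dt` accessor, which returns a whole column per part. A minimal sketch under the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"Date": [20180131, 20180611, 20180513]})
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

parts = ("year", "month", "day")

# getattr on the .dt accessor yields a full Series for each part
df = df.assign(**{p.capitalize(): getattr(df["Date"].dt, p) for p in parts})
```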
