As an example, I'm trying to slice a field, 'date', containing dates in the format YYYYMMDD into 3 individual fields ('year', 'month', 'day').
I have an approach that assigns each value one at a time, but I assume there is a more efficient way to produce the desired result.
Current solution:
df['year'] = df['date'].astype(str).apply(lambda x: x[:4])
df['month'] = df['date'].astype(str).apply(lambda x: x[4:6])
df['day'] = df['date'].astype(str).apply(lambda x: x[6:8])
The following is an example of one of my attempts to simplify the code:
df['year'], df['month'], df['day'] = df['date'].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])
You should always convert date columns to pandas datetime objects. You can do this in one of 2 ways: 1. When you import the data:
df = pd.read_csv('your-raw_date.csv', parse_dates=[5,7])
NOTE: 5 and 7 are the column numbers where the dates are.
2. By applying to_datetime to an individual column (see this post for details). Second, import a module called datetime (the .dt accessor used below is part of pandas, so this import is optional):
import datetime as dt
Once you have those 2 things, you can do things like:
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
You can even do other cool things like
df['signup_date_dayofweek'] = df['date'].dt.dayofweek
See the full documentation of the datetime module.
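As a minimal sketch of the second route (to_datetime on a single column) followed by the .dt accessor: the CSV text and the 'amount' column below are made up for illustration, and the date column is read as text so that the format string applies unambiguously.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a real file.
csv_text = "date,amount\n20180131,10\n20180611,20\n20180513,30\n"

# Read the date column as text, then convert it explicitly.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# Derive weekday information through the .dt accessor.
df["dayofweek"] = df["date"].dt.dayofweek   # Monday=0 ... Sunday=6
df["day_name"] = df["date"].dt.day_name()
print(df)
```

Here dayofweek counts from Monday as 0, so a Wednesday comes out as 2.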
Convert the string to datetime and use the datetime-type column's .dt attributes:
df.date = pd.to_datetime(df.date, format='%Y%m%d')
df['year'], df['month'], df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day
After you convert the column to datetime, you might find that extracting the year, month and day is not even necessary, as they are readily available through the date column's dt attributes.
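To illustrate why the extra columns may be unnecessary, here is a small sketch (with made-up sample data) that filters and groups directly on the .dt attributes of a datetime column:

```python
import pandas as pd

# Sample data assumed for illustration only.
df = pd.DataFrame({"date": ["20180131", "20180611", "20180513"]})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# Filter on the month without creating a 'month' column.
june_rows = df[df["date"].dt.month == 6]

# Count rows per year by grouping on the accessor result directly.
rows_per_year = df.groupby(df["date"].dt.year).size()

print(june_rows)
print(rows_per_year)
```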
Since it looks like they are already integers, forget about all of the string formatting.
df['year'] = df.date//10**4
df['month'] = df.date//100%100
df['day'] = df.date%100
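The arithmetic can be checked on a single value. The sketch below assumes an integer column with the sample dates used elsewhere in this thread, and spells out each step in the comments:

```python
import pandas as pd

df = pd.DataFrame({"date": [20180131, 20180611, 20180513]})

# 20180131 // 10**4      -> 2018  (drop the last four digits)
# 20180131 // 100 % 100  -> 1     (drop the day, then keep the last two digits)
# 20180131 % 100         -> 31    (keep the last two digits)
df["year"] = df["date"] // 10**4
df["month"] = df["date"] // 100 % 100
df["day"] = df["date"] % 100
print(df)
```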
Assuming column date is of a string (object) dtype:
In [18]: df
Out[18]:
date
0 20180131
1 20180611
2 20180513
In [19]: df.dtypes
Out[19]:
date object
dtype: object
solution:
In [22]: df[['year','month','day']] = df.date.str.extract(r'(\d{4})(\d{2})(\d{2})').astype(int)
In [23]: df
Out[23]:
date year month day
0 20180131 2018 1 31
1 20180611 2018 6 11
2 20180513 2018 5 13
In [24]: df.dtypes
Out[24]:
date object
year int32
month int32
day int32
dtype: object
PS: if date is of a numeric dtype, then I'd choose @ALollz's solution...
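If the dtype is not known in advance, the two approaches could be combined. The helper below, split_yyyymmdd, is a hypothetical name introduced only for this sketch; it uses integer arithmetic for numeric input and a regex with named groups for strings:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def split_yyyymmdd(dates: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split YYYYMMDD values into integer year/month/day."""
    if is_numeric_dtype(dates):
        # Numeric input: plain integer arithmetic, no string conversion.
        return pd.DataFrame({
            "year": dates // 10**4,
            "month": dates // 100 % 100,
            "day": dates % 100,
        })
    # String input: named groups become the column names of the result.
    pattern = r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    return dates.str.extract(pattern).astype(int)

as_text = split_yyyymmdd(pd.Series(["20180131", "20180611", "20180513"]))
as_ints = split_yyyymmdd(pd.Series([20180131, 20180611, 20180513]))
print(as_text)
print(as_ints)
```

The result can then be attached to the original frame with df.join(...), since it keeps the index of the input Series.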
Here's a functional solution using operator.attrgetter:
from operator import attrgetter
import pandas as pd
df = pd.DataFrame({'Date': [20180131, 20180611, 20180513]})
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')
parts = ('year', 'month', 'day')
df['Year'], df['Month'], df['Day'] = zip(*df['Date'].apply(attrgetter(*parts)).values)
print(df)
Date Year Month Day
0 2018-01-31 2018 1 31
1 2018-06-11 2018 6 11
2 2018-05-13 2018 5 13
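For readers unfamiliar with attrgetter: when given several names it returns a callable that produces a tuple of those attributes, which is what makes the zip(*...) unpacking above work. A minimal sketch on a single timestamp (the names get_parts and stamp are just for illustration):

```python
from operator import attrgetter
import pandas as pd

# attrgetter with several names returns a tuple of attribute values.
get_parts = attrgetter("year", "month", "day")

stamp = pd.Timestamp("2018-01-31")
parts = get_parts(stamp)
print(parts)
```

Because apply calls this getter once per row in Python, it is likely to be slower than the vectorized .dt attributes on large frames.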