How to extract any subset of digits from a numeric column
I have an integer column (int64) in a dataframe with values such as 20170811 (yyyymmdd). Now I need to extract only 08 and store it as a new column.
df['key'].floordiv(10000)
gives me 2017. But I want 08. How do I get it?
New Answer (as requested in comments)
Converting to Datetime has several advantages if you want to format your datetime string. To do so, you can use Series.dt.strftime. For more information on how to format strings and create custom formats, take a look at the strftime format codes.
import pandas as pd
df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']})
pd.to_datetime(df.key).dt.strftime('%b')
This will output:
0 Dec
1 Mar
2 Apr
Name: key, dtype: object
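As a minimal sketch of how the format codes relate to the question, the same parsed dates can produce the zero-padded month, the abbreviated name, or the integer month. The column names month_num, month_abbr and month_int are made up for illustration, and the explicit format='%Y%m%d' is an assumption about how the data is stored.

```python
import pandas as pd

# Sample data mirroring the answer above: dates stored as yyyymmdd strings.
df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']})

# Parse once with an explicit format, then reuse the datetime Series.
dates = pd.to_datetime(df['key'], format='%Y%m%d')

df['month_num'] = dates.dt.strftime('%m')   # zero-padded month as a string
df['month_abbr'] = dates.dt.strftime('%b')  # abbreviated month name (locale-dependent)
df['month_int'] = dates.dt.month            # month as an integer

print(df)
```

Use '%m' when the leading zero matters (as in the "08" from the question) and .dt.month when you need a number to compute with.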
Old Answer
What you can do is take the value modulo 10000, divide it by 100 and drop the decimals:
import pandas as pd
df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']}).astype(int)
df['key'].map(lambda x: int((x % 10000) / 100))
Which outputs:
0 12
1 3
2 4
Name: key, dtype: int64
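The same arithmetic generalizes to any run of digits, which is what the question title asks for. The sketch below uses vectorized floor division and modulo instead of a Python lambda. The helper name extract_digits and its parameters are hypothetical, and it assumes every value has exactly `width` digits.

```python
import pandas as pd

def extract_digits(series, start, length, width=8):
    """Return `length` digits beginning at 0-based position `start`
    (counted from the left) of integers that all have `width` digits."""
    drop_right = width - start - length  # digits to discard on the right
    if start < 0 or length <= 0 or drop_right < 0:
        raise ValueError("requested digits fall outside the number")
    # Floor-divide to chop off the right part, then modulo to chop off the left.
    return (series // 10 ** drop_right) % 10 ** length

df = pd.DataFrame({'key': [20170811, 20181201, 20180302]})
df['year'] = extract_digits(df['key'], 0, 4)
df['month'] = extract_digits(df['key'], 4, 2)
df['day'] = extract_digits(df['key'], 6, 2)
print(df)
```

The result is numeric, so the month 08 comes back as 8. If the leading zero is needed, convert to string and pad, or use one of the string-based approaches.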
In case you have strings, you could convert them to Datetime objects and simply access month:
import pandas as pd
df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']})
pd.to_datetime(df.key).map(lambda x: x.month)
giving you the same output.
Or use:
df['key']=df.astype(str)['key'].map(lambda x: x[4:6]).astype(int)
If you want the month name, as you said to @Stefan, do:
import calendar
# calendar.month_name is indexed from 1 (index 0 is an empty string), so use the month number directly
df['key']=df.astype(str)['key'].map(lambda x: x[4:6]).astype(int).apply(lambda x: calendar.month_name[x])
Or with apply:
df['key']=df.astype(str)['key'].apply(lambda x: x[4:6]).astype(int)
If you want the month name, as you said to @Stefan, do:
import calendar
# calendar.month_name is indexed from 1 (index 0 is an empty string), so use the month number directly
df['key']=df.astype(str)['key'].apply(lambda x: x[4:6]).astype(int).apply(lambda x: calendar.month_name[x])
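A short sketch of the month-name lookup, written out step by step. The point to watch is that calendar.month_name is indexed from 1, with an empty string at index 0, so the month number is used as the index without subtracting 1. The sample values and the month_name column are illustrative, and the names returned depend on the active locale.

```python
import calendar
import pandas as pd

df = pd.DataFrame({'key': [20170811, 20181201, 20180302]})

# Slice characters 4 and 5 (the month) out of the yyyymmdd string.
month_number = df['key'].astype(str).str[4:6].astype(int)

# Index 0 of calendar.month_name is '', index 1 is January, and so on.
df['month_name'] = month_number.map(lambda m: calendar.month_name[m])

print(df)
```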
Probably the most robust way is:
import datetime
import pandas as pd
df = pd.DataFrame({'date': [20180201, 20180302, 20180403]})
df['month'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d').map(lambda x: x.strftime('%m'))
If you want df['month'] to be an integer, just cast it with col.astype(int).
Edit: If you want your month in a format like Apr, May, etc., use x.strftime('%b'). You may want to look at the strftime documentation.
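One reason parsing with an explicit format can be considered robust is that impossible dates are caught instead of silently yielding a month. The sketch below is an illustration only: errors='coerce' and the nullable 'Int64' dtype are additions not used in the answer above, and the invalid sample value 20180230 is made up.

```python
import pandas as pd

# The last value is deliberately invalid (February 30th).
df = pd.DataFrame({'date': [20180201, 20180302, 20180230]})

# errors='coerce' turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(df['date'].astype(str), format='%Y%m%d', errors='coerce')

# The nullable integer dtype keeps months as integers alongside missing values.
df['month'] = parsed.dt.month.astype('Int64')

print(df)
```

Pure digit slicing or arithmetic would happily return 2 for the invalid row, so this variant is worth considering when the input has not been validated.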
You can convert your series to a string and then slice using Pandas str methods:
df = pd.DataFrame({'date': [20180201, 20180302, 20180403]})
df['key'] = df['date'].astype(str).str[4:6]
print(df)
date key
0 20180201 02
1 20180302 03
2 20180403 04
A much better alternative is to convert to datetime and extract months as integers:
df['key'] = pd.to_datetime(df['date'].astype(str)).dt.month
print(df)
date key
0 20180201 2
1 20180302 3
2 20180403 4