How to extract any subset of digits from a numeric column

I have an integer column (int64) in a DataFrame with values such as 20170811 (yyyymmdd). I need to extract only the 08 and store it as a new column.

df['key'].floordiv(10000) 

gives me 2017. But I want 08. How do I get it?

New Answer (as requested in comments)

Converting to datetime has several advantages if you want to format the date as a string. To do so, you can use Series.dt.strftime. For more information on format codes and how to build custom formats, see the strftime documentation.

import pandas as pd

df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']})

pd.to_datetime(df.key).dt.strftime('%b')

This will output:

0    Dec
1    Mar
2    Apr
Name: key, dtype: object

Old Answer

What you can do is take the value modulo 10000, divide the result by 100, and drop the decimals:

import pandas as pd

df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']}).astype(int)

df['key'].map(lambda x: int((x % 10000) / 100))

Which outputs:

0    12
1     3
2     4
Name: key, dtype: int64
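
The same arithmetic can be written without a Python-level lambda and generalised to any run of digits, which is what the question title asks for. The following is a minimal sketch, not part of the original answer; the helper name extract_digits and its parameters drop_right and width are hypothetical.

```python
import pandas as pd

def extract_digits(series, drop_right, width):
    """Discard the `drop_right` rightmost digits, then keep the next `width` digits."""
    # Integer floor division removes trailing digits; modulo keeps the low-order ones.
    return (series // 10 ** drop_right) % 10 ** width

df = pd.DataFrame({'key': [20181201, 20180302, 20180403]})

df['year'] = extract_digits(df['key'], 4, 4)   # digits 1-4 (yyyy)
df['month'] = extract_digits(df['key'], 2, 2)  # digits 5-6 (mm)
df['day'] = extract_digits(df['key'], 0, 2)    # digits 7-8 (dd)
print(df)
```

Because the result stays an integer, a leading zero such as the one in 08 is not preserved; string slicing or strftime('%m') is needed if the zero-padded text is wanted.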

In case you have strings, you could convert them to datetime objects and simply access month:

import pandas as pd

df = pd.DataFrame({'key': ['20181201', '20180302', '20180403']})

pd.to_datetime(df.key).map(lambda x: x.month)

giving you the same output.


Or use:

df['key']=df.astype(str)['key'].map(lambda x: x[4:6]).astype(int)

As you said to @Stefan, do:

import calendar
# calendar.month_name[0] is an empty string, so index with the month number itself
df['key']=df.astype(str)['key'].map(lambda x: x[4:6]).astype(int).apply(lambda x: calendar.month_name[int(x)])

Or apply:

df['key']=df.astype(str)['key'].apply(lambda x: x[4:6]).astype(int)

As you said to @Stefan, do:

import calendar
# calendar.month_name[0] is an empty string, so index with the month number itself
df['key']=df.astype(str)['key'].apply(lambda x: x[4:6]).astype(int).apply(lambda x: calendar.month_name[int(x)])
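
For reference, calendar.month_name and calendar.month_abbr both hold an empty string at index 0, so month numbers 1 to 12 index them directly. The sketch below illustrates this on the sample data used in this thread; the column names month_name and month_abbr are only illustrative, and the names returned depend on the current locale (English is assumed here).

```python
import calendar
import pandas as pd

df = pd.DataFrame({'key': [20181201, 20180302, 20180403]})

# Slice out characters 5-6 (the month) and convert to an integer month number.
month_number = df['key'].astype(str).str[4:6].astype(int)

# Index 0 of both sequences is '', so no offset is applied to the month number.
df['month_name'] = month_number.map(lambda m: calendar.month_name[int(m)])
df['month_abbr'] = month_number.map(lambda m: calendar.month_abbr[int(m)])
print(df)
```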

Probably the most robust way is:

import pandas as pd

df = pd.DataFrame({'date': [20180201, 20180302, 20180403]})
# Parse with an explicit format, then format each timestamp as a zero-padded month
df['month'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d').map(lambda x: x.strftime('%m'))

If you want df['month'] to be an integer, just cast it with df['month'].astype(int).

Edit: If you want the month in a format such as Apr, May, etc., use x.strftime('%b'). You may want to look at the strftime documentation.
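
One practical benefit of parsing with an explicit format is that malformed values can be detected instead of silently producing wrong digits. The sketch below is an addition, not part of the original answer: it assumes bad rows should become missing values and uses the errors='coerce' option of pd.to_datetime. The invalid value 20181301 (month 13) is made up for illustration.

```python
import pandas as pd

# The middle value is deliberately invalid (month 13).
df = pd.DataFrame({'date': [20180201, 20181301, 20180403]})

# errors='coerce' turns values that do not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(df['date'].astype(str), format='%Y%m%d', errors='coerce')

df['month'] = parsed.dt.month                # missing where parsing failed
df['month_abbr'] = parsed.dt.strftime('%b')  # e.g. 'Feb'; missing where parsing failed
print(df)
```

Plain string slicing or integer arithmetic would return 13 for the invalid row without complaint, so this approach is worth considering when the input is not fully trusted.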

You can convert your series to strings and then slice using the pandas str methods:

df = pd.DataFrame({'date': [20180201, 20180302, 20180403]})

df['key'] = df['date'].astype(str).str[4:6]

print(df)

       date key
0  20180201  02
1  20180302  03
2  20180403  04

A much better alternative is to convert to datetime and extract the months as integers:

df['key'] = pd.to_datetime(df['date'].astype(str)).dt.month

print(df)

       date  key
0  20180201    2
1  20180302    3
2  20180403    4
