How to remove the last two digits in a column that is of integer type?
How can I remove the last two digits of a DataFrame column of type int64?
For example, df['DATE'] contains:
DATE
20110708
20110709
20110710
20110711
20110712
20110713
20110714
20110815
20110816
20110817
What I would like is:
DATE
201107
201107
201107
201107
201107
201107
201107
201108
201108
201108
What is the simplest way of achieving this?
Convert the dtype to str using astype, then use the vectorised str accessor to slice off the last two characters, and finally convert back to int64 dtype:
In [184]:
df['DATE'] = df['DATE'].astype(str).str[:-2].astype(np.int64)
df
Out[184]:
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
In [185]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
DATE 10 non-null int64
dtypes: int64(1)
memory usage: 160.0 bytes
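For reference, here is a self-contained sketch of the same round trip outside an IPython session. It assumes the ten sample values from the question; the variable name trimmed is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"DATE": [20110708, 20110709, 20110710, 20110711, 20110712,
                            20110713, 20110714, 20110815, 20110816, 20110817]})

# int64 -> str, drop the last two characters, then back to int64.
trimmed = df["DATE"].astype(str).str[:-2].astype(np.int64)
print(trimmed.tolist())
```

Note that this relies on every value having at least three digits. A one- or two-digit value would be sliced down to an empty string, and the final astype would then be expected to fail.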
Hmm...
It turns out there is a built-in method, floordiv:
In [191]:
df['DATE'].floordiv(100)
Out[191]:
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
Name: DATE, dtype: int64
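Here is a small sketch, on a few of the sample values from the question, showing that floordiv should give the same values as the string round trip while staying in integer arithmetic throughout. The names via_floordiv and via_str are illustrative.

```python
import numpy as np
import pandas as pd

# A few of the sample values from the question.
dates = pd.Series([20110708, 20110714, 20110815, 20110817],
                  name="DATE", dtype="int64")

via_floordiv = dates.floordiv(100)                      # integer division by 100
via_str = dates.astype(str).str[:-2].astype(np.int64)   # string round trip

print(via_floordiv.tolist())
print(via_str.tolist())
```

floordiv returns a new Series rather than modifying the column, so to update the DataFrame you would assign the result back, e.g. df['DATE'] = df['DATE'].floordiv(100).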
Update
For a 1000-row df, the floordiv method is considerably faster:
%timeit df['DATE'].astype(str).str[:-2].astype(np.int64)
%timeit df['DATE'].floordiv(100)
100 loops, best of 3: 2.92 ms per loop
1000 loops, best of 3: 203 µs per loop
Here we observe a speedup of roughly 14x (2.92 ms vs. 203 µs).
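Outside IPython, a comparable measurement can be sketched with the standard timeit module. The 1000-row frame below is synthetic (the question's ten values repeated 100 times), and absolute numbers will vary by machine and pandas version, so treat this as a way to reproduce the comparison rather than as a benchmark result.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 1000-row frame: the ten sample values repeated 100 times.
sample = [20110708, 20110709, 20110710, 20110711, 20110712,
          20110713, 20110714, 20110815, 20110816, 20110817]
df = pd.DataFrame({"DATE": sample * 100})

runs = 100
t_str = timeit.timeit(
    lambda: df["DATE"].astype(str).str[:-2].astype(np.int64), number=runs)
t_floordiv = timeit.timeit(lambda: df["DATE"].floordiv(100), number=runs)

# Report the average time per call in milliseconds.
print(f"str round trip: {t_str / runs * 1e3:.3f} ms per call")
print(f"floordiv:       {t_floordiv / runs * 1e3:.3f} ms per call")
```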
You could use floor division (//) to drop the last two digits and preserve the integer type:
>>> df['DATE'] // 100
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
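The // operator and the floordiv method should give the same result here. One caveat worth checking against your own data: floor division rounds toward negative infinity, so it only "drops digits" for non-negative values. A minimal sketch with made-up values (the negative entry is not from the question):

```python
import pandas as pd

# One value from the question plus a hypothetical negative value.
s = pd.Series([20110708, -20110708], dtype="int64")

by_operator = s // 100
by_method = s.floordiv(100)

print(by_operator.tolist())   # [201107, -201108]
print(by_method.tolist())     # same values as the operator form
```

For date-like integers such as the ones in the question this does not matter, since they are all positive.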