Remove leading zeros from a pandas.core.series.Series

Question

I have a pandas.core.series.Series with the following data:

0    [00115840, 00110005, 001000033, 00116000...
1    [00267285, 00263627, 00267010, 0026513...
2                             [00335595, 00350750]

I want to remove the leading zeros from the Series. I tried

x.astype('int64')

but got the error message

ValueError: setting an array element with a sequence.

Can you suggest how to do this in Python 3.x?

Answer 1

If you want to convert the lists of strings to lists of integers, use a list comprehension:

# Option 1: nested list comprehension
s = pd.Series([[int(y) for y in x] for x in s], index=s.index)

# Option 2: the same conversion via apply
s = s.apply(lambda x: [int(y) for y in x])

Sample:

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

s = pd.Series(a)
print(s)
0    [00115840, 00110005, 001000033, 00116000]
1      [00267285, 00263627, 00267010, 0026513]
2                         [00335595, 00350750]
dtype: object

s = s.apply(lambda x: [int(y) for y in x])
print (s)
0    [115840, 110005, 1000033, 116000]
1      [267285, 263627, 267010, 26513]
2                     [335595, 350750]
dtype: object

Edit:

If you only need the integers, you can flatten the values and convert them to int:

s = pd.Series([item for sublist in s for item in sublist]).astype(int)

Alternative solution:

import itertools
s = pd.Series(list(itertools.chain(*s))).astype(int)

print (s)
0     115840
1     110005
2    1000033
3     116000
4     267285
5     263627
6     267010
7      26513
8     335595
9     350750
dtype: int32

Timings:

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

s = pd.Series(a)
s = pd.concat([s]*1000).reset_index(drop=True)

In [203]: %timeit pd.Series([[int(y) for y in x] for x in s], index=s.index)
100 loops, best of 3: 4.66 ms per loop

In [204]: %timeit s.apply(lambda x: [int(y) for y in x])
100 loops, best of 3: 5.13 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ sol
In [205]: %%timeit
     ...: v = pd.Series(np.concatenate(s.values.tolist()))
     ...: v.astype(int).groupby(s.index.repeat(s.str.len())).agg(pd.Series.tolist)
     ...: 
1 loop, best of 3: 226 ms per loop

#Wen solution
In [211]: %timeit pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
1 loop, best of 3: 1.12 s per loop

Solutions with flattening (@cᴏʟᴅsᴘᴇᴇᴅ's idea):

In [208]: %timeit pd.Series([item for sublist in s for item in sublist]).astype(int)
100 loops, best of 3: 2.55 ms per loop

In [209]: %timeit pd.Series(list(itertools.chain(*s))).astype(int)
100 loops, best of 3: 2.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ sol
In [210]: %timeit pd.Series(np.concatenate(s.values.tolist()))
100 loops, best of 3: 7.71 ms per loop
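
The numbers above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The function names below are made up for illustration, and absolute timings will differ by machine and pandas version.

```python
import itertools
import timeit

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

# Same benchmark data as above: the three rows repeated 1000 times.
s = pd.concat([pd.Series(a)] * 1000).reset_index(drop=True)


def nested_comprehension():
    # Keeps the nested structure: one list of ints per row.
    return pd.Series([[int(y) for y in x] for x in s], index=s.index)


def apply_lambda():
    # Same result as above, via Series.apply.
    return s.apply(lambda x: [int(y) for y in x])


def flatten_chain():
    # Flattens all rows into a single integer Series.
    return pd.Series(list(itertools.chain.from_iterable(s))).astype('int64')


for fn in (nested_comprehension, apply_lambda, flatten_chain):
    # Best of 3 repeats, 10 calls per repeat.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```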

Answer 2

s=pd.Series(s.apply(pd.Series).astype(int).values.tolist())
s
Out[282]: 
0    [1, 2]
1    [3, 4]
dtype: object

Input data:

s=pd.Series([['001','002'],['003','004']])

Update: thanks to Jez for pointing it out :-)

pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
Out[317]: 
0    [115840, 110005, 1000033, 116000]
1      [267285, 263627, 267010, 26513]
2                     [335595, 350750]
dtype: object

Answer 3

Use np.concatenate to flatten your data -

s

0    [00115840, 36869, 262171, 39936]
1     [00267285, 92055, 93704, 11595]
2                  [00335595, 119272]
Name: 1, dtype: object

v = pd.Series(np.concatenate(s.tolist()))

Or (thanks to jezrael for the suggestion), use the faster .values.tolist -

v = pd.Series(np.concatenate(s.values.tolist()))

v

0    00115840
1       36869
2      262171
3       39936
4    00267285
5       92055
6       93704
7       11595
8    00335595
9      119272
dtype: object

Now the astype call you tried should work -

v.astype(int)

0    115840
1     36869
2    262171
3     39936
4    267285
5     92055
6     93704
7     11595
8    335595
9    119272
dtype: int64

If your data are floats, use astype(float) instead.

If you like, you can use groupby + agg to reshape the result back into its original format -

v.astype(int).groupby(s.index.repeat(s.str.len())).agg(pd.Series.tolist)

0    [115840, 36869, 262171, 39936]
1     [267285, 92055, 93704, 11595]
2                  [335595, 119272]
dtype: object
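
To make the regrouping step easier to follow, here is a minimal self-contained sketch with made-up data. The grouping key repeats each original index label once per list item, so the flat values fall back into their original rows. It assumes the index labels of s are unique; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([['00115840', '00110005'],
               ['00267285'],
               ['00335595', '00350750']])

# Number of items in each list: 2, 1, 2
lengths = s.str.len().to_numpy()

# Flatten all lists into one Series and convert the strings to integers.
flat = pd.Series(np.concatenate(s.values.tolist())).astype('int64')

# One index label per flattened item: 0, 0, 1, 2, 2
keys = s.index.repeat(lengths)

# Group the flat values by their original row label and rebuild the lists.
regrouped = flat.groupby(keys).agg(list)
print(regrouped)
```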

Answer 4

If you want a cleaner solution, you can try the following, assuming a is the original Series:

b = a.explode().astype(int)
a = b.groupby(b.index).agg(list)

This is slower than the solutions posted by @cs95 and @jezrael, though.
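
A self-contained sketch of the same idea with made-up data. Series.explode needs pandas 0.25 or newer, and it turns an empty list into NaN, which would make the integer conversion fail, so this assumes every list has at least one item.

```python
import pandas as pd

a = pd.Series([['00115840', '00110005'],
               ['00267285'],
               ['00335595', '00350750']])

# One row per list item; the original index label is repeated for each item.
b = a.explode().astype('int64')

# Collect the items back into lists, grouped by the original index label.
result = b.groupby(level=0).agg(list)
print(result)
```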

Answer 5

# where x is a Series of strings
x = x.str.lstrip('0')
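
Note that str.lstrip('0') keeps the values as strings, and an all-zero string such as '000' becomes an empty string. A small sketch with made-up data; the names codes and nested are illustrative, and the second half shows one way to apply the same stripping inside each list, as in the question.

```python
import pandas as pd

# Flat Series of strings: the .str accessor strips each value.
codes = pd.Series(['00115840', '0026513', '000'])
stripped = codes.str.lstrip('0')

# Optionally map the empty result of an all-zero string back to '0'.
stripped_keep_zero = stripped.where(stripped != '', '0')

# Series of lists: strip inside each list, keeping the items as strings.
nested = pd.Series([['00115840', '00110005'], ['00335595', '000']])
nested_stripped = nested.apply(lambda xs: [x.lstrip('0') or '0' for x in xs])

print(stripped_keep_zero)
print(nested_stripped)
```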

Answer 6

If you have mixed dtypes, the line below should work:

df['col'] = df['col'].apply(lambda x:x.lstrip('0') if type(x) == str else x)
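
A quick sketch of how this behaves on a column that mixes strings and numbers. The DataFrame and its values are made up for illustration, and isinstance is used here in place of the type comparison.

```python
import pandas as pd

df = pd.DataFrame({'col': ['00115840', 267285, '0026513', 3.5]})

# Strip leading zeros from strings only; leave every other value untouched.
df['col'] = df['col'].apply(lambda x: x.lstrip('0') if isinstance(x, str) else x)
print(df)
```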

6 answers

Answer 1: score 4, accepted, 2018-01-07 16:39:04
Answer 2: score 4, 2018-01-07 16:46:28
Answer 3: score 2, 2018-01-07 16:47:00
Answer 4: score 1, 2020-08-30 04:23:35
Answer 5: score 1, 2021-07-26 15:24:10
Answer 6: score 0, 2020-08-30 04:06:21
