Pandas: converting floats in scientific notation to strings

Question

I am using read_csv() to load a dataset that looks like this:

userid
NaN
1.091178e+11
1.137856e+11

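I want to convert the user IDs to strings. One solution is to add keep_default_na=False to read_csv(), as suggested in the SO question "Convert long integers to strings in pandas (to avoid scientific notation)".

Suppose I do not want to use keep_default_na=False. Is there another way to convert the userid column to str?

I tried df.userid.astype(str), but I got 1.091178e+11. I expected the result in expanded form, not in scientific notation.

What should I do?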

Answer 1

You can use map or apply, as mentioned in this comment:

print (df.userid.map(lambda x: '{:.0f}'.format(x)))
0             nan
1    109117800000
2    113785600000
Name: userid, dtype: object

df.userid = df.userid.map(lambda x: '{:.0f}'.format(x))
print (df)
         userid
0           nan
1  109117800000
2  113785600000
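
Note that the format call above turns the missing value into the literal string 'nan'. If missing values should stay missing, a small variation can skip them. The following is a minimal sketch, not part of the original answer; the helper name float_to_id is hypothetical, and the sample values are the ones from the question:

```python
import numpy as np
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"userid": [np.nan, 1.091178e+11, 1.137856e+11]})

def float_to_id(value):
    # Hypothetical helper: leave missing values untouched,
    # format everything else as a whole number without an exponent.
    if pd.isna(value):
        return value
    return "{:.0f}".format(value)

ids = df["userid"].map(float_to_id)
print(ids)
```

The result is an object column in which the first entry is still NaN, so isna() and dropna() keep working on it.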

I wondered whether map would be faster than apply, but the timings are the same:

#[300000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
#print (df)

In [40]: %timeit (df.userid.map(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 211 ms per loop

In [41]: %timeit (df.userid.apply(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 210 ms per loop

Another solution is to_string, but it is slow:

print(df.userid.to_string(float_format='{:.0f}'.format))
0            nan
1   109117800000
2   113785600000

In [42]: %timeit (df.userid.to_string(float_format='{:.0f}'.format))
1 loop, best of 3: 2.52 s per loop
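
A possible vectorized alternative, which is not from the original answer, is to go through the nullable Int64 dtype and then the string dtype. This sketch assumes pandas 1.0 or later and that every non-missing value is a whole number (otherwise the cast to Int64 raises an error); the sample values are the ones from the question:

```python
import numpy as np
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"userid": [np.nan, 1.091178e+11, 1.137856e+11]})

# float64 -> nullable Int64 keeps the missing value as <NA>,
# then the string dtype renders the integers without an exponent.
ids = df["userid"].astype("Int64").astype("string")
print(ids)
```

Here the missing entry stays <NA> instead of becoming the text 'nan'. No timing is claimed for this variant; it would need to be measured with %timeit on the same data.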

Answer 2

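I stumbled upon this problem after reading a DataFrame from a JSON file with the read_json method, which unfortunately has no keep_default_na parameter.

The solution is to convert the long float to np.int64 and then convert that to str.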

In [53]: tweet_id_sample = tweets.iloc[0]['id']
         tweet_id_sample
Out[53]: 8.924206435553362e+17

In [54]: tweet_id_sample.astype(str)
Out[54]: '8.924206435553362e+17'

In [55]: tweet_id_sample.astype(np.int64).astype(str)
Out[55]: '892420643555336192'

In [56]: # This overflows
         tweet_id_sample.astype(int)
Out[56]: -2147483648
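
One caveat with this approach: a float64 can only represent every integer exactly up to 2**53, so an 18-digit ID may already have been rounded by the time it is a float, and the cast to np.int64 cannot restore the lost digits. The sketch below illustrates this; the value original_id is a hypothetical ID chosen for the demonstration, not taken from the data above:

```python
import numpy as np

# Hypothetical ID that differs from the answer's value only in the last digit
original_id = 892420643555336193

# Storing it as float64 rounds it to the nearest representable value
as_float = np.float64(original_id)

# Converting back through int64 yields the rounded value, not the original
round_trip = str(np.int64(as_float))

print(round_trip)                      # 892420643555336192
print(round_trip == str(original_id))  # False
```

For IDs of this size it is therefore worth checking the converted strings against a known-good source before relying on them.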

Answer 1: 2, accepted, 2016-12-15 07:04:38

Answer 2: 1, 2018-12-15 20:59:35