
Pandas dataframe with float64 incorrectly exported by to_json() function

This issue is about exporting a dataframe with float64 datatype using the to_json() function from Pandas. The source code is attached below.

import pandas

if __name__ == "__main__":
    d = {'col1': [11111111.84, 123456.55], 'col2': [3, 4]}
    df = pandas.DataFrame(data=d)

    print(df)
    print(df.dtypes)

    output_file_path = '/test.csv'
    df.to_csv(output_file_path, index=False, encoding='UTF-8')
    output_file_path = '/test.json'
    df.to_json(output_file_path, orient="records", lines=True)

The output from the print() function is correct before exporting the dataframe into the JSON or CSV file. The output is shown below.

          col1  col2
0  11111111.84     3
1    123456.55     4
col1    float64
col2      int64
dtype: object

The exported data in CSV format (test.csv) is correct, as it should be.


The exported data in JSON format (test.json) has incorrect decimal digits in col1 row1 (11111111.8399999999). This issue only occurs for some values, because col1 row2 is correct (123456.55).


I found that there is a workaround for this issue: specifying the additional double_precision argument for the to_json() function. The result then becomes correct (already tested).

Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

However, the double_precision argument limits the number of decimal digits for all columns at once. It is not a good approach when each data column requires a different number of decimal digits.
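The global effect can be seen in a small sketch (the col2 values here are illustrative, not from the original data): double_precision=2 fixes col1 but also truncates every other float column to two decimal places.

```python
import pandas as pd

df = pd.DataFrame({'col1': [11111111.84, 123456.55],
                   'col2': [3.14159, 2.71828]})

# double_precision applies to every float column at once,
# so col2 loses digits it may have needed to keep.
out = df.to_json(orient='records', double_precision=2)
print(out)
```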

Also, I found the topic below, but I am not sure whether it is related to my issue.

Link: What is the difference between NUMERIC and FLOAT in BigQuery?

I am trying to understand the root cause of this issue and to find a solution. It is quite weird that the issue only happens with the to_json() function, while the to_csv() function works.
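The behavior can be reproduced without pandas at all: 11111111.84 has no exact float64 representation, and formatting the stored value to 10 decimal places (the documented default for to_json's double_precision) exposes the representation error, while str() picks the shortest string that round-trips back to the same float, which matches what to_csv() wrote here. A stdlib-only sketch:

```python
from decimal import Decimal

x = 11111111.84

# The exact value stored in the float64 is slightly below 11111111.84:
print(Decimal(x))

# Formatting to 10 decimal places (to_json's default double_precision)
# exposes the representation error:
print(format(x, '.10f'))   # 11111111.8399999999

# str() picks the shortest string that round-trips, which is what
# to_csv() ended up writing:
print(str(x))              # 11111111.84
```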

Anyone please help!

pandas to_json might be doing something weird with the precision there. As you've explained, the canonical solution is to specify double_precision with your desired precision, but this doesn't allow you to selectively round specific columns to a desired precision.

Another option is to cut out the middleman df.to_json here and instead use Python's built-in json.dump:

import json

# convert to string
json.dumps(df.to_dict()) 
# '{"col1": {"0": 11111111.84, "1": 123456.55}, "col2": {"0": 3, "1": 4}}'  

# save as a file
json.dump(df.to_dict(), f)  # f is an open fileobj

As you can see, this doesn't muck around with the precision. Standard floating point caveats still apply.
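Building on that, per-column precision can be had by rounding each column individually with DataFrame.round (which accepts a dict of column name to decimal places) before handing the result to json.dumps. A sketch, with illustrative column precisions, that also mirrors the one-object-per-line layout of to_json(orient="records", lines=True):

```python
import json
import pandas as pd

df = pd.DataFrame({'col1': [11111111.84, 123456.55],
                   'col2': [3.14159, 2.71828]})

# Round each column to its own precision; DataFrame.round accepts a
# dict mapping column name -> number of decimal places.
rounded = df.round({'col1': 2, 'col2': 3})

# json.dumps uses Python's shortest round-trip float repr, so the
# rounded values serialize cleanly; one object per line mirrors
# to_json(orient="records", lines=True).
out = '\n'.join(json.dumps(rec) for rec in rounded.to_dict(orient='records'))
print(out)
```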

