
How can I prevent csv.DictWriter() or writerow() rounding my floats?

I have a dictionary that I want to write to a CSV file, but the floats in the dictionary are rounded when I write them to the file. I want to keep the maximum precision.

Where does the rounding occur, and how can I prevent it?

What I did

I followed the DictWriter example here and I'm running Python 2.6.1 on Mac (10.6 - Snow Leopard).


# my import statements
import sys
import csv

Here is what my dictionary (d) contains:

>>> d = runtime.__dict__
>>> d
{'time_final': 1323494016.8556759,
'time_init': 1323493818.0042379,
'time_lapsed': 198.85143804550171}

The values are indeed floats:

>>> type(runtime.time_init)
<type 'float'>

Then I set up my writer and write the header and values:

f = open(log_filename,'w')
fieldnames = ('time_init', 'time_final', 'time_lapsed')
myWriter = csv.DictWriter(f, fieldnames=fieldnames)
headers = dict( (n,n) for n in fieldnames )
myWriter.writerow(headers)
myWriter.writerow(d)
f.close()

But when I look in the output file, the floats have been rounded:

time_init,time_final,time_lapsed
1323493818.0,1323494016.86,198.851438046

<EOF>

It looks like csv is using float.__str__ rather than float.__repr__:

>>> print repr(1323494016.855676)
1323494016.855676
>>> print str(1323494016.855676)
1323494016.86
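On recent Python 3 versions str() and repr() of a float give the same text, so the difference above cannot be reproduced there directly. The following is a minimal sketch that emulates it, assuming Python 2's str() formats floats with 12 significant digits (equivalent to "%.12g", apart from the trailing ".0" it adds to whole numbers). The helper name py2_style_str is hypothetical.

```python
def py2_style_str(value):
    """Approximate Python 2's str(float): 12 significant digits (assumption)."""
    return "%.12g" % value

x = 1323494016.855676

short = py2_style_str(x)   # the rounded form that ended up in the CSV file
full = repr(x)             # enough digits to recover x exactly

# Show each string and whether parsing it gives back the original float.
print(short, float(short) == x)
print(full, float(full) == x)
```

Only the repr() string parses back to the original value, which is why the choice between the two conversions matters for the file.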

Looking at the csv source, this appears to be hardwired behavior. A workaround is to convert all of the float values to their repr before csv gets to them. Use something like: d = dict((k, repr(v)) for k, v in d.items()).

Here's a worked-out example:

import sys, csv

d = {'time_final': 1323494016.8556759,
     'time_init': 1323493818.0042379,
     'time_lapsed': 198.85143804550171
}

d = dict((k, repr(v)) for k, v in d.items())

fieldnames = ('time_init', 'time_final', 'time_lapsed')
myWriter = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
headers = dict( (n,n) for n in fieldnames )
myWriter.writerow(headers)
myWriter.writerow(d)

This code produces the following output:

time_init,time_final,time_lapsed
1323493818.0042379,1323494016.8556759,198.85143804550171

A more refined approach takes care to make replacements only for floats:

d = dict((k, (repr(v) if isinstance(v, float) else str(v))) for k, v in d.items())
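As a rough sketch of the float-only replacement in a complete script (written for Python 3, with an in-memory buffer standing in for the log file): the helper name stringify_floats and the extra "label" field are hypothetical and only there to show a non-float value passing through.

```python
import csv
import io

def stringify_floats(row):
    """Return a copy of row with floats replaced by repr(); other values untouched."""
    return {k: (repr(v) if isinstance(v, float) else v) for k, v in row.items()}

d = {
    "time_final": 1323494016.8556759,
    "time_init": 1323493818.0042379,
    "time_lapsed": 198.85143804550171,
    "label": "run-1",  # hypothetical non-float field, left to csv's default handling
}

fieldnames = ("label", "time_init", "time_final", "time_lapsed")
buf = io.StringIO(newline="")  # in-memory stand-in for the output file
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writerow(dict((n, n) for n in fieldnames))  # header row, as in the question
writer.writerow(stringify_floats(d))

print(buf.getvalue())
```

Unlike the one-liner above, this variant leaves non-float values alone instead of calling str() on them, so a None value would still be written as an empty field rather than as the text "None".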

Note, I've just fixed this issue for Py2.7.3, so it shouldn't be a problem in the future. See http://hg.python.org/cpython/rev/bf7329190ca6

It's a known bug^H^H^Hfeature. According to the docs:

"""... the value None is written as the empty string. [snip] All other non-string data are stringified with str() before being written."""

Don't rely on the default conversions. Use repr() for floats. unicode objects need special handling; see the manual. Check whether the consumer of the file will accept the default format of datetime.x objects for x in (datetime, date, time, timedelta).
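One way to avoid the default conversions is to convert every value explicitly before handing the row to csv. The sketch below is only an illustration of that idea for Python 3: the function name to_csv_field is hypothetical, and the choices of ISO 8601 text for dates and times and of seconds for timedelta are assumptions about what the consumer of the file accepts.

```python
import csv
import datetime
import io

def to_csv_field(value):
    """Convert one value to the text we want in the file (hypothetical policy)."""
    if value is None:
        return ""  # same result as csv's own treatment of None
    if isinstance(value, float):
        return repr(value)  # round-trips through float()
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()  # assumption: the consumer parses ISO 8601
    if isinstance(value, datetime.timedelta):
        return repr(value.total_seconds())  # assumption: durations as seconds
    return value  # strings and everything else are passed through unchanged

row = [1.0 / 3, None, datetime.date(2011, 12, 10),
       datetime.timedelta(seconds=198.5), "ok"]

buf = io.StringIO(newline="")
csv.writer(buf).writerow([to_csv_field(v) for v in row])
print(buf.getvalue())
```

Keeping the policy in one function makes it easy to change a format later if the consumer turns out to need something else.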

Update:

For float objects, "%f" % value is not a good substitute for repr(value). The criterion is whether the consumer of the file can reproduce the original float object. repr(value) guarantees this. "%f" % value doesn't.

# Python 2.6.6
>>> nums = [1323494016.855676, 1323493818.004238, 198.8514380455017, 1.0 / 3]
>>> for v in nums:
...     rv = repr(v)
...     fv = "%f" % v
...     sv = str(v)
...     print rv, float(rv) == v, fv, float(fv) == v, sv, float(sv) == v
...
1323494016.8556759 True 1323494016.855676 True 1323494016.86 False
1323493818.0042379 True 1323493818.004238 True 1323493818.0 False
198.85143804550171 True 198.851438 False 198.851438046 False
0.33333333333333331 True 0.333333 False 0.333333333333 False

Notice that in the above it only appears, from inspecting the strings produced, that none of the %f cases worked: the float(fv) == v column shows that the first two did round-trip. Before 2.7, Python's repr always used 17 significant decimal digits. In 2.7, this was changed to use the minimum number of digits that still guarantees float(repr(v)) == v. The difference between the two repr outputs is not a rounding error.

# Python 2.7 output
1323494016.855676 True 1323494016.855676 True 1323494016.86 False
1323493818.004238 True 1323493818.004238 True 1323493818.0 False
198.8514380455017 True 198.851438 False 198.851438046 False
0.3333333333333333 True 0.333333 False 0.333333333333 False

Note the improved repr() results in the first column above.
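The same comparison can be repeated on Python 3 with a small round-trip check. This is a sketch rather than part of the original session: the helper name round_trips is hypothetical, and "%.17g" is added here as a stand-in for the 17-digit repr of Python 2.6, since 17 significant digits are always enough to identify a double.

```python
nums = [1323494016.855676, 1323493818.004238, 198.8514380455017, 1.0 / 3]

def round_trips(text, original):
    """True if parsing text gives back exactly the original float."""
    return float(text) == original

for v in nums:
    candidates = [
        ("repr", repr(v)),        # shortest string that round-trips (2.7+ and 3.x)
        ("%.17g", "%.17g" % v),   # 17 significant digits, like the 2.6 repr
        ("%f", "%f" % v),         # fixed 6 decimal places
    ]
    for name, text in candidates:
        print(name, text, round_trips(text, v))
```

The "%f" format only survives for values that happen to need six or fewer decimal places, which matches the mixed True/False column in the sessions above.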

Update 2, in response to the comment """And thanks for the info on Python 2.7. Unfortunately, I'm limited to 2.6.2 (running on the destination machine which can't be upgraded). But I'll keep this in mind for future scripts."""

It doesn't matter. float('0.3333333333333333') == float('0.33333333333333331') produces True on all versions of Python. This means that you could write your file on 2.7 and it would read the same on 2.6, or vice versa. There is no change in the accuracy of what repr(a_float_object) produces.
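A short sketch of that claim, written for Python 3: the CSV text below is hard-coded to the 17-digit values from the 2.6 output earlier in this thread, as if the file had been written there, and is then read back with csv.DictReader. The variable names are illustrative only.

```python
import csv
import io

# The short (2.7+) and 17-digit (2.6) spellings name the same double.
same = float("0.3333333333333333") == float("0.33333333333333331")

# A file as Python 2.6 would have written it with repr(), read back here.
text = (
    "time_init,time_final,time_lapsed\r\n"
    "1323493818.0042379,1323494016.8556759,198.85143804550171\r\n"
)
reader = csv.DictReader(io.StringIO(text, newline=""))
row = next(reader)
values = {k: float(v) for k, v in row.items()}

print(same)
print(values["time_lapsed"] == 198.8514380455017)
```

Only the number of digits printed differs between versions; the float recovered from the file is the same.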

This works, but it is probably not the best or most efficient way:

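>>> import csv
>>> from StringIO import StringIO  # Python 2 module, matching the question
>>> fieldnames = ('time_init', 'time_final', 'time_lapsed')
>>> f = StringIO()
>>> w = csv.DictWriter(f, fieldnames=fieldnames)
>>> # format every value with "%f" (six decimal places) before writing
>>> w.writerow(dict((k, "%f" % d[k]) for k in d.keys()))
>>> f.getvalue()
'1323493818.004238,1323494016.855676,198.851438\r\n'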
