Python Pandas Scientific Notation Inconsistent

I am looking into rewriting some data analysis code using Pandas (since I just discovered it) on Ubuntu 14.04 64-bit, and I have run into some strange behaviour. My data files look like this:

26/09/2014  00:00:00    2.423009    -58.864655  3.312355E-7 6.257226E-8 302 305
26/09/2014  00:00:00    2.395637    -62.73302   3.321525E-7 7.065322E-8 302 305
26/09/2014  00:00:01    2.332541    -57.763269  3.285718E-7 6.873837E-8 302 305
26/09/2014  00:00:02    2.366828    -51.900812  3.262279E-7 7.397762E-8 302 305
26/09/2014  00:00:03    2.435500    -40.820161  3.241068E-7 6.777224E-8 302 305
26/09/2014  00:00:04    2.428922    -65.573049  3.212358E-7 6.761804E-8 302 305
26/09/2014  00:00:05    2.419931    -59.414711  3.185517E-7 7.243236E-8 302 305
26/09/2014  00:00:06    2.416663    -60.064279  3.209795E-7 6.242328E-8 302 305
26/09/2014  00:00:07    2.411954    -52.586242  3.184297E-7 5.825581E-8 302 304
26/09/2014  00:00:08    2.457342    -61.874388  3.151493E-7 6.327384E-8 303 304

The columns are tab-separated. To read these into Pandas, I am using the following simple commands:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("path/to/file.dat", sep="\t", header=None)
print(data)

This produces the following output:

            0         1         2          3  4             5    6    7
0  26/09/2014  00:00:00  2.423009 -58.864655  0  6.257226e-08  302  305
1  26/09/2014  00:00:00  2.395637 -62.733020  0  7.065322e-08  302  305
2  26/09/2014  00:00:01  2.332541 -57.763269  0  6.873837e-08  302  305
3  26/09/2014  00:00:02  2.366828 -51.900812  0  7.397762e-08  302  305
4  26/09/2014  00:00:03  2.435500 -40.820161  0  6.777224e-08  302  305
5  26/09/2014  00:00:04  2.428922 -65.573049  0  6.761804e-08  302  305
6  26/09/2014  00:00:05  2.419931 -59.414711  0  7.243236e-08  302  305
7  26/09/2014  00:00:06  2.416663 -60.064279  0  6.242328e-08  302  305
8  26/09/2014  00:00:07  2.411954 -52.586242  0  5.825581e-08  302  304
9  26/09/2014  00:00:08  2.457342 -61.874388  0  6.327384e-08  303  304

[10 rows x 8 columns]

The important thing to notice here is column 4. Compare it to column 5, and to the original data. Column 5 has been rendered in scientific notation, while column 4 has not. Pandas hasn't just zeroed out the column or converted it to int, because:

>>> data[4][0]*1e7
3.3123550000000002

Which is what I would expect. So the data values are the same, but the representation has changed. If this is just a cosmetic thing, then I could put up with it, but it makes me feel uneasy and I'd like to know what's going on here.

Yes, it's a cosmetic thing. You can change how floats are displayed using set_option (df below is the DataFrame called data in the question):

In [21]:

pd.set_option('display.precision',20)
df[4]
Out[21]:
0    0.0000003312355
1    0.0000003321525
2    0.0000003285718
3    0.0000003262279
4    0.0000003241068
5    0.0000003212358
6    0.0000003185517
7    0.0000003209795
8    0.0000003184297
9    0.0000003151493
Name: 4, dtype: float64
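
If the goal is to see the column in scientific notation rather than with more decimal places, a float formatter is another option. This is only a minimal sketch: the Series `col` is an illustrative stand-in built from three values in the sample data, and the `'{:.6e}'` format string is my own choice, not something from the original answer.

```python
import pandas as pd

# Illustrative stand-in for column 4 of the question's data.
col = pd.Series([3.312355e-7, 3.321525e-7, 3.285718e-7])

# Temporarily render floats in scientific notation. The option is
# restored when the with-block exits; the stored values never change.
with pd.option_context('display.float_format', '{:.6e}'.format):
    rendered = col.to_string()

print(rendered)
```

Using option_context instead of set_option keeps the change local, which avoids surprising output elsewhere in a session.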

The underlying data is not truncated; the full values are preserved, including when you write the data back out to CSV.
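
One way to convince yourself of this is a round trip through an in-memory CSV. The sketch below assumes a small hand-built frame (a few values copied from the sample data) and passes `float_precision='round_trip'` to read_csv, which is my addition so that the parsed floats can be compared for exact equality.

```python
import io
import pandas as pd

# Illustrative frame with values copied from columns 4 and 5 above.
df = pd.DataFrame({4: [3.312355e-7, 3.321525e-7],
                   5: [6.257226e-8, 7.065322e-8]})

# Write to an in-memory, tab-separated buffer instead of a real file.
buf = io.StringIO()
df.to_csv(buf, sep="\t", header=False, index=False)

# Read it back, asking the parser for exact round-trip float conversion.
buf.seek(0)
back = pd.read_csv(buf, sep="\t", header=None,
                   float_precision='round_trip')

# True if every value survived the round trip unchanged.
print((back.values == df.values).all())
```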

If you are in IPython, you can check what the default settings are. For display precision, the default is normally 7 (this may differ between Pandas versions).
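
As a rough sketch of how the setting can be inspected and restored (this works in a plain Python session as well as in IPython; the variable name `default_precision` is just illustrative):

```python
import pandas as pd

# Read the current display precision; the value depends on the
# installed Pandas version, so none is assumed here.
default_precision = pd.get_option('display.precision')
print(default_precision)

# Change it for the session, as in the answer above.
pd.set_option('display.precision', 20)
changed_precision = pd.get_option('display.precision')

# Put the option back to its built-in default.
pd.reset_option('display.precision')
```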
