Python Pandas Scientific Notation Inconsistent

Question

I am looking into rewriting some data analysis code using Pandas (since I just discovered it) on Ubuntu 14.04 64-bit and I have hit upon some strange behaviour. My data files look like this:

26/09/2014  00:00:00    2.423009    -58.864655  3.312355E-7 6.257226E-8 302 305
26/09/2014  00:00:00    2.395637    -62.73302   3.321525E-7 7.065322E-8 302 305
26/09/2014  00:00:01    2.332541    -57.763269  3.285718E-7 6.873837E-8 302 305
26/09/2014  00:00:02    2.366828    -51.900812  3.262279E-7 7.397762E-8 302 305
26/09/2014  00:00:03    2.435500    -40.820161  3.241068E-7 6.777224E-8 302 305
26/09/2014  00:00:04    2.428922    -65.573049  3.212358E-7 6.761804E-8 302 305
26/09/2014  00:00:05    2.419931    -59.414711  3.185517E-7 7.243236E-8 302 305
26/09/2014  00:00:06    2.416663    -60.064279  3.209795E-7 6.242328E-8 302 305
26/09/2014  00:00:07    2.411954    -52.586242  3.184297E-7 5.825581E-8 302 304
26/09/2014  00:00:08    2.457342    -61.874388  3.151493E-7 6.327384E-8 303 304

The columns are tab-separated. To read these into Pandas, I am using the following simple commands:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv("path/to/file.dat", sep="\t", header=None)
print(data)

This produces the following output:

            0         1         2          3  4             5    6    7
0  26/09/2014  00:00:00  2.423009 -58.864655  0  6.257226e-08  302  305
1  26/09/2014  00:00:00  2.395637 -62.733020  0  7.065322e-08  302  305
2  26/09/2014  00:00:01  2.332541 -57.763269  0  6.873837e-08  302  305
3  26/09/2014  00:00:02  2.366828 -51.900812  0  7.397762e-08  302  305
4  26/09/2014  00:00:03  2.435500 -40.820161  0  6.777224e-08  302  305
5  26/09/2014  00:00:04  2.428922 -65.573049  0  6.761804e-08  302  305
6  26/09/2014  00:00:05  2.419931 -59.414711  0  7.243236e-08  302  305
7  26/09/2014  00:00:06  2.416663 -60.064279  0  6.242328e-08  302  305
8  26/09/2014  00:00:07  2.411954 -52.586242  0  5.825581e-08  302  304
9  26/09/2014  00:00:08  2.457342 -61.874388  0  6.327384e-08  303  304

[10 rows x 8 columns]

The important thing to notice here is column 4. Compare it to column 5, and to the original data. Column 5 has been rendered in scientific notation, while column 4 has not. It hasn't just zeroed out the column or converted it to int because:

>>> data[4][0]*1e7
3.3123550000000002

This is what I would expect, so the data values are the same but the representation has changed. If this is just a cosmetic thing, then I could put up with it, but it makes me feel uneasy and I'd like to know what's going on here.

Answer 1

Yes, it's a cosmetic thing; you can change this using set_option (df below is the DataFrame called data in the question):

In [21]:

pd.set_option('display.precision',20)
df[4]
Out[21]:
0    0.0000003312355
1    0.0000003321525
2    0.0000003285718
3    0.0000003262279
4    0.0000003241068
5    0.0000003212358
6    0.0000003185517
7    0.0000003209795
8    0.0000003184297
9    0.0000003151493
Name: 4, dtype: float64

The underlying data has not been truncated and is preserved, including when you write the data back out to CSV.
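
As a rough check of that claim, the sketch below reads two rows in the question's layout from an in-memory string, inspects the stored value, and round-trips it through to_csv. The sample rows are copied from the question. The io.StringIO buffers and the float_precision="round_trip" argument are choices made for this example (the latter so that parsing is exact and the comparison can use strict equality); they are not part of the original code.

```python
import io
import pandas as pd

# Two rows in the same tab-separated layout as the question's file.
raw = (
    "26/09/2014\t00:00:00\t2.423009\t-58.864655\t3.312355E-7\t6.257226E-8\t302\t305\n"
    "26/09/2014\t00:00:00\t2.395637\t-62.73302\t3.321525E-7\t7.065322E-8\t302\t305\n"
)

# float_precision="round_trip" is an assumption of this sketch: it asks the
# parser for exact string-to-float conversion so equality checks are safe.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                 float_precision="round_trip")

# The stored value is a full float64, whatever the printed table shows.
first_value = df[4][0]
print(repr(first_value))

# Write the frame back out and read it again: the value is unchanged.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=False, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), sep="\t", header=None,
                         float_precision="round_trip")
values_match = bool(round_trip[4][0] == first_value)
print(values_match)
```

Only the printed table is affected by the display options; nothing in this sketch changes them.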

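If you are in IPython, you can check the current settings with pd.get_option('display.precision') or pd.describe_option('display.precision'). For display precision the default was 7 (significant digits) at the time of writing; newer pandas releases default to 6 and count digits after the decimal point.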
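
A minimal sketch of inspecting and temporarily changing these options, assuming a reasonably recent pandas. The sample values are taken from column 4 of the question. The precision of 13 and the '{:.6e}' format string are arbitrary choices for illustration. pd.option_context restores the previous settings when the block exits, which avoids changing the display globally.

```python
import pandas as pd

# A few values from column 4 of the question's data.
series = pd.Series([3.312355e-7, 3.321525e-7, 3.285718e-7], name=4)

# Look up the current default rather than assuming a particular number.
default_precision = pd.get_option("display.precision")
print(default_precision)

# Temporarily widen the precision; the old value is restored afterwards.
with pd.option_context("display.precision", 13):
    widened = repr(series)
print(widened)

# Alternatively, force scientific notation for every float that is displayed.
with pd.option_context("display.float_format", "{:.6e}".format):
    scientific = repr(series)
print(scientific)

# Outside the with-blocks the option is back to its earlier value.
precision_after = pd.get_option("display.precision")
```

Setting display.float_format this way makes every float column print in the same notation, which addresses the inconsistency between columns 4 and 5 without touching the stored values.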

solution1
4 ACCEPTED 2014-10-20 11:20:18