
Scientific Notation Matplotlib / Pandas

I have a CSV file with about 28 columns and 4000 rows. From two of these columns I want to plot about 50 specific rows. I used pandas to select this part of the file, but I cannot figure out how to make it read the numbers in scientific notation correctly.

My code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("20180416309.csv", sep=";")

x = df.loc[df[u'run#'] == 3, [u'     Diameter']].values
y = df.loc[df[u'run#'] == 3, [u'      dN/dlnD']].values

plt.plot(x, y)
plt.show()

So, I am trying to plot the columns u' Diameter' and u' dN/dlnD' for the rows where column u'run#' equals 3. When I type "x" or "y" in the IPython console, the right numbers are shown.

Unfortunately, the plot looks like this:

As you can see, the power of ten in the scientific notation of these numbers is ignored on the y-axis. How can I fix this? This is my first attempt at using matplotlib and pandas, so please excuse the beginner question.

Edit:

The file's data looks like this:

run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;

When I print my "x" or "y" data in the IPython console, the output looks like this:

   [' +1,94251E+02'],
   [' +5,23981E+02'],
   [' +0,00000E+00'],
   [' +1,10525E+02'],
   [' +0,00000E+00'],
   [' +4,76363E+01'],
   [' +1,61714E+01'],
   [' +1,65482E+02'],
   [' +0,00000E+00'],
   [' +4,75312E+02'],
   [' +4,20174E+01']], dtype=object)

SOLUTION:

As you pointed out, the comma was the problem. I simply added the decimal setting in the code:

df = pd.read_csv("test.csv", sep=";", decimal=",")

Now the graph looks the way it is supposed to.
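For anyone who wants to try the fix without the original file, here is a minimal sketch that feeds the sample rows from the question through io.StringIO (my assumption, standing in for the real CSV). It only shows that decimal="," turns the two columns into floats. The trailing ";" on each line produces an extra empty column, which I leave alone here.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the file.
raw = """run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;
"""

# decimal="," tells pandas that the comma is the decimal separator.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# The header fields carry leading spaces, so strip them before use.
df.columns = [c.strip() for c in df.columns]

# Both measurement columns should now be floating point, not object.
print(df[["Diameter", "dN/dlnD"]].dtypes)
```

If the dtypes still show object, the values are still strings and the plot will be wrong in the same way as before.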

Thank you!

It's clear that the CSV data wasn't read the way you expected. Based on your examples, the numeric columns were read as strings. The reason is that your file uses a comma as the decimal separator, while pandas expects a period by default. I modified the small snippet of data you provided so that a period, not a comma, represents the decimal point, which is customary in my locale. As you can see, the data is then read properly into the dataframe.

df = pd.read_csv("d:\\users\\floyd\\documents\\sample.csv", sep=';'); df
Out[72]: 
   run#       Diameter        dN/dlnD
0    12        35.8151        1173.36
1    13        32.6913        6060.44
2    13        29.8524       17651.60
3    13        27.2704       48871.60
4    13        24.9202      100035.00
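If the frame has already been loaded and the columns hold strings, as in the question, they can also be converted after the fact instead of editing the file. This is only a sketch: the small frame below is a hypothetical stand-in that mimics the object-dtype values shown in the question, and to_float is a helper name I made up.

```python
import pandas as pd

# Hypothetical frame mimicking what read_csv returned in the question:
# the numbers are strings with a leading space and a decimal comma.
df = pd.DataFrame({
    "run#": [12, 13, 13],
    "     Diameter": [" +3,58151E+01", " +3,26913E+01", " +2,98524E+01"],
    "      dN/dlnD": [" +1,17336E+03", " +6,06044E+03", " +1,76516E+04"],
})

def to_float(col):
    """Strip padding, swap the decimal comma for a period, then parse."""
    cleaned = col.str.strip().str.replace(",", ".", regex=False)
    return pd.to_numeric(cleaned)

for name in ["     Diameter", "      dN/dlnD"]:
    df[name] = to_float(df[name])

print(df.dtypes)
```

Note that a plain replace like this would mangle values that also use a thousands separator; the sample data has none, so I ignore that case.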

I also removed the annoying leading spaces in the column names like this:

df.columns = [col.strip() for col in df.columns]; df.columns
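As an alternative to stripping the names afterwards, read_csv has a skipinitialspace option that drops the spaces following each delimiter while parsing. The sketch below again uses io.StringIO with a few rows from the question as a stand-in for the real file, and it also drops the empty column created by the trailing ";". Whether that extra column exists in the full file is an assumption based on the sample.

```python
import io
import pandas as pd

# A few rows from the question; io.StringIO stands in for the real file.
raw = (
    "run#;     Diameter;      dN/dlnD;\n"
    "12; +3,58151E+01; +1,17336E+03;\n"
    "13; +3,26913E+01; +6,06044E+03;\n"
)

# skipinitialspace=True removes the spaces after each ';' delimiter,
# which cleans both the header names and the values.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", skipinitialspace=True)

# The trailing ';' yields an all-empty column; drop columns that are all NaN.
df = df.dropna(axis=1, how="all")

print(list(df.columns))
```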

Now it plots properly.

plt.plot(df['Diameter'], df['dN/dlnD'])
Out[75]: [<matplotlib.lines.Line2D at 0x25ef97bd0b8>]
