Scientific Notation Matplotlib / Pandas

I have a CSV file with about 28 columns and 4000 rows. From two of these columns I want to plot about 50 specific rows. I used pandas to select this part of the file, but I cannot figure out how to make it read the numbers in scientific notation correctly.

My code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("20180416309.csv", sep=";")

x = df.loc[df[u'run#'] == 3, [u'     Diameter']].values
y = df.loc[df[u'run#'] == 3, [u'      dN/dlnD']].values

plt.plot(x, y)
plt.show()

So, I am trying to plot the columns u'     Diameter' and u'      dN/dlnD' for the rows where the column u'run#' contains the number 3. When I type "x" or "y" in the IPython console, the right numbers are shown.

Unfortunately, the plot looks like this:

As you can see, the power of ten in the scientific notation of these numbers is ignored on the y-axis. How can I fix this? This is my first try using matplotlib and pandas, so please excuse my beginner question.

Edit:

The file's data looks like this:

run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;

Reading out my "x" or "y" data in the IPython console gives output like this:

   [' +1,94251E+02'],
   [' +5,23981E+02'],
   [' +0,00000E+00'],
   [' +1,10525E+02'],
   [' +0,00000E+00'],
   [' +4,76363E+01'],
   [' +1,61714E+01'],
   [' +1,65482E+02'],
   [' +0,00000E+00'],
   [' +4,75312E+02'],
   [' +4,20174E+01']], dtype=object)

SOLUTION:

As you pointed out, the comma was the problem. I simply added the decimal setting to the read_csv call:

df = pd.read_csv("test.csv", sep=";", decimal=",")

Now the graph looks the way it is supposed to.
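To make the effect of the decimal setting visible, here is a minimal sketch that reads the five sample rows from the edit above twice, once with the defaults and once with decimal=",". The names SAMPLE, as_text and as_numbers are only for illustration, and the data is inlined through io.StringIO instead of a file so the snippet is self-contained.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so no file is needed.
SAMPLE = """run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;
"""

# Default settings: a comma is not a decimal separator, so the values stay text.
as_text = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# decimal="," tells the parser to treat the comma as the decimal separator.
as_numbers = pd.read_csv(io.StringIO(SAMPLE), sep=";", decimal=",")

# Compare how the two reads typed each column.
print(as_text.dtypes)
print(as_numbers.dtypes)
```

The trailing semicolon on every line also produces an extra, empty fourth column in both reads; it can be ignored or dropped.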

Thank you!

It's clear that the CSV data wasn't read correctly, or more specifically, not as you expected. Based on your examples, your numeric columns were read as strings. The reason is that your file uses a comma as the decimal separator, while pd.read_csv expects a period unless told otherwise. I modified the small snippet of data you provided so that a period, not a comma, represents the decimal point, which is customary in my locale. As you can see, the data is then read properly into the dataframe.

df = pd.read_csv("d:\\users\\floyd\\documents\\sample.csv", sep=';'); df
Out[72]: 
   run#       Diameter        dN/dlnD
0    12        35.8151        1173.36
1    13        32.6913        6060.44
2    13        29.8524       17651.60
3    13        27.2704       48871.60
4    13        24.9202      100035.00
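If the columns have already been loaded as strings, as in the console output in the question, they can also be converted after the fact instead of re-reading the file. This is only a sketch: the Series raw is built by hand from three of the values shown in the question, and the variable names are illustrative.

```python
import pandas as pd

# Strings in the shape shown in the question's console output.
raw = pd.Series([" +1,94251E+02", " +5,23981E+02", " +0,00000E+00"])

# Strip the leading space, swap the decimal comma for a period,
# then let pandas convert the cleaned strings to floats.
values = (
    raw.str.strip()
    .str.replace(",", ".", regex=False)
    .astype(float)
)

print(values)
```

Passing decimal="," to read_csv is usually the simpler route, since it handles every affected column in one step.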

I also removed the annoying leading spaces in the column names with this:

df.columns = [col.strip() for col in df.columns]; df.columns

Now it plots properly.

plt.plot(df['Diameter'], df['dN/dlnD'])
Out[75]: [<matplotlib.lines.Line2D at 0x25ef97bd0b8>]
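Putting the pieces together, the following sketch reads the original comma-decimal sample, strips the column names and plots the two columns. It is an illustration under a few assumptions: the data is inlined rather than read from a file, the Agg backend is selected so it runs without a display, and the call to ticklabel_format is an optional addition (not part of the answer above) that asks matplotlib to label the y-axis with a common power of ten.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question, with decimal commas left as they are.
SAMPLE = """run#;     Diameter;      dN/dlnD;
12; +3,58151E+01; +1,17336E+03;
13; +3,26913E+01; +6,06044E+03;
13; +2,98524E+01; +1,76516E+04;
13; +2,72704E+01; +4,88716E+04;
13; +2,49202E+01; +1,00035E+05;
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=";", decimal=",")
df.columns = [col.strip() for col in df.columns]  # drop leading spaces in the names

fig, ax = plt.subplots()
(line,) = ax.plot(df["Diameter"], df["dN/dlnD"])
ax.set_xlabel("Diameter")
ax.set_ylabel("dN/dlnD")

# Optional: show the y tick labels with a shared power-of-ten factor.
ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
```

In an interactive session, calling plt.show() afterwards displays the figure.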

