
The "thousands" and "skip_blank_lines" arguments of pandas.read_csv do not work properly. Why?

This is my code:

in[0]
      import pandas as pd
      df = pd.read_csv('datefile6.csv',thousands=',', skip_blank_lines=True)
      df

out[1]      month   day     year    salary   age
       0    8.0     15.0    2012.0  1400.0   25.0
       1    NaN     NaN     NaN     NaN      NaN
       2    9.0     4.0     2020.0  2500.0   26.0

As we can see, the thousands argument did not work. Also, row 1, which is blank, was not skipped.

I expected the "thousands" argument to put ',' into the numbers and the "skip_blank_lines" argument to remove row 1.

The thousands parameter describes the input file. It tells pandas that numbers in your CSV file contain a thousands separator character (typically a comma or a dot), so that they can be parsed as numbers. It does not affect how the output is displayed.

Consider this code:

import pandas as pd
df = pd.read_csv('datafile6.csv', sep=';', thousands=',', skip_blank_lines=True)
print(df)

where datafile6.csv is:

month;day;year;salary;age
8;15;2,012;1,400;25
9;4;2,020;2,500;26

I get this output:

   month   day    year  salary   age
0    8.0  15.0  2012.0  1400.0  25.0
1    9.0   4.0  2020.0  2500.0  26.0

You can see that 1,400 has been correctly parsed as 1400, and likewise for the other values.
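If the aim was to see the comma in the displayed numbers, that is a formatting step applied after parsing, not something thousands does. Here is a minimal sketch, assuming the same semicolon-separated data held in an in-memory string instead of datafile6.csv; the names csv_text, load and with_commas are illustrative only:

```python
import io
import pandas as pd

# Same content as datafile6.csv above, kept in memory so the sketch is self-contained.
csv_text = "month;day;year;salary;age\n8;15;2,012;1,400;25\n9;4;2,020;2,500;26\n"

def load(text):
    # thousands=',' only tells the parser how to read "1,400"; the stored value is the number 1400.
    return pd.read_csv(io.StringIO(text), sep=';', thousands=',')

def with_commas(series):
    # Display formatting is separate: turn each number into a string with a thousands separator.
    return series.map(lambda v: f"{float(v):,.0f}")

df = load(csv_text)
print(df['salary'].tolist())               # parsed numeric values
print(with_commas(df['salary']).tolist())  # strings formatted for display
```

Note that with_commas returns strings, so it is best applied only when presenting the data, not before doing arithmetic on the column.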

Regarding your question about skip_blank_lines: I suspect that instead of a completely blank line, your CSV contains a line made up only of field separators.
Now consider this as the content of datafile6.csv:

month;day;year;salary;age
8;15;2,012;1,400;25
;;;;
9;4;2,020;2,500;26

9;3;2,021;3,200;33

I get this DataFrame output:

   month   day    year  salary   age
0    8.0  15.0  2012.0  1400.0  25.0
1    NaN   NaN     NaN     NaN   NaN
2    9.0   4.0  2020.0  2500.0  26.0
3    9.0   3.0  2021.0  3200.0  33.0

The NaN row comes from line #3 of datafile6.csv, which is not really blank but contains 4 field separators, whereas line #5, which is completely blank, is skipped. This is how the skip_blank_lines parameter behaves.
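If the separator-only row should be removed as well, one option is to drop rows in which every column is NaN after reading. This is a sketch under the assumption that your file looks like the second datafile6.csv above; csv_text, raw and cleaned are illustrative names:

```python
import io
import pandas as pd

# Second version of datafile6.csv from above: line 3 holds only separators, line 5 is empty.
csv_text = (
    "month;day;year;salary;age\n"
    "8;15;2,012;1,400;25\n"
    ";;;;\n"
    "9;4;2,020;2,500;26\n"
    "\n"
    "9;3;2,021;3,200;33\n"
)

# skip_blank_lines=True removes the empty line, but the ";;;;" line is kept as a row of NaN.
raw = pd.read_csv(io.StringIO(csv_text), sep=';', thousands=',', skip_blank_lines=True)

# how='all' drops only rows where every column is NaN; reset_index renumbers the remaining rows.
cleaned = raw.dropna(how='all').reset_index(drop=True)

print(len(raw), len(cleaned))
print(cleaned)
```

Using how='all' rather than the default keeps rows that are only partly missing, which is usually what is wanted when the goal is just to discard empty records.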

Hope this clears things up.
