Pandas: read_csv, error tokenizing data on seemingly regular data

Question

I'm trying to read the elnino dataset from: https://archive.ics.uci.edu/ml/machine-learning-databases/el_nino-mld/el_nino.data.html

However, I'm getting 'Error tokenizing data'. The data itself looks like this when opened with WordPad:

1 1   8.96 -140.32 -6.3  -6.4  83.5 27.32 27.57
1 2   8.95 -140.32 -5.7  -3.6  86.4 26.70 27.62
1 3   8.96 -140.32 -6.2  -5.8  83.0 27.36 27.68
1 4   8.96 -140.34 -6.4  -5.3  82.2 27.32 27.70
1 5   8.96 -140.33 -4.9  -6.2  87.3 27.09 27.85
1 6   8.96 -140.33 -6.3  -4.9  91.5 26.82 27.98
1 7   8.97 -140.32 -6.7  -3.7  94.1 26.62 28.04
1 8   8.96 -140.33 -6.3  -4.8  92.0 26.89 27.98 
1 9   8.97 -140.33 -6.3  -4.9  86.9 27.44 28.13
1 10  8.97 -140.32 -4.2  -2.5  87.3 26.62 28.14
1 11  8.96 -140.32 -6.8  -2.4  86.0 27.60 28.09
1 12  8.96 -140.33 -7.1  -3.2  82.2 27.87 28.15
1 13  8.96 -140.33 -6.7  -4.7  81.3 27.75 28.19

which looks unproblematic to me. So far I've tried:

pd.read_csv('elnino', sep=' |  |   |\t', header=None) # ValueError: Expected 13 fields in line 11, saw 35
pd.read_csv('elnino', sep=' ', error_bad_lines=False, header=None) # undesirable, because I'm losing more than half the lines, which are fine and the resulting dataframe still has a lot of nans

What is the problem with the input data?

Answer 1

After reading just the first few lines, I noticed a couple of NaNs caused by sep=' |  |   |\t'. The single-space alternative in that pattern is tried first, so each space in a run was treated as its own separator and three spaces were interpreted as '[sep]nan[sep]'.
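To make the splitting behaviour concrete, here is a small sketch using Python's `re` module directly rather than pandas. It assumes pandas applies a regex separator in roughly the same way as `re.split`; the sample line is taken from the data in the question, and the variable names are only illustrative.

```python
import re

# One row from the question; note the double spaces between some fields.
line = "1 10  8.97 -140.32 -4.2  -2.5  87.3 26.62 28.14"

# Alternation is tried left to right, so the single-space branch always
# matches first and every individual space becomes a separator.
naive = re.split(r" |  |   |\t", line)

# Treat any run of whitespace as one separator instead.
collapsed = re.split(r"\s+", line)

print(len(naive), naive)
print(len(collapsed), collapsed)
```

The empty strings in `naive` are what end up as NaN values (and as extra fields) once the rows are loaded into a DataFrame.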

The solution is:

df = pd.read_csv('elnino', sep=' +', header=None)  # ' +' = one or more spaces; ' *' can also match an empty string

Edit: Just noticed that this is probably an even more appropriate solution:

df = pd.read_csv('elnino', delim_whitespace=True, header=None)
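For anyone who wants to try this without downloading the file, the following is a minimal sketch that feeds a few rows from the question through `io.StringIO`. It uses `sep=r"\s+"`, which as far as I know is the equivalent spelling of `delim_whitespace=True` and the one recommended in recent pandas releases, where `delim_whitespace` is deprecated. The `sample` string stands in for the real 'elnino' file.

```python
import io

import pandas as pd

# A few rows copied from the question, standing in for the real file.
sample = (
    "1 1   8.96 -140.32 -6.3  -6.4  83.5 27.32 27.57\n"
    "1 2   8.95 -140.32 -5.7  -3.6  86.4 26.70 27.62\n"
    "1 10  8.97 -140.32 -4.2  -2.5  87.3 26.62 28.14\n"
)

# Any run of whitespace counts as a single delimiter.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

print(df.shape)
print(df)
```

With the actual dataset, the same call should work by passing the file path in place of the `io.StringIO` object.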

Answer 1 posted 2017-11-08 22:07:46
