Pandas: read_csv, Error tokenizing data on seemingly regular data
I'm trying to read the elnino dataset from: https://archive.ics.uci.edu/ml/machine-learning-databases/el_nino-mld/el_nino.data.html
However, I'm getting 'Error tokenizing data'. The data itself looks like this when opened with WordPad:
1 1 8.96 -140.32 -6.3 -6.4 83.5 27.32 27.57
1 2 8.95 -140.32 -5.7 -3.6 86.4 26.70 27.62
1 3 8.96 -140.32 -6.2 -5.8 83.0 27.36 27.68
1 4 8.96 -140.34 -6.4 -5.3 82.2 27.32 27.70
1 5 8.96 -140.33 -4.9 -6.2 87.3 27.09 27.85
1 6 8.96 -140.33 -6.3 -4.9 91.5 26.82 27.98
1 7 8.97 -140.32 -6.7 -3.7 94.1 26.62 28.04
1 8 8.96 -140.33 -6.3 -4.8 92.0 26.89 27.98
1 9 8.97 -140.33 -6.3 -4.9 86.9 27.44 28.13
1 10 8.97 -140.32 -4.2 -2.5 87.3 26.62 28.14
1 11 8.96 -140.32 -6.8 -2.4 86.0 27.60 28.09
1 12 8.96 -140.33 -7.1 -3.2 82.2 27.87 28.15
1 13 8.96 -140.33 -6.7 -4.7 81.3 27.75 28.19
which looks unproblematic to me. So far I've tried:
pd.read_csv('elnino', sep=' | | |\t', header=None) # ValueError: Expected 13 fields in line 11, saw 35
pd.read_csv('elnino', sep=' ', error_bad_lines=False, header=None) # undesirable, because I'm losing more than half the lines, which are fine and the resulting dataframe still has a lot of nans
What is the problem with the input data?
Upon just reading the first few lines, I noticed a couple of NaNs, caused by sep=' | | |\t', such that three spaces were interpreted as '[sep]nan[sep]'.
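To see why consecutive spaces produce empty fields, here is a minimal sketch using only Python's `re` module. It assumes that a regex separator in pandas behaves much like `re.split`, and the sample line is a hypothetical fragment with three spaces between two values, not a verbatim line from the file.

```python
import re

# Separator pattern as shown in the question. Regex alternation is tried
# left to right, so the single-space alternative always matches first.
pattern = ' | | |\t'

# Hypothetical fragment: two values separated by three spaces (an assumption).
line = '8.96   -140.32'

# Each space is consumed as its own separator, leaving empty fields between them.
fields = re.split(pattern, line)
print(fields)

# Matching a whole run of spaces at once avoids the empty fields.
fields_fixed = re.split(' +', line)
print(fields_fixed)
```

The empty strings in the first result are what show up as NaN once the fields are loaded into a dataframe.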
The solution is:
df = pd.read_csv('elnino', sep=' +', header=None)  # ' +' matches one or more spaces; ' *' can also match the empty string
Edit: Just noticed that this is probably an even more appropriate solution:
df = pd.read_csv('elnino', delim_whitespace=True, header=None)
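As a small self-contained check, the sketch below parses three rows taken from the question with a whitespace-run separator. The irregular spacing between values is an assumption made for illustration, and `sep=r'\s+'` is used here because, as far as I know, newer pandas releases recommend it in place of `delim_whitespace=True`.

```python
import io

import pandas as pd

# Three rows from the question. The varying number of spaces between
# values is an assumption, chosen to mimic the real file's layout.
sample = (
    "1  1   8.96 -140.32 -6.3 -6.4 83.5 27.32 27.57\n"
    "1  2   8.95 -140.32 -5.7 -3.6 86.4 26.70 27.62\n"
    "1 10   8.97 -140.32 -4.2 -2.5 87.3 26.62 28.14\n"
)

# r'\s+' treats any run of whitespace as a single separator.
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None)

print(df.shape)
print(df.isna().sum().sum())  # number of missing values after parsing
```

If the parse worked as intended, every row has nine columns and no NaN values appear.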