How can I solve pandas error tokenizing data?
I have a .tsv file, displayed below, and I want to make a DataFrame from it using pandas. I am getting an "Error tokenizing data" error. How can I overcome it?
TSV file (data):
Input:
import pandas as pd
df = pd.read_csv("something/something.tsv", sep='\t')
Output:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4
read_csv takes your first line as the header by default, so append and 6.0 become your two column names. It then expects exactly two fields in every following row. In line 3 it finds 4 values and gives up with the ParserError.
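Here's a minimal way to reproduce that, using a small made-up sample in place of your file. The real data wasn't included, so the rows below are hypothetical and only mimic the shape described (two fields on line 1, four on line 3); the names sample and error_message are illustrative:

```python
import io

import pandas as pd

# Hypothetical sample: 2 fields on line 1 (becomes the header), 4 on line 3.
sample = "append\t6.0\nfoo\t1.0\nbar\t2.0\t3.0\t4.0\n"

try:
    pd.read_csv(io.StringIO(sample), sep="\t")
    error_message = None
except pd.errors.ParserError as exc:
    # The inferred header fixed the width at 2 columns, so line 3 is rejected.
    error_message = str(exc)

print(error_message)
```

Because the header is inferred from line 1, pandas locks the table to two columns, which is exactly what the message reports.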
You need a different approach for this data, where each line is a key followed by a variable number of values.
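One such approach, sketched under the assumption that the first field on each line is the key and everything after it is a value, is to skip read_csv's fixed-width parsing and build a "long" table yourself, one row per key/value pair. The sample, the column names key/position/value, and the numeric conversion are illustrative choices, not something taken from your file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for something.tsv: a key followed by a
# variable number of tab-separated values on each line.
sample = "append\t6.0\nfoo\t1.0\nbar\t2.0\t3.0\t4.0\n"

records = []
# For the real file, iterate over open("something/something.tsv") instead.
for line in io.StringIO(sample):
    if not line.strip():
        continue  # skip blank lines
    key, *values = line.rstrip("\r\n").split("\t")
    # One record per (key, value) pair, so rows of any length fit.
    for position, value in enumerate(values):
        records.append({"key": key, "position": position, "value": value})

long_df = pd.DataFrame(records)
# Assumption: values are numeric; anything that isn't becomes NaN.
long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")
print(long_df)
```

This plain split ignores CSV quoting, so it assumes no field contains a quoted tab.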
Per your comment, if you just want to read it all in anyway, here's how you can do that:
import pandas as pd
import numpy as np
df = pd.read_csv("something/something.tsv", sep='\t', header=None, names=np.arange(20))
names=np.arange(20) is the key. With header=None every line is treated as data, and 20 can be any number at least as large as the most values you will have in a row; rows with fewer values are padded with NaN. Then you can do whatever you need to get the data into the shape you want.
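If you'd rather not guess the 20, one option is to scan the data once to find the widest row and pass exactly that many names. The sketch below does that on a made-up sample, then shows one possible tidy-up (first column as the key, reshaped with melt to one row per value). The sample and the names max_fields, wide_df and long_df, as well as the choice of reshape, are illustrative assumptions:

```python
import io

import pandas as pd

# Hypothetical sample; for the real data, read "something/something.tsv" instead.
sample = "append\t6.0\nfoo\t1.0\nbar\t2.0\t3.0\t4.0\n"

# First pass: count the tab-separated fields on the widest non-blank line.
max_fields = max(
    len(line.rstrip("\r\n").split("\t"))
    for line in io.StringIO(sample)
    if line.strip()
)

# Second pass: header=None keeps line 1 as data, and passing exactly
# max_fields names lets shorter rows be padded with NaN.
wide_df = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None, names=list(range(max_fields))
)

# One possible reshape: first column as the key, one row per non-missing value.
long_df = (
    wide_df.rename(columns={0: "key"})
    .melt(id_vars="key", var_name="position", value_name="value")
    .dropna(subset=["value"])
    .reset_index(drop=True)
)
print(long_df)
```

The first pass reads the data twice and counts fields with a plain split, so it assumes no quoted tabs; for very large files the fixed-size names trick above may be the cheaper choice.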
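Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.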