
How can I solve pandas error tokenizing data?

I have a .tsv file, shown below, and I want to load it into a pandas DataFrame. When I try, I get an "Error tokenizing data" error. How can I get past it?

TSV file (data):

[screenshot of the file; image not available]

Input:

import pandas as pd
df = pd.read_csv("something/something.tsv", sep='\t')

Output:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4

By default, read_csv treats the first line of the file as the header, so append and 6.0 become your two column names. It then expects two fields in every subsequent row. On line 3 it finds 4 values and raises the ParserError.

You need a different approach for this data, where each line is a key followed by a variable number of values.
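As one possible way to handle such key-plus-values lines without pandas, the sketch below reads each line with the standard csv module and maps the first field to the list of remaining fields. The sample text and the helper name read_key_values are made up for illustration, since the real file contents are not shown in the question.

```python
import csv
import io

# Hypothetical sample standing in for the real file (its contents are not shown).
# Line 1 has 2 fields and line 3 has 4, matching the error message in the question.
sample = "append\t6.0\nremove\t1.0\t2.0\nextend\t3.0\t4.0\t5.0\n"


def read_key_values(handle, delimiter="\t"):
    """Map the first field of each non-empty line to the list of remaining fields."""
    records = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        key, *values = row
        records[key] = values  # a repeated key overwrites the earlier entry
    return records


records = read_key_values(io.StringIO(sample))
```

For a real file, the io.StringIO(sample) argument would be replaced by something like open(path, newline=""). Values stay as strings here; converting them to numbers is left to the caller.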

Per your comment: just read it all in anyway.

Here's how you can do that:

import pandas as pd
import numpy as np

df = pd.read_csv("something/something.tsv", sep='\t', header=None, names=np.arange(20))

names=np.arange(20) is the key. The 20 can be any number at least as large as the largest number of fields in any row; shorter rows are padded with NaN. From there you can reshape the data however you need.
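If you would rather not guess the number of columns, one option is to scan the file once and count the widest row. This is only a sketch: the sample text and the helper name max_field_count are assumptions for illustration, not part of pandas.

```python
import csv
import io

# Hypothetical sample standing in for the real file (its contents are not shown).
sample = "append\t6.0\nremove\t1.0\t2.0\nextend\t3.0\t4.0\t5.0\n"


def max_field_count(handle, delimiter="\t"):
    """Return the largest number of fields found on any line (0 for empty input)."""
    return max((len(row) for row in csv.reader(handle, delimiter=delimiter)), default=0)


width = max_field_count(io.StringIO(sample))
column_names = list(range(width))  # can be passed as names= to read_csv
```

With a real file you would pass open(path, newline="") to the helper and then call pd.read_csv(path, sep='\t', header=None, names=column_names) in place of the hard-coded np.arange(20). The cost is reading the file twice.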
