How to resolve the pandas "Error tokenizing data" error?

Question

I have a .tsv file, shown below, and I want to load it into a pandas DataFrame. I am getting an "Error tokenizing data" error. How can I overcome this error?

TSV file (data):

Input:

import pandas as pd
df = pd.read_csv("something/something.tsv", sep='\t')

Output:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4

Answer 1

By default, read_csv treats your first line as a header, so append and 6.0 become your two column names. It then expects two columns in every subsequent row. In line 3 it finds 4 values and raises the error.

You need another approach to handle this data, where each line is a key-value pair and a key can have several values.
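As a rough sketch of what such an approach could look like, the snippet below skips pandas and reads each line with the standard csv module, treating the first field as the key and the rest as its values. The sample rows are invented for illustration (the original file is not shown), and read_key_values is a hypothetical helper name; it also assumes every value is numeric.

```python
import csv
import io

# Hypothetical sample: the real file is not shown, so these rows are invented.
# Line 3 has 4 fields, which is what trips up the default read_csv call.
sample = "append\t6.0\nextend\t2.0\ninsert\t1.0\t3.0\t4.5\n"


def read_key_values(handle):
    """Map the first field of each tab-separated line to a list of the remaining fields."""
    records = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        key, *values = row
        # Assumption: all values are numeric; drop float() to keep raw strings.
        records[key] = [float(v) for v in values]
    return records


# With a real file, pass open("something/something.tsv", newline="") instead.
records = read_key_values(io.StringIO(sample))
```

A dict of lists keeps rows of different lengths without padding, which may be easier to work with than a wide DataFrame full of missing values.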

Per your comment, if you just want to read it all in anyway, here's how you can do that:

import pandas as pd
import numpy as np

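# header=None: do not treat the first line as column names
# names=np.arange(20): supply 20 column labels up front
df = pd.read_csv("something/something.tsv", sep='\t', header=None, names=np.arange(20))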

names=np.arange(20) is the key. The 20 can be any number that is at least as large as the largest number of fields you will have in a row; columns a row does not use are filled with NaN. Then you can do whatever you need to do to get the data the way you want it.
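To illustrate one possible clean-up step afterwards, here is a minimal sketch that reads invented sample rows (the original file is not shown) with the same header=None and names arguments, drops the columns that are entirely empty, and uses the first column as the index. The sample data and the choice of clean-up steps are assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample: the real file is not shown, so these rows are invented.
sample = "append\t6.0\nextend\t2.0\ninsert\t1.0\t3.0\t4.5\n"

# Same idea as above: 20 column labels, more than any row needs.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=list(range(20)))

# Drop the padding columns that contain nothing but NaN.
df = df.dropna(axis=1, how="all")

# Treat the first field of each line as the key.
df = df.set_index(0)
```

Rows with fewer values keep NaN in the trailing columns, so something like df.loc["append"].dropna() recovers just the values that were actually present.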
