Python: parsing string values in a csv with pandas

Question

I am new to Python and I am trying to read a csv file using pandas, but I have a problem with the file itself. Some of my strings end with a comma, and this creates an unwanted extra column at the end, as shown:

This is the raw csv:

For example, on line 14, the green string value ends with a comma and creates a new column, which then gives me a parsing error when I use this:

import pandas as pd

pd.read_csv("data.csv")

ParserError: Error tokenizing data. C error: Expected 6 fields in line 8, saw 7

Is there a way I can clean this up and merge the last two columns?

Answer 1

You can use np.where to fill APP from the last column wherever APP is missing, then drop the last column.

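import pandas as pd
import numpy as np

df = pd.read_csv("data.csv")
# Where APP is missing, take the value that spilled into the last column.
df['APP'] = np.where(df['APP'].isna(), df.iloc[:, -1], df['APP'])
# Drop the now-redundant last column.
df = df.iloc[:, :-1]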
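
If pd.read_csv still raises the ParserError before the columns can be merged, one option is to repair the rows first with the standard csv module. The following is only a minimal sketch, not part of the answer above: the function name merge_overflow_fields and the sample data are hypothetical, and it assumes that the surplus field always spills to the right of the last column and that the empty field left by the stray comma can be dropped.

```python
import csv
import io


def merge_overflow_fields(raw_text):
    """Return CSV text in which no row has more fields than the header."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    width = len(header)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if len(row) > width:
            # Fold the surplus trailing fields into the last column,
            # dropping the empty field left behind by the stray comma.
            tail = [field for field in row[width - 1:] if field != ""]
            row = row[:width - 1] + [" ".join(tail)]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: the stray comma after "green" pushes "beta"
# into a fourth field, one more than the header declares.
sample = "ID,COLOR,APP\n1,red,alpha\n2,green,,beta\n3,blue,gamma\n"
cleaned = merge_overflow_fields(sample)
print(cleaned)
```

The cleaned text has the same number of fields on every row, so it can then be handed to pandas, for example with pd.read_csv(io.StringIO(cleaned)).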

Answer 1 posted 2021-02-24 12:01:30 (score: 0)