
Python: parsing string values in a CSV with pandas

I am new to Python and I am trying to read a csv file using pandas, but I have a problem with the file itself. Some of my string values end with a comma, and this creates an unwanted extra column at the end, as shown:

[Image: CSV file table]

This is the raw csv:

[Image: raw CSV]

For example, on line 14, the green string value ends with a comma and creates a new column, which then gives me a parsing error when I run this:

import pandas as pd

pd.read_csv("data.csv")

ParserError: Error tokenizing data. C error: Expected 6 fields in line 8, saw 7

Is there a way I can clean this up and merge the last two columns?
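Before merging anything, it can help to confirm which rows actually carry the extra field. The following is a minimal sketch using the standard-library csv module. The sample text in `raw`, its column names, and the helper `find_long_rows` are all hypothetical stand-ins, since the real data.csv is only shown as an image.

```python
import csv
import io

# Hypothetical sample: three header columns, and one row ending in a comma.
raw = "id,name,APP\n1,alpha,red\n2,beta,green,\n3,gamma,blue\n"

def find_long_rows(text):
    """Return (line_number, field_count) for rows wider than the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    expected = len(header)
    # reader.line_num is the number of the source line just read
    return [(reader.line_num, len(row)) for row in reader if len(row) > expected]

bad_rows = find_long_rows(raw)
print(bad_rows)
```

For a real file, the same function could be given the file contents, for example `find_long_rows(open("data.csv", newline="").read())`. Each reported row has more fields than the header, which is the situation the ParserError describes.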

You can use np.where to fill APP from the last column wherever APP is missing, then drop the last column.

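import pandas as pd
import numpy as np

# Assumes the file loads with the extra trailing column present
df = pd.read_csv("data.csv")
# Where APP is missing, take the value from the last column instead
df['APP'] = np.where(df['APP'].isna(), df.iloc[:, -1], df['APP'])
# Drop the last column, which is no longer needed
df = df.iloc[:, :-1]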
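The snippet above only works once the file has been loaded, and the ParserError in the question happens during loading. One possible way around that, sketched here under assumptions, is to pass read_csv an explicit list of column names that is one longer than the header, so the over-long rows fit. The sample text, the column names, and the placeholder name `extra` are hypothetical and not taken from the original file.

```python
import io
import pandas as pd

# Hypothetical sample: three header columns, and one row ending in a comma.
raw = "id,name,APP\n1,alpha,red\n2,beta,green,\n3,gamma,blue\n"

# Take the real header and append one placeholder name for the stray field.
header = raw.splitlines()[0].split(",")
names = header + ["extra"]

# skiprows=1 skips the header line, because the names are supplied by hand.
df = pd.read_csv(io.StringIO(raw), header=None, names=names, skiprows=1)

# If a row pushed its real value into the extra column, recover it;
# a plain trailing comma just leaves NaN there, so this changes nothing.
df["APP"] = df["APP"].fillna(df["extra"])
df = df.drop(columns="extra")

print(df)
```

If the affected rows can simply be discarded, `pd.read_csv("data.csv", on_bad_lines="skip")` is another option in pandas 1.3 and later, at the cost of losing those rows.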
