
Read a CSV file in Python that uses double quotes to enclose multi-value fields and commas to separate columns

There's a file that uses commas to separate columns as well as to indicate empty column values. Moreover, in the same file, double quotes enclose any field that holds several values, and commas separate the values within that field. If a column has only a single value, double quotes are not used.

Example:

col1,col2,col3,col4,col5,col6
name, age,,,"cat,dog", year
name,age,weight,height,cat,year
"first name,last name",age, weight,,"dog, cat, another dog",,

Expected result:

col1                   col2  col3    col4    col5                   col6
name                   age                   dog, cat               year
name                   age   weight  height  cat                    year
first name, last name  age   weight          dog, cat, another dog

Another important thing, if it matters, is that the CSV uses Windows-1252 encoding.

Your CSV is not in the right format. The last row ends with an extra comma, so it has seven fields while the header defines only six. If you remove that extra comma, this code will work:

import pandas as pd

df = pd.read_csv('data/test_data.txt', quotechar='"', delimiter=',')
print(df)

The output is:

                   col1  col2     col3    col4                   col5   col6
0                  name   age      NaN     NaN                cat,dog   year
1                  name   age   weight  height                    cat   year
2  first name,last name   age   weight     NaN  dog, cat, another dog    NaN
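If editing the file by hand is not an option, the surplus trailing comma can also be handled while reading. The following is only a minimal sketch using Python's standard `csv` module rather than pandas; the helper name `read_rows` is hypothetical, and it assumes that any extra fields beyond the header width are empty ones produced by trailing commas. For a real file on disk, the question's Windows-1252 encoding would be handled by opening it with `open(path, newline="", encoding="cp1252")` instead of using the in-memory sample.

```python
import csv
import io


def read_rows(stream):
    """Parse comma-separated text and trim surplus trailing empty fields.

    Hypothetical helper: assumes extra fields past the header width are
    empty strings caused by trailing commas.
    """
    # skipinitialspace drops the blank that follows a delimiter (", age"),
    # but leaves spaces inside quoted fields untouched.
    reader = csv.reader(stream, delimiter=",", quotechar='"', skipinitialspace=True)
    header = next(reader)
    width = len(header)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Remove empty fields beyond the header width (e.g. a trailing comma).
        while len(fields) > width and fields[-1] == "":
            fields.pop()
        # Pad short rows so every row has one value per header column.
        fields += [""] * (width - len(fields))
        rows.append(fields)
    return header, rows


# In-memory copy of the example from the question.
sample = '''col1,col2,col3,col4,col5,col6
name, age,,,"cat,dog", year
name,age,weight,height,cat,year
"first name,last name",age, weight,,"dog, cat, another dog",,
'''

header, rows = read_rows(io.StringIO(sample))
for row in rows:
    print(row)
```

The resulting `header` and `rows` can then be passed to `pd.DataFrame(rows, columns=header)` if a DataFrame is needed. Note that empty fields come back as empty strings here, not as `NaN`.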
