
Read csv with commas surrounded by double quotes

I have a CSV (comma-separated) file in an S3 bucket. A few fields contain commas, and the file looks like this:

Q,W,E,R
A,S,"D,F",G
Z,X,C,V

When I read this with pandas, I should get 4 columns with "D,F" in a single column, but I am getting an extra column.

My code, with the different things I tried (none of them worked):

import io
import csv
import chardet
import pandas as pd

# Read the object from the S3 bucket
self.raw_content = obj['Body'].read()
content = io.BytesIO(self.raw_content)

# Detect the encoding of the raw bytes
result = chardet.detect(self.raw_content)
self.encoding = result['encoding']

# csv_delimiter is read from the DB (',' in this case)
# max_columns is the number of columns in the CSV file

#Try 1
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, engine='python',
    dtype=object, encoding=self.encoding, quotechar='"',
    names=list(range(0,max_columns)))

#Try 2
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, engine='python',
    dtype=object, encoding=self.encoding, quoting=csv.QUOTE_ALL,
    names=list(range(0,max_columns)))

#Try 3
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, dtype=object,
    encoding=self.encoding, quoting=csv.QUOTE_ALL,
    names=list(range(0,max_columns)))           

Current Result:

0    1    2    3    4
Q    W    E    R    NaN
A    S    "D   F"   G
Z    X    C    V    NaN  

Expected Result:

0    1    2    3
Q    W    E    R
A    S    D,F  G
Z    X    C    V

You can parse it with the following code (adapted from https://stackoverflow.com/a/64456792/5660315):

from io import StringIO
import csv
import pandas as pd

s="""
Q,W,E,R
A,S,"D,F",G
Z,X,C,V
"""
df = pd.read_csv(StringIO(s),
                 names=range(4),
                 sep=',',
                 quoting=csv.QUOTE_ALL,
                 quotechar='"'
                )
print(df)
#    0  1    2  3
# 0  Q  W    E  R
# 1  A  S  D,F  G
# 2  Z  X    C  V
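
For comparison, here is a minimal sketch (not from the original posts) that parses the same three lines from bytes, as they would arrive from S3. It reads once with pandas' default quoting and once with quoting=csv.QUOTE_NONE. RAW and parse are hypothetical names used only for illustration, and UTF-8 is assumed instead of the chardet result. The second read yields the same shape as the "Current Result" table in the question, but whether quoting was effectively disabled in the asker's environment is an assumption. The helper also builds a new BytesIO per call, because the question does not show whether content was rewound between tries.

```python
import csv
import io

import pandas as pd

# Sample bytes standing in for obj['Body'].read() from the question (assumed UTF-8).
RAW = b'Q,W,E,R\nA,S,"D,F",G\nZ,X,C,V\n'


def parse(raw, n_cols, **kwargs):
    """Parse raw CSV bytes; a new BytesIO per call so every read starts at byte 0."""
    return pd.read_csv(
        io.BytesIO(raw),
        header=None,                 # the data has no header row
        names=list(range(n_cols)),   # numeric column labels, as in the question
        dtype=str,
        encoding="utf-8",
        **kwargs,
    )


# Default quoting (csv.QUOTE_MINIMAL): the quoted comma stays inside one field.
df_default = parse(RAW, 4)

# Quoting disabled: the quote characters are treated as ordinary text,
# so the comma between D and F splits the field and a fifth column appears.
df_unquoted = parse(RAW, 5, quoting=csv.QUOTE_NONE)

print(df_default)
print(df_unquoted)
```

If the default read already gives four columns on the real file, the extra column probably comes from something other than the quoting arguments, for example the stored delimiter or the detected encoding.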
