I have a CSV file (comma-separated) in an S3 bucket. A few fields contain commas, and the file looks like this:
Q,W,E,R
A,S,"D,F",G
Z,X,C,V
When I read this with pandas, I should get 4 columns, with "D,F" in a single column, but I am getting an extra column.
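For reference, a field wrapped in the quote character is supposed to stay one value even when it contains the delimiter. A quick check of the sample data with Python's standard csv module (the `sample` string is simply the data above typed in by hand) shows 4 fields on every row:

```python
import csv
import io

# The sample data from the question, as text
sample = 'Q,W,E,R\nA,S,"D,F",G\nZ,X,C,V\n'

# csv.reader treats "..." as a quoted field, so "D,F" stays a single value
rows = list(csv.reader(io.StringIO(sample), delimiter=',', quotechar='"'))
for row in rows:
    print(len(row), row)
# 4 ['Q', 'W', 'E', 'R']
# 4 ['A', 'S', 'D,F', 'G']
# 4 ['Z', 'X', 'C', 'V']
```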
My code (I tried several variations; none of them worked):
import io
import csv
import chardet
import pandas as pd

# reading from the S3 bucket (obj is the S3 object retrieved earlier)
self.raw_content = obj['Body'].read()

# encoding: detected from the raw bytes
result = chardet.detect(self.raw_content)
self.encoding = result['encoding']

# csv_delimiter is read from the DB (',' in this case)
# max_columns is the number of columns in the CSV file

# Try 1 (a fresh buffer per try, because read_csv consumes the stream)
content = io.BytesIO(self.raw_content)
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, engine='python',
                         dtype=object, encoding=self.encoding, quotechar='"',
                         names=list(range(0, max_columns)))

# Try 2
content = io.BytesIO(self.raw_content)
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, engine='python',
                         dtype=object, encoding=self.encoding, quoting=csv.QUOTE_ALL,
                         names=list(range(0, max_columns)))

# Try 3
content = io.BytesIO(self.raw_content)
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, dtype=object,
                         encoding=self.encoding, quoting=csv.QUOTE_ALL,
                         names=list(range(0, max_columns)))
Current result:
   0  1  2   3   4
   Q  W  E   R   NaN
   A  S  "D  F"  G
   Z  X  C   V   NaN
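To see which rows of the real file get split, one option is to parse the raw bytes with the csv module using the same delimiter and quote character and list the rows whose field count is wrong. The helper name `rows_with_wrong_width` is hypothetical. The second call illustrates one input that produces exactly the `"D` / `F"` split shown above: a space between the delimiter and the opening quote. This is an assumed cause, not something the question confirms; if it is the culprit, `skipinitialspace=True` (a real option in both csv and pandas `read_csv`) makes the quote be recognized again.

```python
import csv
import io


def rows_with_wrong_width(raw: bytes, encoding: str, delimiter: str, expected: int):
    """Hypothetical helper: return (line_number, fields) for every row whose
    field count differs from `expected`, parsing with '"' as the quote char."""
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    bad = []
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) != expected:
            bad.append((line_no, fields))
    return bad


# A well-formed file reports nothing
print(rows_with_wrong_width(b'Q,W,E,R\nA,S,"D,F",G\n', 'utf-8', ',', 4))
# []

# Assumed example: a space before the opening quote means the quote is taken
# literally, so "D,F" is split at the comma
print(rows_with_wrong_width(b'Q,W,E,R\nA,S, "D,F",G\n', 'utf-8', ',', 4))
# [(2, ['A', 'S', ' "D', 'F"', 'G'])]
```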
Expected result:
   0  1  2    3
   Q  W  E    R
   A  S  D,F  G
   Z  X  C    V
You can parse it with the following code (adapted from https://stackoverflow.com/a/64456792/5660315):
from io import StringIO
import csv
import pandas as pd
s="""
Q,W,E,R
A,S,"D,F",G
Z,X,C,V
"""
df = pd.read_csv(StringIO(s),
                 names=range(4),
                 sep=',',
                 quoting=csv.QUOTE_ALL,
                 quotechar='"'
                 )
print(df)
#    0  1    2  3
# 0  Q  W    E  R
# 1  A  S  D,F  G
# 2  Z  X    C  V
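The same call can be applied directly to the bytes coming from S3 (what `obj['Body'].read()` returns) by wrapping them in `io.BytesIO` instead of `StringIO`. A minimal sketch; `read_csv_bytes` is a hypothetical helper, and `skipinitialspace=True` is an optional extra that only matters if the file has spaces after the delimiter (it also strips leading spaces from unquoted values):

```python
import io

import pandas as pd


def read_csv_bytes(raw: bytes, encoding: str = 'utf-8', delimiter: str = ',',
                   n_cols: int = 4) -> pd.DataFrame:
    """Hypothetical helper: parse raw CSV bytes into a DataFrame of strings."""
    return pd.read_csv(
        io.BytesIO(raw),
        sep=delimiter,
        quotechar='"',           # "..." fields may contain the delimiter
        skipinitialspace=True,   # optional: ignore spaces right after the delimiter
        names=list(range(n_cols)),
        dtype=object,            # keep every value as a string
        encoding=encoding,
    )


df = read_csv_bytes(b'Q,W,E,R\nA,S,"D,F",G\nZ,X,C,V\n')
print(df.shape)      # (3, 4)
print(df.loc[1, 2])  # D,F
```

In the question's code this would be called as `read_csv_bytes(self.raw_content, self.encoding, csv_delimiter, max_columns)`.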