
How to read a CSV file with commas inside double quotes using pandas (Python)

I have a dataset that looks like this when opened in WordPad.

"state","industry","2000","2005"

"A","art,music",2934,2454

"B","farm",3949,2343

I want to read it into Python so that it looks like this.

"state" "industry" "2000" "2005"
"A" "art,music" 2934 2454
"B" "farm" 3949 2343

I tried the code below.

df = pd.read_csv(os.path.join(path, filename), engine='python', sep=',' , quoting=3)

This raises an error: "ParserError: Expected 6 fields in line 8, saw 8"

df = pd.read_csv(os.path.join(path, filename), engine='python', sep='",' , quoting=3)

This puts all the numbers in the same cell.

I have read a lot of posts asking similar questions, but mine is a bit different from them because 1) my data contains commas within double quotes and 2) the employment numbers are not quoted.

How can I handle this? Any help is appreciated!

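The default parameters of read_csv should work. Passing quoting=3 (which is csv.QUOTE_NONE) tells the parser to treat double quotes as ordinary characters, so the comma inside "art,music" is read as a separator. Leaving quoting at its default keeps quoted fields intact: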

import pandas as pd
import io

# for test
csv = io.StringIO('''\
"state","industry","2000","2005"
"A","art,music",2934,2454
"B","farm",3949,2343''')

df = pd.read_csv(csv)
print(df)
print(df.dtypes)

Output:

  state   industry  2000  2005
0     A  art,music  2934  2454
1     B       farm  3949  2343
state       object
industry    object
2000         int64
2005         int64

