
CSV import with Python; incorrect “,” delimiter behavior

I am using the csv module as follows:

import csv

header = '"Id","IsDeleted","MasterRecordId","Salutation","FirstName","LastName","Name","Type","RecordTypeId","ParentId","BillingStreet","BillingCity","BillingState","BillingPostalCode","BillingCountry","BillingLatitude"'
header_c = csv.reader(header, delimiter=',', quotechar='"')

names = []
for row in header_c:
  names.append(row)

Inspecting names returns:

[['Id'], ['', ''], ['IsDeleted'], ['', ''], ['MasterRecordId'], ['', ''], ['Salutation'], ['', ''], ['FirstName'], ['', ''], ['LastName'], ['', ''], ['Name'], ['', ''], ['Type'], ['', ''], ['RecordTypeId'], ['', ''], ['ParentId'], ['', ''], ['BillingStreet'], ['', ''], ['BillingCity'], ['', ''], ['BillingState'], ['', ''], ['BillingPostalCode'], ['', ''], ['BillingCountry'], ['', ''], ['BillingLatitude']]

I could ignore all the odd entries, keeping only indices 0, 2, 4, ..., but I don't understand what I am doing wrong or why the commas are being kept as entries. What do I have to change for the commas to be dropped? 'IsDeleted' should be the second entry (names[1]).

Thanks in advance.

csv.reader() accepts any iterable and expects each iteration over that iterable to yield a complete line. The iterable is typically a file-like object, but a list of strings works just as well:

header_c = csv.reader([header], delimiter=',', quotechar='"')
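A quick check of the list-wrapping fix, written as a minimal Python 3 sketch; short_header is a hypothetical shortened copy of the question's header:

```python
import csv

# Hypothetical shortened copy of the question's header.
short_header = '"Id","IsDeleted","MasterRecordId","Salutation"'

# A one-element list means the reader sees exactly one complete line.
rows = list(csv.reader([short_header], delimiter=',', quotechar='"'))
print(rows)  # [['Id', 'IsDeleted', 'MasterRecordId', 'Salutation']]

# The reader yields rows, so take the first (only) row to get a flat list.
names = next(csv.reader([short_header]))
print(names[1])  # IsDeleted
```

Since csv.reader yields rows, the loop in the question builds a list of lists; taking the first row with next() gives the flat list in which names[1] is 'IsDeleted'.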

If you pass in a single string object, the string itself is iterated over as if each character were a line. Because of the quotes, csv then keeps reading 'lines' until it finds a closing quote character.

The next 'line' contains just a comma, so that is seen as a line of two empty values.
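The character-by-character reading can be reproduced with a two-column string. A minimal Python 3 sketch (the sample string is illustrative, not taken from the question):

```python
import csv

text = '"Id","IsDeleted"'

# Iterating a str yields one-character strings; csv.reader treats each as a line.
chars = list(text)[:5]
print(chars)  # ['"', 'I', 'd', '"', ',']

# Passing the bare string reproduces the question's odd output.
rows = list(csv.reader(text))
print(rows)  # [['Id'], ['', ''], ['IsDeleted']]
```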

Taking the first five characters ('"Id",') as an example, csv does this:

  • Iterate and receive '"'. This opens a quoted field, but the 'line' ends right away, and a quoted field does not end at a line boundary.
  • Because the quote is still open, keep reading 'lines' and appending their characters to the current value until a closing quote is found:
    • loop and receive 'I', append.
    • loop and receive 'd', append.
    • loop and receive '"'. Quote closed; the line then ends, so yield a complete row ['Id'].
  • Iterate and receive ','. This is a complete line containing one delimiter, so yield ['', ''].
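To watch this trace happen, you can wrap the input in a generator that records every 'line' the reader pulls. This is a minimal Python 3 sketch; logged is a hypothetical helper, not part of the csv module:

```python
import csv

def logged(lines, log):
    """Hypothetical helper: record each 'line' as csv.reader pulls it."""
    for line in lines:
        log.append(line)
        yield line

log = []
reader = csv.reader(logged('"Id",', log))

first = next(reader)       # pulls '"', 'I', 'd', '"' before the row is complete
after_first = list(log)
second = next(reader)      # pulls ','

print(first, after_first)  # ['Id'] ['"', 'I', 'd', '"']
print(second, log)         # ['', ''] ['"', 'I', 'd', '"', ',']
```

The reader fetches lines lazily, one per step, so the log shows that four one-character 'lines' were needed to finish the quoted field 'Id', and one more produced the row of two empty values.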

Whenever I need to pass a string value to csv.reader(), I use str.splitlines(); this method always returns a list, so it also works for a string without any newlines:

header_c = csv.reader(header.splitlines(True), delimiter=',', quotechar='"')

I keep the newlines (by passing True to str.splitlines()), so quoted values that contain newlines are returned with those newlines included.
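A small Python 3 sketch of why keeping the line endings matters; the sample text with an embedded newline is hypothetical:

```python
import csv

# Hypothetical sample: the second field contains an embedded newline.
text = 'a,"line1\nline2"\nb,c'

# splitlines(True) keeps the line endings, so the newline survives.
kept = list(csv.reader(text.splitlines(True)))
print(kept)     # [['a', 'line1\nline2'], ['b', 'c']]

# Without True the line endings are stripped and the two halves run together.
dropped = list(csv.reader(text.splitlines()))
print(dropped)  # [['a', 'line1line2'], ['b', 'c']]

# splitlines() returns a list even when there is no newline at all.
single = '"Id","IsDeleted"'.splitlines(True)
print(single)   # ['"Id","IsDeleted"']
```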

You should pass a file-like object (or any other iterable of strings) to csv.reader as its first argument.

csv.reader(csvfile, dialect='excel', **fmtparams)

Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.
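Per the quoted requirement, any iterable of strings will do, including a generator. A minimal Python 3 sketch, with a hypothetical lines() generator standing in for a real data source:

```python
import csv

def lines():
    # Hypothetical data source: any iterable that yields strings is acceptable.
    yield 'Id,IsDeleted\n'
    yield '1,0\n'

rows = list(csv.reader(lines()))
print(rows)  # [['Id', 'IsDeleted'], ['1', '0']]

# A str is iterable too, which is why csv.reader(header) raises no error:
# each "line" it yields is a single character.
first_line = next(iter('"Id"'))
print(first_line)  # "
```

The same rule explains the original symptom: csv.reader cannot tell that a str was meant as one line, so it silently accepts it and treats every character as a separate line.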

One option is to wrap the string in a StringIO buffer:

import csv
from io import StringIO  # Python 2: from StringIO import StringIO
header_c = csv.reader(StringIO(header), delimiter=',', quotechar='"')

Then, in names, you'll get:

[['Id', 'IsDeleted', 'MasterRecordId', 'Salutation', 'FirstName', 'LastName', 'Name', 'Type', 'RecordTypeId', 'ParentId', 'BillingStreet', 'BillingCity', 'BillingState', 'BillingPostalCode', 'BillingCountry', 'BillingLatitude']]
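All three inputs discussed in this thread (a one-element list, str.splitlines(True), and a StringIO buffer) produce the same single row. A minimal Python 3 sketch comparing them, using a hypothetical shortened header:

```python
import csv
from io import StringIO

# Hypothetical shortened copy of the question's header.
short_header = '"Id","IsDeleted","MasterRecordId"'

via_list = list(csv.reader([short_header]))
via_splitlines = list(csv.reader(short_header.splitlines(True)))
via_stringio = list(csv.reader(StringIO(short_header)))

# All three yield one row, so the header names live in the first row.
assert via_list == via_splitlines == via_stringio
print(via_stringio[0][1])  # IsDeleted
```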

