
What's wrong with this Python program working on a .csv file?

I have a text file with a list of strings.

I want to search a .csv file for rows that begin with those strings and put them in a new .csv file.

In this instance, the text file is called 'output.txt', the original .csv is 'input.csv' and the new .csv file is 'corrected.csv'.

The code:

import csv

file = open('output.txt')
while 1:
    line = file.readline()
    writer = csv.writer(open('corrected.csv','wb'), dialect = 'excel')
    for row in csv.reader('input.csv'):
        if not row[0].startswith(line):
            writer.writerow(row)
    writer.close()
    if not line:
        break
    pass

The error:

Traceback (most recent call last):
File "C:\Python32\Sample Program\csvParser.py", line 9, in <module>
writer.writerow(row)
TypeError: 'str' does not support the buffer interface

New error:

Traceback (most recent call last):
File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
for row in reader:
_csv.Error: line contains NULL byte

The problem was that the CSV file had been saved with tabs instead of commas. The new issue is the following:

Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 13, in <module>
    if row[0] not in lines:
IndexError: list index out of range

The CSV file has 500+ entries of data... does this make a difference?
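
For a tab-separated file like the one described above, csv.reader can be told the delimiter explicitly. A minimal sketch, using made-up sample data held in an io.StringIO instead of the real input.csv:

```python
import csv
import io

# Hypothetical sample standing in for a tab-separated input file
sample = io.StringIO("id\tname\n1\tspam\n2\teggs\n", newline='')

# delimiter='\t' tells the reader to split fields on tabs, not commas
rows = list(csv.reader(sample, delimiter='\t'))
print(rows)
```

With a real file, the same delimiter argument would be passed alongside open('input.csv', newline=''); the built-in 'excel-tab' dialect is another way to express it.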

If you look at the documentation, this is how the reader is initialized:

spamReader = csv.reader(open('eggs.csv', 'r'), ...

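Notice the open('eggs.csv', 'r'). You aren't passing a file object to csv.reader, so the string 'input.csv' itself is treated as the data to parse and the file is never read. The TypeError itself comes from opening corrected.csv in 'wb' mode: in Python 3 the csv writer emits str, so the output file must be opened in text mode ('w', newline='').

Replace the csv.reader('input.csv') call with this: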

csv.reader(open('input.csv', 'r', newline = ''))
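
To see why passing the file name goes wrong, here is a small sketch comparing the two calls. The string 'a,b' and the io.StringIO contents are illustrative stand-ins, not the asker's data:

```python
import csv
import io

# Passing a str: the reader iterates over its characters, one "line" each
rows_from_name = list(csv.reader('a,b'))

# Passing a file-like object: the reader iterates over real lines
# (io.StringIO stands in for open('input.csv', newline='') here)
rows_from_file = list(csv.reader(io.StringIO('a,b\nc,d\n', newline='')))

print(rows_from_name)
print(rows_from_file)
```

The first call never touches the disk; it just parses each character of the string as if it were a separate line.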

csv.reader can't open a file by itself; it takes a file object. A better solution would be this:

import csv

lines = []
with open('output.txt', 'r') as f:
    for line in f:
        lines.append(line.rstrip('\n'))  # drop only the trailing newline

with open('corrected.csv', 'w', newline='') as correct:
    writer = csv.writer(correct, dialect='excel')
    with open('input.csv', 'r', newline='') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)
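
The question asks about rows that begin with the listed strings, whereas the test above is an exact match on the first field. If prefix matching is what is wanted, str.startswith accepts a tuple of prefixes. The following is only a sketch: filter_rows and keep_matches are hypothetical names, the sample data is made up, and skipping blank lines is an assumption.

```python
import csv
import io

def filter_rows(prefixes, source, dest, keep_matches=True):
    """Copy rows from source to dest depending on whether row[0] starts with a prefix."""
    # Drop empty strings, since every string starts with ''
    prefixes = tuple(p for p in prefixes if p)
    writer = csv.writer(dest, dialect='excel')
    for row in csv.reader(source):
        if not row:
            continue  # assumption: blank input lines are skipped
        matches = row[0].startswith(prefixes)
        if matches == keep_matches:
            writer.writerow(row)

# In-memory stand-ins for input.csv and corrected.csv
source = io.StringIO('abc1,x\nxyz2,y\n\nabd3,z\n', newline='')
dest = io.StringIO(newline='')
filter_rows(['ab'], source, dest)
result = dest.getvalue()
print(result)
```

Passing keep_matches=False would instead drop the matching rows, which is what the original loop with "not ... startswith" was doing.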

Your latest problem:

    if row[0] not in lines:
IndexError: list index out of range

The error message mentions a list index.
There is only one list index that it could be talking about: 0
If 0 is out of range, then len(row) must be zero.
If len(row) is zero, then the corresponding line in the input file must be empty.
If a line in the input file is empty, what do you want to do:

(a) ignore the input line altogether?
(b) raise a (fatal) error?
(c) log an error message somewhere and keep going?
(d) something else?
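
As an illustration of options (a) and (c) combined, here is a minimal sketch that logs each empty line and carries on. non_empty_rows is a hypothetical helper and the sample data is made up:

```python
import csv
import io
import logging

logging.basicConfig(level=logging.WARNING)

def non_empty_rows(reader):
    """Yield rows from a csv reader, logging and skipping empty ones."""
    for row in reader:
        if not row:
            # reader.line_num is the number of source lines read so far
            logging.warning('skipping empty line %d', reader.line_num)
            continue
        yield row

# In-memory stand-in for input.csv with one blank line in the middle
sample = io.StringIO('a,1\n\nb,2\n', newline='')
rows = list(non_empty_rows(csv.reader(sample)))
print(rows)
```

For option (b), the logging call could be replaced with raising an exception such as ValueError.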

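Try this (cStringIO only exists on Python 2, so on Python 3 the equivalent is io.StringIO):

import csv
import io

# Read the prefixes once, ignoring blank lines
with open('output.txt') as f:
    prefixes = tuple(line.rstrip('\n') for line in f if line.strip())

buf = io.StringIO(newline='')  # in-memory text buffer
writer = csv.writer(buf, dialect='excel')
with open('input.csv', newline='') as source:
    for row in csv.reader(source):
        # Keep non-empty rows whose first field starts with none of the prefixes
        if row and not row[0].startswith(prefixes):
            writer.writerow(row)

# Dump the whole buffer to disk in one write
with open('corrected.csv', 'w', newline='') as output:
    output.write(buf.getvalue())

In my experience, building the whole output in an in-memory buffer and then dumping the entire buffer into the file in one write is faster.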
