Parsing CSV files, and writing them into another CSV format

The CSV files I import are fetched using urllib2 and saved into folders. Their format looks like this:

number,season,episode,production code,airdate,title,special?,tvrage
1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"

Now I am successfully converting that into SQL statements as well as another CSV file that can be inserted into my database, in a format like so:

,1,1,1,"Pilot",'2006-10-11',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1

Using the following code:

csv = """,%s,%s,%s,%s,%r,,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1""" % (showid, line[1],line[2], line[5], date(line[4]))
print>>final, csv

EDIT -

I have changed from string formatting to this:

csv = ','+showid+','+line[1]+','+line[2]+','+line[5]+','+date(line[4])+',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'

It's not much better, and I am still having trouble with some files being skipped during parsing. I'm not sure if it's me or the CSV module.

The problem is that it goes through some files perfectly fine, some CSV files it just skips, and for others I get errors like IndexError: list index out of range

If anyone has experience with CSV files and getting them to parse correctly I would really appreciate the help.

Here is the Full Source Code: http://cl.ly/2W472g303D1p0J3S2o46

dsimport.py - http://pastie.org/3076663
CSVFileHandler.py - http://pastie.org/3076667

Thanks

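I'm not sure exactly what all the errors are, but here are a few tips:

  1. In processFile(line), line is a misleading name: it isn't a line of text, it's a row (a list of fields). That's what confused Tim and me as well at first sight.
  2. You should verify that the row has at least 6 elements, since your script reads up to index 5. A shorter row (a blank or truncated line, for example) is what raises IndexError: list index out of range.
  3. You can build the output with the join method, which is awesome, instead of chaining string concatenations.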

Here's a small refactoring:

def processFile(row):
    if len(row) < 6:
        #raise Exception('too few columns')
        # maybe it's better to just ignore bad rows in your case
        return
    items = [
        '',
        showid,
        row[1],
        row[2],
        row[5],
        date(row[4]),
        ]
    res = ','.join(items)
    res += ',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
    print res
    print>>final, res

handler = CSVFileHandler('/Users/tharshan/WebRoot/stv/export/csv/%s-save.csv' % name)
try:
    handler.process(processFile, name)    
except Exception, e:
    print 'Failed processing and skipping %s because of: %s' % (name, e)

final.close()
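
As a further sketch of the same idea, the csv module can also write the output, so that titles containing commas get quoted automatically instead of breaking the column layout. This is written for Python 3, while the code above is Python 2. The names to_iso_date, convert_rows and TRAILER are hypothetical, and to_iso_date is only a guess at what your date() helper does (it assumes input like 24/Sep/07 and an English locale for the month abbreviation).

```python
import csv
import io
from datetime import datetime

# Fixed trailing columns copied from the output format in the question.
TRAILER = ['', '', '', '', '1',
           '2011-12-23 15:52:49', '2011-12-23 15:52:49', '1', '1']


def to_iso_date(text):
    # Hypothetical stand-in for date(): assumes input like 24/Sep/07.
    return datetime.strptime(text, '%d/%b/%y').strftime('%Y-%m-%d')


def convert_rows(source, dest, show_id):
    """Write usable rows to dest and return how many rows were skipped."""
    reader = csv.reader(source)
    writer = csv.writer(dest, lineterminator='\n')
    next(reader, None)  # skip the header row
    skipped = 0
    for row in reader:
        # Blank lines come back as [] and truncated lines as short lists.
        if len(row) < 6:
            skipped += 1
            continue
        try:
            air_date = to_iso_date(row[4])
        except ValueError:
            # Unparseable air date: count the row instead of crashing.
            skipped += 1
            continue
        writer.writerow(['', show_id, row[1], row[2], row[5], air_date] + TRAILER)
    return skipped


sample = io.StringIO(
    'number,season,episode,production code,airdate,title,special?,tvrage\n'
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
    '\n'
    'broken,row\n'
)
out = io.StringIO()
skipped = convert_rows(sample, out, '1')
print(out.getvalue(), end='')
print('skipped rows:', skipped)
```

Counting skipped rows rather than silently dropping them makes it easier to see which files are malformed, which seems to be the part that was hard to diagnose here.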

Never mind, all fixed. In the end I just used the excel dialect for reading and wrote the output CSV with pipe (|) delimiters. Either way it was quite fiddly, and I honestly feel like I got it to work through sheer luck.
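
For anyone landing here later, a rough sketch of what that could look like, assuming Python 3. This is not the code I ended up with, and recode_with_pipes is just an illustrative name: it reads with the csv module's built-in excel dialect and writes pipe-delimited rows.

```python
import csv
import io


def recode_with_pipes(source, dest):
    """Read comma-separated rows (excel dialect) and write them pipe-delimited."""
    reader = csv.reader(source, dialect='excel')
    writer = csv.writer(dest, delimiter='|', lineterminator='\n')
    for row in reader:
        if row:  # csv.reader yields [] for blank lines, so drop those
            writer.writerow(row)


sample = io.StringIO('1,1,1,"101",24/Sep/07,"Pilot, Part 1",n\n\n2,1,2,"102",01/Oct/07,"Second",n\n')
out = io.StringIO()
recode_with_pipes(sample, out)
print(out.getvalue(), end='')
```

With a pipe as the delimiter, commas inside titles no longer need quoting in the output, which is probably why this turned out less fiddly.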

Thanks for all the help.
