
Converting an uploaded CSV to a Python list

I have a two-column CSV file which I upload via an HTML page to be processed by a Python CGI script. Looking at the file on the server side, it appears as one long string. For example, a file called test.csv with the contents

col1,  col2  
x,y  

has become

('upfile', 'test.csv', 'col1,col2\t\r\nx,y')

col1 contains the data I want to operate on (i.e. x) and col2 contains its identifier (y). Is there a better way of doing the upload, or do I need to extract the fields I want manually? That seems potentially very error-prone. Thanks.

If you're using the cgi module in python, you should be able to do something like:

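import cgi
import csv
import io

form = cgi.FieldStorage()
thefile = form['upfile']

# On Python 3 the upload is a binary stream; wrap it as text (UTF-8 assumed)
text = io.TextIOWrapper(thefile.file, encoding='utf-8', newline='')
reader = csv.reader(text)
header = next(reader)  # list of column names
for row in reader:
    # row is a list of fields
    process_row(row)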

See, for example, cgi programming or the python cgi module docs.
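The same csv.reader call also works on the raw string shown in the question, which makes it easy to try without a CGI setup. This is only a minimal sketch: parse_upload is a made-up helper name, UTF-8 is an assumed encoding, and the strip() call is an addition to drop the stray tab after col2.

```python
import csv
import io


def parse_upload(raw_bytes, encoding="utf-8"):
    """Return (header, rows) parsed from the raw bytes of an uploaded CSV."""
    # newline="" lets the csv module handle \r\n and \n line endings itself
    text = io.StringIO(raw_bytes.decode(encoding), newline="")
    reader = csv.reader(text)
    # strip() removes stray whitespace such as the tab after "col2";
    # "if row" skips completely empty lines
    rows = [[field.strip() for field in row] for row in reader if row]
    return rows[0], rows[1:]


header, rows = parse_upload(b"col1,col2\t\r\nx,y")
print(header)  # ['col1', 'col2']
print(rows)    # [['x', 'y']]
```

The helper assumes the upload has at least a header row; an empty upload would raise IndexError, so check for that first if it can happen.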

Can't you use the csv module to parse this? It's certainly better than rolling your own.

Something along the lines of

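import cgi
import csv
import io

form = cgi.FieldStorage()
thefile = form['upfile']

# Pass the underlying file object, not the FieldStorage item itself.
# On Python 3 it is a binary stream, so wrap it as text (UTF-8 assumed).
text = io.TextIOWrapper(thefile.file, encoding='utf-8', newline='')
reader = csv.reader(text, delimiter=',')
for row in reader:
    for field in row:
        doThing(field)  # placeholder for your own per-field processing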

EDIT: Corrected my answer based on ars's answer.

It looks like your file is being modified by the HTML upload. Is there anything stopping you from just FTPing in and dropping the CSV file where you need it?

Once the CSV file is in a clean state, here is a quick function that will read it into a 2D list:

import csv

def genTableFrCsv(incsv):
    # Read the whole CSV file at path incsv into a list of rows.
    # newline='' is what the csv module expects on Python 3.
    with open(incsv, newline='') as fin:
        return list(csv.reader(fin))

From here you can then operate on the whole list in memory rather than pulling bit by bit from the file as in Vitor's solution.
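Since the question says column 1 holds the data and column 2 its identifier, one way to work with the in-memory table is to index it by identifier. This is a rough sketch, not part of the original answer: index_by_identifier is a hypothetical name, and it assumes the first row is a header and that identifiers are unique.

```python
def index_by_identifier(table):
    """Map each identifier (column 2) to its data value (column 1)."""
    body = table[1:]  # skip the header row
    # rows with fewer than two fields are ignored rather than raising
    return {row[1].strip(): row[0].strip() for row in body if len(row) >= 2}


table = [["col1", "col2"], ["x", "y"]]
print(index_by_identifier(table))  # {'y': 'x'}
```

If identifiers can repeat, later rows overwrite earlier ones here, so a dict of lists might suit better in that case.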

The easy solution is rows = [r.split('\t') for r in csv_string.split('\r\n')]. It's only error-prone if users on different platforms submit data: they might use commas or tabs, and their line breaks could be \n, \r\n, or \r (shown as ^M in some editors). A more tolerant approach is to use regular expressions. Bookmark this page if you don't know regular expressions:

http://regexlib.com/CheatSheet.aspx

And here's the solution:

import re

csv_string = 'col1,col2\t\r\nx,y'  # obviously your csv opening code goes here

# Two non-empty fields separated by tabs and/or commas; no quoting support
rows = re.findall(r'([^\t,\r\n]+)[\t,]+([^\t,\r\n]+)', csv_string)
rows = rows[1:]  # remove header

rows is now a list of tuples, one per data row.
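If the line endings and delimiters really do vary between submitters, another option is to let str.splitlines() deal with the line breaks and split each line on a comma or tab. This is a sketch under those assumptions only: split_rows is a made-up name, and it does not understand quoted fields, for which the csv module remains the safer choice.

```python
import re


def split_rows(csv_string):
    """Split raw CSV text into rows of fields, tolerating mixed line endings."""
    rows = []
    # splitlines() recognises \n, \r\n and \r (^M is just \r)
    for line in csv_string.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on a comma or a tab, whichever the user submitted
        rows.append([field.strip() for field in re.split(r"[\t,]", line)])
    return rows


print(split_rows("col1,col2\t\r\nx,y"))  # [['col1', 'col2'], ['x', 'y']]
```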
