
Python Pandas - Read CSV or Excel

I'm allowing users to upload a CSV or Excel file. I'm using pandas to read the file and create a dataframe. Since I can't predict which filetype the user will upload, I wrapped pd.read_csv() and pd.read_excel() in a try/except block.

if form.validate_on_submit():
    input_filename = secure_filename(form.file.data.filename)
    try:
        df = pd.read_csv(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
    except:
        df = pd.read_excel(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')

If pd.read_csv() is first in the try/except block and I upload a .csv file it works. If I attempt to upload a .xlsx file, I get this error:

TypeError: expected str, bytes or os.PathLike object, not NoneType

If pd.read_excel() is first in the try/except block and I upload an .xlsx file it works. If I attempt to upload a .csv file, I get this error:

pandas.io.common.EmptyDataError: No columns to parse from file

Previously, I used the mimetype to route the file to the correct pandas function, but I was hoping for a cleaner (and all-encompassing) solution that didn't involve several if/elif statements. This is what I had:

if form.file.data.mimetype == 'text/csv':
    df = pd.read_csv(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
elif form.file.data.mimetype == 'application/octet-stream':
    df = pd.read_excel(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
elif form.file.data.mimetype == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
    df = pd.read_excel(form.file.data, header=0, skip_blank_lines=True, skipinitialspace=True, encoding='latin-1')
else:
    flash('Error Uploading File. Invalid file type. Please use xls, xlsx or csv.', 'danger')
    return render_template('upload.html', current_user=current_user, form=form)

I'm using Flask, WTForms and Python 3. Thank you.

You are calling read_excel with keyword arguments that are useful for read_csv but not supported by read_excel. Instead, you might try:

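if form.validate_on_submit():
    input_filename = secure_filename(form.file.data.filename)
    data = form.file.data
    try:
        df = pd.read_csv(data, header=0, skip_blank_lines=True,
                         skipinitialspace=True, encoding='latin-1')
    except Exception:
        # read_csv may already have consumed part of the upload stream,
        # so rewind it before handing it to read_excel
        data.seek(0)
        df = pd.read_excel(data, header=0)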

In addition to removing the extra args to read_excel, I've also hoisted form.file.data into a local variable; this is to combat the possibility that some lazy-load behavior is interacting poorly with the try/except block.
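If relying on an exception to tell the two formats apart feels fragile, one alternative is to look at the first few bytes of the upload. The following is only a sketch, not part of the original answer: sniff_kind and read_upload are hypothetical helper names, and it assumes the upload is a seekable binary stream (which form.file.data normally is). It relies on .xlsx files being ZIP archives and legacy .xls files being OLE2 compound documents, each with a well-known signature.

```python
# Leading bytes that identify the two Excel container formats.
XLSX_MAGIC = b"PK\x03\x04"                       # .xlsx is a ZIP archive
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 container)


def sniff_kind(stream):
    """Return 'excel' or 'csv' based on the first bytes of a seekable binary stream."""
    stream.seek(0)
    head = stream.read(8)
    stream.seek(0)  # rewind so the pandas reader sees the whole file
    if head.startswith(XLSX_MAGIC) or head.startswith(XLS_MAGIC):
        return "excel"
    # Anything else is treated as CSV; this is an assumption, not a validation.
    return "csv"


def read_upload(stream):
    """Hypothetical helper: load an uploaded CSV or Excel stream into a DataFrame."""
    import pandas as pd  # imported here so sniff_kind can be used without pandas

    if sniff_kind(stream) == "excel":
        return pd.read_excel(stream, header=0)
    return pd.read_csv(stream, header=0, skip_blank_lines=True,
                       skipinitialspace=True, encoding="latin-1")
```

In the view this would be called as df = read_upload(form.file.data). Note that a ZIP signature only suggests an .xlsx file; any other ZIP archive would also be routed to read_excel and fail there, so the call may still deserve a try/except that flashes the "invalid file type" message.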

In general, it is hard to debug moderately complex I/O functions in the midst of web requests. When operations like this don't work, the best approach is to split the problem into two parts: (1) get the data from the web request and write it to a file; then, separately, (2) try the I/O (in this case, the pandas dataframe load) from the resulting file. Doing this interactively or in a separate program will give you more debugging opportunities and clarity. Jupyter Notebook is excellent for such exploratory tests, though most IDEs or even the bare Python REPL will work. Once part (2) is clearly working, you can patch it back into the Flask / web app code.
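As a rough illustration of that two-part split (a sketch under assumptions, not the only way to do it): dump_upload and load_saved are hypothetical names, the debug_uploads folder is arbitrary, and the upload object is assumed to be the werkzeug FileStorage that Flask/WTForms provides, which has a save() method.

```python
from pathlib import Path


def dump_upload(file_storage, filename, folder="debug_uploads"):
    """Part 1 (inside the request): save the upload unchanged, return its path.

    `filename` is expected to be already sanitized, e.g. the
    `input_filename` produced by secure_filename() in the view.
    """
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    file_storage.save(str(target))  # FileStorage.save writes the raw bytes
    return target


def load_saved(path):
    """Part 2 (REPL, notebook or separate script): load the saved file."""
    import pandas as pd  # only needed for this second step

    path = Path(path)
    if path.suffix.lower() in {".xls", ".xlsx"}:
        return pd.read_excel(path, header=0)
    return pd.read_csv(path, header=0, skip_blank_lines=True,
                       skipinitialspace=True, encoding="latin-1")
```

Inside the view, dump_upload(form.file.data, input_filename) captures exactly what the browser sent; load_saved("debug_uploads/<name>") can then be run on its own until it behaves, with no web request involved.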


 