Using Pandas to Convert from Excel to CSV and I Have Multiple Possible Excel Sheet Names

I am trying to convert a large number of Excel documents to CSV using Python, and the sheet I am converting from each document can be called "Pivot", "PVT", "pivot", or "pvt". The way I am doing it right now seems to work, but I was wondering if there is a quicker way, as this takes a long time to go through my Excel files. Is there a way to accomplish the same thing in a single pd.read_excel line, using an OR operator to specify multiple variations of the sheet name?

import glob
import pandas as pd

counter = 0  # used to number the output CSV files
for f in glob.glob("../Test/Drawsheet*.xlsx"):
    try:
        data_xlsx = pd.read_excel(f, 'PVT', index_col=None)
    except:
        try:
            data_xlsx = pd.read_excel(f, 'pvt', index_col=None)
        except:
            try:
                data_xlsx = pd.read_excel(f, 'pivot', index_col=None)
            except:
                try:
                    data_xlsx = pd.read_excel(f, 'Pivot', index_col=None)
                except:
                    continue
    data_xlsx.to_csv('csvfile' + str(counter) + '.csv', encoding='utf-8')
    counter += 1

Your problem isn't so much about finding the correct special syntax for pd.read_excel as it is about knowing which sheet to read from. Pandas has an ExcelFile class that encapsulates some basic information about an Excel file. The class has a sheet_names property that tells you which sheets are in the file. (Unfortunately, documentation on this class is a bit hard to find, so I can't give you a link.)

import glob
import pandas as pd

valid_sheet_names = ['PVT', 'pvt', 'pivot', 'Pivot']

for f in glob.iglob('../Test/Drawsheet*.xlsx'):
    file = pd.ExcelFile(f)
    sheet_name = None

    for name in file.sheet_names:
        if name in valid_sheet_names:
            sheet_name = name
            break

    if sheet_name is None:
        continue

    # Pass the already-opened ExcelFile so the workbook is not opened a second time
    data_xlsx = pd.read_excel(file, sheet_name, index_col=None)
    ...

However, this is not strictly equivalent to your code, as it does not do two things:

  • Cascade to the next sheet name if the chosen sheet fails to load into a data frame
  • Apply a priority ranking to the sheet names (PVT first, then pvt, then pivot, etc.)

I'll leave it to you to handle these two points as your program requires.
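For what it's worth, here is a minimal sketch of one way both points could be handled together. The helper name read_preferred_sheet and its default priority order are my own assumptions, not part of pandas; it only relies on the object having a sheet_names list and a parse method, which pd.ExcelFile provides.

```python
def read_preferred_sheet(excel_file, priority=("PVT", "pvt", "pivot", "Pivot")):
    """Return (sheet_name, data) for the first sheet in `priority` that loads.

    `excel_file` is expected to behave like pandas.ExcelFile: it needs a
    `sheet_names` list and a `parse(name, ...)` method.
    Returns (None, None) if no candidate sheet exists or none can be parsed.
    """
    available = set(excel_file.sheet_names)
    for name in priority:          # walk the names in priority order
        if name not in available:
            continue
        try:
            return name, excel_file.parse(name, index_col=None)
        except Exception:          # this sheet failed to load, try the next one
            continue
    return None, None


# Possible usage with pandas (hypothetical, adapt to your own loop):
#
# import glob
# import pandas as pd
#
# counter = 0
# for path in glob.iglob("../Test/Drawsheet*.xlsx"):
#     name, data = read_preferred_sheet(pd.ExcelFile(path))
#     if data is None:
#         continue
#     data.to_csv("csvfile" + str(counter) + ".csv", encoding="utf-8")
#     counter += 1
```

Because the workbook is opened once and only the sheets that actually exist are attempted, this should avoid most of the failed read_excel calls in the nested try/except version, though I have not timed it against your files.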
