Python script to read multiple Excel files in one directory and convert them to .csv files in another directory

Question

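I am relatively new to Python and Stack Overflow, and I hope someone can shed some light on my current problem. I have a Python script that takes Excel files (.xls and .xlsx) from one directory and converts them to .csv files in another directory. It works perfectly well on my sample Excel files (4 columns and 1 row each, for testing), but when I run the script against a different directory of much larger Excel files, I get an AssertionError. I have attached my code and the error. I look forward to some guidance on this problem. Thanks!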

import os
import pandas as pd

source = "C:/.../TestFolder"
output = "C:/.../OutputCSV"

dir_list = os.listdir(source)

os.chdir(source)

for i in range(len(dir_list)):
    filename = dir_list[i]
    book = pd.ExcelFile(filename)

    #writing to csv
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        for i in range(len(book.sheet_names)):
            df = pd.read_excel(book, book.sheet_names[i])

            os.chdir(output)

            new_name = filename.split('.')[0] + str(book.sheet_names[i])+'.csv'
            df.to_csv(new_name, index = False)

        os.chdir(source)

print "New files: ", os.listdir(output)

Answer 1

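Since you are on Windows, consider using the Jet/ACE SQL engine (Windows .dll files) to query the Excel workbooks and export to CSV files, bypassing the need to load and export with pandas DataFrames.

Specifically, use pyodbc to make an ODBC connection to each Excel file, iterate through each worksheet, and export it to a CSV file with a SELECT * INTO ... SQL action query. The openpyxl module is used to retrieve the sheet names. The script below does not rely on relative paths, so it can be run from anywhere. It assumes each Excel file has a complete header row (no missing cells within the used range of the top row).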

import os
import pyodbc
from openpyxl import load_workbook

source = "C:/Path/To/TestFolder"
output = "C:/Path/To/OutputCSV"

dir_list = os.listdir(source)

for xlfile in dir_list:
    strfile = os.path.join(source, xlfile)

    if strfile.endswith('.xlsx') or strfile.endswith('.xls'):
        # CONNECT TO WORKBOOK
        conn = pyodbc.connect(r'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};' + \
                               'DBQ={};'.format(strfile), autocommit=True)
        # RETRIEVE WORKBOOK SHEETS
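        # (read_only/sheetnames replace the older use_iterators/get_sheet_names();
        #  note that openpyxl does not read legacy .xls files)
        sheets = load_workbook(filename=strfile, read_only=True).sheetnames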

        # ITERATIVELY EXPORT SHEETS TO CSV IN OUTPUT FOLDER
        for s in sheets:
            outfile = os.path.join(output, '{0}_{1}.csv'.format(xlfile.split('.')[0], s))
            if os.path.exists(outfile): os.remove(outfile)

            strSQL = " SELECT * " + \
                     " INTO [text;HDR=Yes;Database={0};CharacterSet=65001].[{1}]" + \
                     " FROM [{2}$]"            
            conn.execute(strSQL.format(output, os.path.basename(outfile), s))
        conn.close()

Note: the SELECT * INTO export creates a schema.ini file that is appended to on each iteration. It can be deleted.
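If the leftover schema.ini is unwanted, it can be cleaned up once the loop has finished. The following is a minimal sketch, assuming the text driver writes schema.ini into the output folder (the directory named in Database=); remove_schema_ini is a hypothetical helper name, not part of pyodbc or openpyxl.

```python
import os


def remove_schema_ini(output_dir):
    """Delete schema.ini from output_dir if it exists.

    Returns True if a file was removed, False otherwise.
    Assumption: the export wrote schema.ini into output_dir.
    """
    schema_path = os.path.join(output_dir, "schema.ini")
    if os.path.isfile(schema_path):
        os.remove(schema_path)
        return True
    return False
```

Calling remove_schema_ini(output) after the outer for loop would leave only the exported .csv files in the output folder.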
