
How to convert multiple excel files to CSV utf-8 encoding using python

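I have 30+ xlsx files in the same directory, and using python I want to convert all of them to csv with utf-8 encoding, regardless of whatever encoding is present in the files. I am using python's magic library to inspect the files (code below). For the conversion, I tried the code mentioned by SO user Julian here (I used the code posted here), but it throws an error saying "InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm". The code that raises the error is below. My second question: based on my limited python knowledge, I believe the code works for a single excel file. How should I make it work for multiple files?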

Thanks in advance for your help!

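# import a library to detect encodings
# (python-magic identifies a file's type from its first bytes)
import magic
import glob

print("File".ljust(45), "Encoding")
# 'path' is a placeholder for the folder that holds the workbooks
for filename in glob.glob('path/*.xlsx'):
    with open(filename, 'rb') as rawdata:
        result = magic.from_buffer(rawdata.read(2048))
    print(filename.ljust(45), result)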

The code from the SO user's GitHub link mentioned here throws the error:

    from openpyxl import load_workbook
    import csv
    import sys
    
    def get_all_sheets(excel_file):
        # read_only mode is enough to list the sheet names
        workbook = load_workbook(excel_file, read_only=True, data_only=True)
        # get_sheet_names() is deprecated; sheetnames is the current attribute
        sheets = list(workbook.sheetnames)
        workbook.close()  # read-only workbooks keep the file handle open
        return sheets
    
    def csv_from_excel(excel_file, sheets):
        workbook = load_workbook(excel_file, data_only=True)
        for worksheet_name in sheets:
            print("Export " + worksheet_name + " ...")
    
            try:
                # get_sheet_by_name() is deprecated; index the workbook instead
                worksheet = workbook[worksheet_name]
            except KeyError:
                print("Could not find " + worksheet_name)
                sys.exit(1)
    
            # Python 3's csv module needs a text-mode file opened with newline=''
            # ('wb' raises "a bytes-like object is required")
            with open(worksheet_name + '.csv', 'w', newline='',
                      encoding='utf-8') as your_csv_file:
                wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
                for row in worksheet.iter_rows():
                    wr.writerow([cell.value for cell in row])
            print(" ... done")
    
    if not 2 <= len(sys.argv) <= 3:
        print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
        sys.exit(1)
    else:
        if len(sys.argv) == 3:
            sheets = sys.argv[2].split(',')
        else:
            sheets = get_all_sheets(sys.argv[1])
        assert sheets is not None and len(sheets) > 0
    csv_from_excel(sys.argv[1], sheets)

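Have you tried using the Pandas library? You can use os to store all the files in a list. Then you can iterate over the list, open each Excel file with read_excel, and write it out to csv. So it would look something like this: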

"""Code to read Excel workbooks and output each sheet as a csv
with utf-8 encoding."""
# Declare a folder path where all your Excel workbooks are stored.
# You can update the folder path in the ExcelPath variable.
# Declare a folder path where all your csv output will be written.
# You can update the folder path in the CsvPath variable.

import os

import pandas as pd

ExcelPath = "C:/ExcelPath"  # Folder containing your Excel workbooks
CsvPath = "C:/CsvPath"  # Folder for your csv outputs

# Only pick up Excel workbooks, not every file in the folder
fileList = [f for f in os.listdir(ExcelPath) if f.lower().endswith('.xlsx')]

for file in fileList:
    xls = pd.ExcelFile(os.path.join(ExcelPath, file))
    # Get the name of each sheet and loop to create individual csv files
    for sheet in xls.sheet_names:
        # The csv filename will be <WorkbookName>.<SheetName>
        fileNameCSV = os.path.splitext(file)[0] + '.' + str(sheet)
        df = pd.read_excel(xls, sheet_name=sheet)
        # Write straight into CsvPath (no os.chdir needed);
        # index=False keeps pandas' row index out of the output
        df.to_csv(os.path.join(CsvPath, "{}.csv".format(fileNameCSV)),
                  encoding="utf-8", index=False)

Not the best, but it should meet your needs.
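A variation on the same idea, in case it helps: `pd.read_excel` also accepts `sheet_name=None`, which reads every sheet in one call and returns a dict mapping each sheet name to a DataFrame, so the `ExcelFile` object and the inner `sheet_names` loop aren't needed. This is only a sketch; the helper name `workbook_to_csvs` is hypothetical, and it assumes the same `<workbook>.<sheet>.csv` naming as above:

```python
import os

import pandas as pd


def workbook_to_csvs(xlsx_path, csv_dir):
    """Write every sheet of one workbook to <workbook>.<sheet>.csv in csv_dir.

    Hypothetical helper; returns the list of csv paths it wrote.
    """
    stem = os.path.splitext(os.path.basename(xlsx_path))[0]
    # sheet_name=None -> {sheet name: DataFrame} for all sheets
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    written = []
    for sheet, df in sheets.items():
        out_path = os.path.join(csv_dir, f"{stem}.{sheet}.csv")
        # utf-8 output regardless of what the workbook contains
        df.to_csv(out_path, encoding="utf-8", index=False)
        written.append(out_path)
    return written
```

You would call it once per workbook, e.g. inside a loop over `os.listdir(ExcelPath)` filtered to `.xlsx` names, passing `CsvPath` as the output folder.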

First, the first error is clear: InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first.

Does Excel open this file successfully? If so, we need the workbook (or a small part of it).
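One thing worth ruling out when globbing a whole folder: `*.xlsx` also matches files that aren't real workbooks. While a workbook is open, Excel creates a small lock file named `~$<name>.xlsx` next to it, and a legacy `.xls` saved with the wrong extension would be rejected too. Since a genuine `.xlsx` is a ZIP container, the standard library's `zipfile.is_zipfile` makes a cheap pre-filter. A sketch, where the helper name `real_xlsx_files` is hypothetical:

```python
import zipfile
from pathlib import Path


def real_xlsx_files(folder):
    """Yield *.xlsx paths in folder that look like genuine workbooks.

    Hypothetical helper: skips Excel's "~$" lock files and anything that
    is not a ZIP archive (every real .xlsx is a ZIP container).
    """
    for path in sorted(Path(folder).glob("*.xlsx")):
        if path.name.startswith("~$"):
            print(f"skipping lock file {path}")
            continue
        if not zipfile.is_zipfile(path):
            print(f"skipping non-xlsx file {path}")
            continue
        yield path
```

Looping over `real_xlsx_files('.')` instead of `Path('.').glob('*.xlsx')` in the code below would then only hand valid files to `load_workbook`.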

Answer to the second question:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# vi:ts=4:et

"""I test to open multiple files."""
import csv
from pathlib import Path

from openpyxl import load_workbook

# find all *.xlsx files into current directory
# and iterate over it
for file in Path('.').glob('*.xlsx'):
    # read the Excel file
    wb = load_workbook(file)
    # small test (optional)
    print(file, wb.active.title)
    # export all sheets to CSV
    for sheetname in wb.sheetnames:
        # Write to utf-8 encoded file with BOM signature (utf-8-sig);
        # newline='' stops the csv module writing blank lines on Windows
        with open(f'{file.stem}-{sheetname}.csv', 'w', newline='',
                  encoding="utf-8-sig") as csvfile:
            # Write to CSV
            spamwriter = csv.writer(csvfile)
            # Iterate over rows in sheet
            for row in wb[sheetname].rows:
                # Write a row
                spamwriter.writerow([cell.value for cell in row])

You can also explicitly specify the csv dialect as an argument to csv.writer.
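For example, `csv.writer` takes a `dialect` argument (the built-in `'excel'` dialect is the default), and a custom dialect can be registered once with `csv.register_dialect`. A small sketch, assuming you want semicolon-separated output; the dialect name `'excel-semicolon'` and the helper `rows_to_csv_text` are made up for illustration:

```python
import csv
import io


class ExcelSemicolon(csv.excel):
    """Hypothetical dialect: Excel's defaults, but ';' as the delimiter."""
    delimiter = ";"


# Register once; afterwards the name can be passed as dialect=...
csv.register_dialect("excel-semicolon", ExcelSemicolon)


def rows_to_csv_text(rows, dialect="excel"):
    """Serialize rows with the given csv dialect and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, dialect=dialect)
    writer.writerows(rows)
    return buf.getvalue()
```

In the loop above, that would just mean `spamwriter = csv.writer(csvfile, dialect='excel-semicolon')`. Fields containing the delimiter are still quoted automatically, since the dialect inherits Excel's `QUOTE_MINIMAL` quoting.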


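Statement: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.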

 
© 2020-2024 STACKOOM.COM