How to convert multiple excel files to CSV utf-8 encoding using python
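I have more than 30 xlsx files in the same directory, and using python I would like to convert all of them to csv with utf-8 encoding, regardless of what encoding is present in the files. I am using python's magic library to get the file names (code below). For the conversion, I tried the code mentioned by SO user Julian here (I used the code posted here), but the code throws an error saying "InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm". Below is the code that throws the error. The second issue is that, based on my limited python knowledge, I believe the code works for a single excel file. How should I make it work for multiple files?
Thanks in advance for your help!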
# import a library to detect encodings
import magic
import glob

print("File".ljust(45), "Encoding")
for filename in glob.glob('path*.xlsx'):
    with open(filename, 'rb') as rawdata:
        result = magic.from_buffer(rawdata.read(2048))
    print(filename.ljust(45), result)
The code from the SO user's GitHub link mentioned here, which throws the error:
from openpyxl import load_workbook
import csv
from os import sys

def get_all_sheets(excel_file):
    sheets = []
    workbook = load_workbook(excel_file,read_only=True,data_only=True)
    all_worksheets = workbook.get_sheet_names()
    for worksheet_name in all_worksheets:
        sheets.append(worksheet_name)
    return sheets

def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file,data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")
        try:
            worksheet = workbook.get_sheet_by_name(worksheet_name)
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)
        your_csv_file = open(''.join([worksheet_name,'.csv']), 'wb')
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for row in worksheet.iter_rows():
            lrow = []
            for cell in row:
                lrow.append(cell.value)
            wr.writerow(lrow)
        print(" ... done")
        your_csv_file.close()

if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlxs file> [comma separated list of sheets to export]")
    sys.exit(1)
else:
    sheets = []
    if len(sys.argv) == 3:
        sheets = list(sys.argv[2].split(','))
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert(sheets != None and len(sheets) > 0)
    csv_from_excel(sys.argv[1], sheets)
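Have you tried using the Pandas library? You can use os to store all the files in a list. Then you can iterate over the list, open each Excel file with read_excel, and write it to csv. So it would look something like this: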
"""Code to read excel workbooks and output each sheet as a csv with utf-8 encoding"""
# Declare a file path where you will store all your excel workbooks. You
# can update the file path for the ExcelPath variable
# Declare a file path where you will store all your csv output. You can
# update the file path for the CsvPath variable
import os

import pandas as pd

ExcelPath = "C:/ExcelPath"  # Store path for your excel workbooks
CsvPath = "C:/CsvPath"  # Store path for your csv outputs

# Only pick up .xlsx workbooks; any other file in the folder would fail to parse
fileList = [f for f in os.listdir(ExcelPath) if f.lower().endswith(".xlsx")]
for file in fileList:
    xls = pd.ExcelFile(os.path.join(ExcelPath, file))
    # Get the name of each sheet and loop to create individual csv files
    for sheet in xls.sheet_names:
        # The csv filename will be excelWorkbook + '.' + SheetName
        fileNameCSV = os.path.splitext(file)[0] + '.' + str(sheet)
        # Reuse the workbook that is already open instead of re-reading the file
        df = pd.read_excel(xls, sheet_name=sheet)
        # Write straight into CsvPath rather than changing the working directory
        df.to_csv(os.path.join(CsvPath, "{}.csv".format(fileNameCSV)), encoding="utf-8")
It's not the best, but it should meet your needs.
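Sheet names can contain characters that are awkward or invalid in file names (spaces, slashes), and workbooks may end in .xlsm as well as .xlsx. A hedged sketch of building the output path with pathlib follows; the helper name `csv_path_for` and the replacement rule are assumptions for illustration, not part of the answer above:

```python
import re
from pathlib import Path


def csv_path_for(workbook, sheet, out_dir):
    """Build '<out_dir>/<workbook stem>.<sheet>.csv' with a file-name-safe sheet part."""
    # Collapse every run of characters other than letters, digits,
    # underscore, dot or dash into a single underscore
    safe_sheet = re.sub(r"[^\w.-]+", "_", str(sheet)).strip("_") or "sheet"
    # Path.stem drops the extension whatever its length (.xlsx, .xlsm, ...)
    return Path(out_dir) / f"{Path(workbook).stem}.{safe_sheet}.csv"


print(csv_path_for("C:/ExcelPath/report.xlsx", "Q1 / Q2", "C:/CsvPath"))
```

The returned path can be passed directly to `df.to_csv(...)`, which accepts path objects as well as strings.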
First of all, the first error is obvious: InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first.
Does Excel open this file successfully? If it does, we need the workbook (or a small part of it).
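A quick way to check the files without Excel: a genuine .xlsx file is a ZIP archive that contains a `[Content_Types].xml` part, so a file that was merely renamed (for example a legacy .xls or a text export) fails the test below. This is a minimal sketch using only the standard library; the helper name `looks_like_xlsx` is hypothetical, and the check is a heuristic rather than a full validation:

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive with the parts an .xlsx normally has."""
    if not zipfile.is_zipfile(path):
        return False  # e.g. a legacy .xls or a text file renamed to .xlsx
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    # Workbook parts of an .xlsx live under the "xl/" folder of the archive
    return "[Content_Types].xml" in names and any(n.startswith("xl/") for n in names)


if __name__ == "__main__":
    # Report every *.xlsx file in the current directory
    for file in sorted(Path(".").glob("*.xlsx")):
        print(file, "OK" if looks_like_xlsx(file) else "not a real .xlsx")
```

Because the cell text inside an .xlsx is stored as XML and openpyxl returns it as ordinary Python strings, there should be no need to detect a per-file encoding before writing the CSV as utf-8.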
Answer to the second question:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# vi:ts=4:et
"""I test to open multiple files."""
import csv
from pathlib import Path

from openpyxl import load_workbook

# find all *.xlsx files in the current directory
# and iterate over them
for file in Path('.').glob('*.xlsx'):
    # read the Excel file
    wb = load_workbook(file)
    # small test (optional)
    print(file, wb.active.title)
    # export all sheets to CSV
    for sheetname in wb.sheetnames:
        # Write to utf-8 encoded file with BOM signature
        # (newline='' lets the csv module control the line endings)
        with open(f'{file.stem}-{sheetname}.csv', 'w',
                  encoding="utf-8-sig", newline='') as csvfile:
            # Write to CSV
            spamwriter = csv.writer(csvfile)
            # Iterate over rows in sheet
            for row in wb[sheetname].rows:
                # Write a row
                spamwriter.writerow([cell.value for cell in row])
You can also specify the csv dialect explicitly as an argument to csv.writer.
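As a small illustration of the dialect argument, the sketch below writes the same rows once with the built-in "excel-tab" dialect and once with a custom one. The dialect name "semicolon" and the sample rows are made up for the example, and `io.StringIO` stands in for the opened csv file:

```python
import csv
import io

rows = [["name", "note"], ["Ana", "semi;colon"]]

# Built-in dialect: tab-separated values
tab_buffer = io.StringIO()
csv.writer(tab_buffer, dialect="excel-tab").writerows(rows)


# Custom dialect: semicolon delimiter, every field quoted
class SemicolonDialect(csv.excel):
    delimiter = ";"
    quoting = csv.QUOTE_ALL


csv.register_dialect("semicolon", SemicolonDialect)
semi_buffer = io.StringIO()
csv.writer(semi_buffer, dialect="semicolon").writerows(rows)

print(tab_buffer.getvalue())
print(semi_buffer.getvalue())
```

Individual formatting options such as `delimiter=";"` can also be passed to `csv.writer` directly as keyword arguments when a full dialect is more than you need.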