How to convert multiple Excel files to UTF-8 encoded CSV using Python
I have 30+ xlsx files in the same directory. Using Python, I would like to convert all of them to CSV with UTF-8 encoding, regardless of whatever encoding is present in each file.

I am using Python's magic library to get the file names (code below). For the conversion, I tried the code mentioned by SO user Julian here (I used the code posted here), but it throws an error: "InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm". The code that throws the error is below.

The second issue: based on my limited Python knowledge, I believe this code works for only one Excel file. How should I make it work for multiple files?

Thanks in advance for your help!
# import a library to detect encodings
import magic
import glob

print("File".ljust(45), "Encoding")
for filename in glob.glob('path*.xlsx'):
    with open(filename, 'rb') as rawdata:
        result = magic.from_buffer(rawdata.read(2048))
    print(filename.ljust(45), result)
Code throwing the error (from the SO user's GitHub link mentioned here):
import csv
import sys

from openpyxl import load_workbook


def get_all_sheets(excel_file):
    """Return the names of all worksheets in the workbook."""
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    # sheetnames replaces the deprecated get_sheet_names()
    return list(workbook.sheetnames)


def csv_from_excel(excel_file, sheets):
    """Write each requested worksheet to <sheet name>.csv as UTF-8."""
    workbook = load_workbook(excel_file, data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")
        try:
            # workbook[name] replaces the deprecated get_sheet_by_name()
            worksheet = workbook[worksheet_name]
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)
        # Python 3: open in text mode with an explicit encoding, not 'wb'
        with open(worksheet_name + '.csv', 'w', newline='',
                  encoding='utf-8') as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for row in worksheet.iter_rows():
                wr.writerow([cell.value for cell in row])
        print(" ... done")


if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
    sys.exit(1)
else:
    if len(sys.argv) == 3:
        sheets = sys.argv[2].split(',')
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert sheets is not None and len(sheets) > 0
    csv_from_excel(sys.argv[1], sheets)
Have you tried using the Pandas library? You can store all the file names in a list using os. You can then loop through the list, open each Excel file with read_excel, and write it to a csv. It will look something like this:
"""Code to read Excel workbooks and output each sheet as a csv
with utf-8 encoding"""
# Declare a file path where you will store all your Excel workbooks. You
# can update the file path for the ExcelPath variable
# Declare a file path where you will store all your csv output. You can
# update the file path for the CsvPath variable
import pandas as pd
import os

ExcelPath = "C:/ExcelPath"  # Store path for your Excel workbooks
CsvPath = "C:/CsvPath"  # Store path for your csv outputs

# Only pick up .xlsx workbooks; other files in the folder would fail to open
fileList = [f for f in os.listdir(ExcelPath) if f.endswith('.xlsx')]

for file in fileList:
    xls = pd.ExcelFile(os.path.join(ExcelPath, file))
    sheets = xls.sheet_names  # Get the names of each sheet and loop to create
    # individual csv files
    for sheet in sheets:
        # The csv filename will be excelWorkbook + '.' + SheetName
        fileNameCSV = os.path.splitext(file)[0] + '.' + str(sheet)
        df = pd.read_excel(xls, sheet_name=sheet)
        # Write into CsvPath directly instead of changing the working directory
        df.to_csv(os.path.join(CsvPath, "{}.csv".format(fileNameCSV)), encoding="utf-8")
Not the best, but it should meet your needs.
First, the first error is obvious: InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first.
Does Excel successfully open this file? If yes, we need the workbook (or a small part of it).
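Before opening Excel by hand for 30+ files, a quick pre-check with the standard library may help narrow down which file triggers the error. This is only a sketch under assumptions, not part of the answer: `looks_like_xlsx` and `SUPPORTED` are hypothetical names, and the check only confirms that the extension is one of those listed in the error message and that the file is a zip archive (which .xlsx workbooks are). It does not prove the workbook itself is intact.

```python
import zipfile
from pathlib import Path

# Extensions listed in the error message quoted above
SUPPORTED = {'.xlsx', '.xlsm', '.xltx', '.xltm'}


def looks_like_xlsx(path):
    """Return True if the path has a supported extension and is a zip archive."""
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED:
        return False
    # .xlsx workbooks are zip containers; a renamed .xls or .csv is not
    return path.is_file() and zipfile.is_zipfile(path)


if __name__ == '__main__':
    # Report every *.xlsx candidate in the current directory
    for candidate in sorted(Path('.').glob('*.xlsx')):
        status = 'ok' if looks_like_xlsx(candidate) else 'not a valid xlsx'
        print(candidate, status)
```

A file that merely carries the .xlsx name (for example an older .xls or a CSV that was renamed) would be reported as invalid here, which is a hint to re-save it from Excel before converting.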
The answer to the second question:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# vi:ts=4:et
"""I test to open multiple files."""
import csv
from pathlib import Path

from openpyxl import load_workbook

# find all *.xlsx files in the current directory
# and iterate over them
for file in Path('.').glob('*.xlsx'):
    # read the Excel file
    wb = load_workbook(file)
    # small test (optional)
    print(file, wb.active.title)
    # export all sheets to CSV
    for sheetname in wb.sheetnames:
        # Write to utf-8 encoded file with BOM signature
        # (newline='' lets the csv module control line endings)
        with open(f'{file.stem}-{sheetname}.csv', 'w',
                  encoding="utf-8-sig", newline='') as csvfile:
            # Write to CSV
            spamwriter = csv.writer(csvfile)
            # Iterate over rows in sheet
            for row in wb[sheetname].rows:
                # Write a row
                spamwriter.writerow([cell.value for cell in row])
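The script above writes with the "utf-8-sig" encoding, which differs from plain "utf-8" only in the byte order mark placed at the start of the file. The sketch below illustrates that difference with the standard library alone; `write_rows` and `starts_with_bom` are hypothetical helper names, and the sample rows are made up. As far as I know, Excel relies on this mark to recognize a CSV as UTF-8, so pick "utf-8" instead if the consumer of the files does not expect a BOM.

```python
import codecs
import csv
import tempfile
from pathlib import Path


def write_rows(path, rows, encoding):
    """Write rows to a CSV file using the given text encoding."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding=encoding) as csvfile:
        csv.writer(csvfile).writerows(rows)


def starts_with_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark."""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF8)


if __name__ == '__main__':
    rows = [['name', 'city'], ['Zoë', '北京']]
    with tempfile.TemporaryDirectory() as tmp:
        with_bom = Path(tmp) / 'with_bom.csv'
        without_bom = Path(tmp) / 'without_bom.csv'
        write_rows(with_bom, rows, 'utf-8-sig')
        write_rows(without_bom, rows, 'utf-8')
        print(starts_with_bom(with_bom), starts_with_bom(without_bom))
```

When reading such a file back in Python, opening it with encoding="utf-8-sig" strips the mark again.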
You can also explicitly specify the CSV dialect as a csv.writer parameter.
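As a small illustration of that last point, the sketch below writes the same made-up rows twice: once with the default "excel" dialect and once with the delimiter and quoting overridden. It writes to in-memory buffers so it can be run without any workbook; the sample data and variable names are assumptions for the example only.

```python
import csv
import io

rows = [['id', 'comment'], [1, 'semi;colon'], [2, 'plain']]

# Default 'excel' dialect: comma-delimited, quotes only where needed
default_out = io.StringIO()
csv.writer(default_out, dialect='excel').writerows(rows)

# Same dialect with individual formatting parameters overridden
custom_out = io.StringIO()
csv.writer(custom_out, dialect='excel', delimiter=';',
           quoting=csv.QUOTE_ALL).writerows(rows)

print(default_out.getvalue())
print(custom_out.getvalue())
```

A semicolon delimiter can be convenient when the CSV files are opened in a locale where Excel expects semicolons instead of commas.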