Loop through excel sheets and save each sheet into a csv based on a condition
I have an Excel file that has multiple sheets. I would like to iterate through each sheet and check its name against a string to decide whether to read it starting after 0 rows or after 4 rows (some of the sheets' datasets start after the first 4 rows). After each sheet is read, I want to save it as a CSV.

This is my code so far, but I am not sure if I am doing the loop correctly.
import pandas as pd

def converToCsv(excel_file):
    df = pd.read_excel(excel_file, sheet_name = None)
    for sheets in df.items():
        if sheets[df.items()] == 'Shipment':
            newdf = pd.read_excel(excel_file, sheet_name = sheets[df.items(), header = 4]
            newdf.to_csv('path', decimal = ',', index = False)
        else:
            newdf = pd.read_excel(excel_file, sheet_name = sheets[df.items(), header = 0]
            newdf.to_csv('path', decimal = ',', index = False)
There are several things not working with the snippet you posted:

- sheets[df.items()] is not valid. pd.read_excel(..., sheet_name = None) returns a dict like {<sheet_name>: <sheet_content>}, and df.items() yields its (<sheet_name>, <sheet_content>) pairs, so you cannot use it as an index.
- Every sheet is saved to the same 'path', so you overwrite the previously saved CSV on each pass through the loop.

Did you try running this code before posting it?
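To illustrate the first point without needing an Excel file, here is a minimal sketch in which a plain dict stands in for what pd.read_excel(..., sheet_name = None) returns. The sheet names and contents are made up for illustration.

```python
# Stand-in for the dict returned by pd.read_excel(excel_file, sheet_name=None).
# The sheet names and contents here are made up for illustration.
workbook = {"Shipment": "shipment data", "Orders": "orders data"}

# items() yields (sheet_name, sheet_content) tuples, so unpack them directly
# instead of indexing the tuple with df.items().
header_rows = {}
for sheet_name, sheet_content in workbook.items():
    # Only the 'Shipment' sheet has its header on row index 4.
    header_rows[sheet_name] = 4 if sheet_name == "Shipment" else 0

print(header_rows)
```

Unpacking each pair into sheet_name and sheet_content gives a plain string that can be compared with 'Shipment'.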
You could do something along these lines:
import pandas as pd

def converToCsv(excel_file):
    workbook = pd.read_excel(excel_file, sheet_name = None)
    for sheet_name in workbook.keys():
        header = 0
        if sheet_name == 'Shipment':
            header = 4
        newdf = pd.read_excel(excel_file, sheet_name = sheet_name, header=header)
        # TODO: handle the case where sheet name is not a valid file name
        newdf.to_csv(f"{sheet_name}.csv", decimal = ',', index = False)

converToCsv("test.xlsx")
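One possible way to handle the TODO about sheet names that are not valid file names is sketched below. The helper csv_path_for and its out_dir parameter are hypothetical names introduced here, not part of pandas, and the replacement rule is only one reasonable choice among several.

```python
import re
from pathlib import Path


def csv_path_for(sheet_name, out_dir="."):
    """Build a CSV path from a sheet name (hypothetical helper).

    Any run of characters other than word characters, '.' and '-' is
    replaced with a single '_'; leading/trailing '.' and '_' are dropped.
    """
    safe = re.sub(r"[^\w.-]+", "_", sheet_name).strip("._")
    # Fall back to a generic name if nothing usable is left.
    return Path(out_dir) / f"{safe or 'sheet'}.csv"


print(csv_path_for("Q1 / Sales"))
```

In the loop above, newdf.to_csv(csv_path_for(sheet_name), decimal = ',', index = False) could then replace the f-string path. Note that two different sheet names may map to the same file name under this rule, so a real script may want to check for collisions.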