Concat Excel files and worksheets into one using Python
I have many Excel files in a directory, all of which have the same header row. Some of these Excel files have multiple worksheets, which again have the same headers. I'm trying to loop through the Excel files in the directory and, for each one, check whether it has multiple worksheets, so that those are concatenated along with the rest of the Excel files.
This is what I tried:
import pandas as pd
import os
import ntpath
import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)
for excel_names in glob.glob('*.xlsx'):
    # read them in
    i=0
    df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
    cdf = pd.concat(df.values())
    cdf.to_excel("c.xlsx", header=False, index=False)
    excels = [pd.ExcelFile(name) for name in excel_names]
    # turn them into dataframes
    frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
    # delete the first row for all frames except the first
    # i.e. remove the header row -- assumes it's the first
    frames[1:] = [df[1:] for df in frames[1:]]
    # concatenate them..
    combined = pd.concat(frames)
    # write it out
    combined.to_excel("c.xlsx", header=False, index=False)
    i+=1
But then I get the error below. Any advice?
Traceback (most recent call last):
  File "concat excel.py", line 12, in <module>
    df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "/usr/local/lib/python2.7/site-packages/xlrd/__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
IOError: [Errno 2] No such file or directory: 'G'
Your for statement is setting excel_names to each filename in turn (so a better variable name would be excel_name):
for excel_names in glob.glob('*.xlsx'):
But inside the loop your code does:
df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
where you are clearly expecting excel_names to be a list from which you are extracting one element. But it isn't a list, it's a string. So excel_names[0] is the first character of the first filename, which is why the error reports that there is no file or directory named 'G'.
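One way to restructure the loop along these lines is sketched below. It is not taken from the original post: the function names concat_sheets and combine_workbooks are hypothetical, and the sketch assumes that every sheet starts with the same header row and that the result should be written to c.xlsx in the same directory. It passes each filename to pd.read_excel as a whole string and moves ignore_index to pd.concat, where it is a documented parameter.

```python
import glob
import os

import pandas as pd


def concat_sheets(sheet_dicts):
    """Stack every DataFrame found in an iterable of {sheet name: DataFrame} dicts.

    Hypothetical helper; returns None when there is nothing to stack.
    """
    frames = [frame for sheets in sheet_dicts for frame in sheets.values()]
    if not frames:
        return None
    # Columns are aligned by header name; ignore_index renumbers the rows.
    return pd.concat(frames, ignore_index=True)


def combine_workbooks(directory, output_name="c.xlsx"):
    """Read all sheets of all .xlsx files in `directory` and write one workbook.

    Hypothetical helper; assumes every sheet has the same header row.
    """
    output_path = os.path.abspath(os.path.join(directory, output_name))
    sheet_dicts = []
    for excel_name in sorted(glob.glob(os.path.join(directory, "*.xlsx"))):
        # Skip the output file so an earlier result is not merged back in.
        if os.path.abspath(excel_name) == output_path:
            continue
        # excel_name is one filename (a string), so it is passed whole, not indexed.
        # sheet_name=None returns a dict mapping each sheet name to a DataFrame.
        sheet_dicts.append(pd.read_excel(excel_name, sheet_name=None))
    combined = concat_sheets(sheet_dicts)
    if combined is not None:
        # The shared header is written once, as the first row of the output.
        combined.to_excel(output_path, index=False)
    return combined
```

It could be called as combine_workbooks(os.path.dirname(os.path.realpath(__file__))). Because each sheet is read with its header row as column names, there is no need to strip header rows by hand before concatenating. Reading and writing .xlsx files also requires an Excel engine to be installed alongside pandas.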