Concat excel files and worksheets into one using python

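I have a number of Excel files in a directory, all with the same header row. Some of these Excel files have multiple worksheets, which also share that header. I am trying to loop through the Excel files in the directory and, for each one, check whether it has multiple worksheets, then concatenate those worksheets together with the rest of the Excel files.

Here is what I tried: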

import pandas as pd
import os
import ntpath
import glob

dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)

for excel_names in glob.glob('*.xlsx'):
    # read them in
    i=0
    df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
    cdf = pd.concat(df.values())
    cdf.to_excel("c.xlsx", header=False, index=False)
    excels = [pd.ExcelFile(name) for name in excel_names]

    # turn them into dataframes
    frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]

    # delete the first row for all frames except the first
    # i.e. remove the header row -- assumes it's the first
    frames[1:] = [df[1:] for df in frames[1:]]

    # concatenate them..
    combined = pd.concat(frames)

    # write it out
    combined.to_excel("c.xlsx", header=False, index=False)
    i+=1

But then I get the following error. Any suggestions?

  File "concat excel.py", line 12, in <module>
    df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "/usr/local/lib/python2.7/site-packages/xlrd/__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
IOError: [Errno 2] No such file or directory: 'G'

Your for statement sets excel_names to each file name in turn (so excel_name would be a better variable name):

for excel_names in glob.glob('*.xlsx'):

But inside the loop, your code does this:

df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)

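You evidently expected excel_names to be a list from which you take one element. But it is not a list; it is a string. Indexing it with excel_names[i] therefore gives you a single character of the file name, here the first one since i is 0, which is why the error reports No such file or directory: 'G'.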
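
The fix described above can be sketched roughly as follows. This is only one possible arrangement, not necessarily what you had in mind: the function name combine_workbooks and its parameters are made up for illustration, and it assumes Python 3 with a pandas version that accepts sheet_name=None plus an installed .xlsx engine such as openpyxl. Each file name from glob is passed to read_excel as a whole string, and ignore_index is given to pd.concat, since as far as I know it is not a read_excel parameter.

```python
import glob
import os

import pandas as pd


def combine_workbooks(directory, output_name="c.xlsx"):
    """Concatenate every sheet of every .xlsx file in `directory` into one file.

    Hypothetical helper for illustration; assumes all sheets share one header row.
    """
    output_path = os.path.join(directory, output_name)
    frames = []
    for excel_name in sorted(glob.glob(os.path.join(directory, "*.xlsx"))):
        # Skip the output file so the result of an earlier run is not read back in.
        if os.path.abspath(excel_name) == os.path.abspath(output_path):
            continue
        # Pass the whole file name, not excel_name[i].
        # sheet_name=None returns a dict mapping each sheet name to a DataFrame.
        sheets = pd.read_excel(excel_name, sheet_name=None)
        frames.extend(sheets.values())
    if not frames:
        raise ValueError("no .xlsx input files found in %r" % directory)
    # The header row becomes the column names, so it is not repeated in the data.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_excel(output_path, index=False)
    return combined
```

Calling combine_workbooks(dir_path) with the dir_path from your script would then write a single c.xlsx containing one header row followed by the rows of every sheet. Because read_excel parses the header by default, there is no need to strip the first row of each frame by hand.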
