简体   繁体   中英

How do I concatenate files that have multiple sheets with same column headers but randomly ordered in Python/Pandas?

I have 3 xls files that have 3 sheets per file. All sheets have same column headers but order as you see below is different

1.xls
Name Address Date City State Zip

2.xls
Address Date City Zip Name State

3.xls
City Zip Name Address Date State

I want my final xls file to concatenate all 3 files and sheets

Output.xls
Name Address Date City State Zip RowNumber SheetName

The rownumber should be the specific row number from each file and sheet the data comes from before concatenation.Sheetname should be the sheet it comes from within the xls file.

My attempt-

import os
import pandas as pd
#set src directory
os.chdir('C:/Users/hhh/Desktop/python/Concat')
def read_sheets(filename):
    result = []
    sheets = pd.read_excel(filename, sheet_name=None)
    for name, sheet in sheets.items():
        sheet['Sheetname'] = name
        sheet['Row'] = sheet.index
        result.append(sheet)
    return pd.concat(result, ignore_index=True)
files = [file for file in os.listdir(folder_path) if file.endswith(".xls")] 
dfoo = read_sheets(files)

But nothing happens and i just receive an assertion error saying assert content_or_path is not None. Is this because column orders don't match? is there a workaround? Number of columns are same in all files and sheets. Within each file sheets have same order. But if you compare 1.xls sheets to 2.xls the column order is different as you can see in my reprex above

I believe your question is asking to take 9 different sheets (3 each in 3 different.xls files) and combine them into a single sheet in a new spreadsheet Output.xls.

A few comments to start:

  • The different column order for different input files shouldn't be a problem.
  • You may want to consider having the output file be a.xlsx file instead of.xls, since the xlwt package required to work with.xls files raises a warning:
FutureWarning: 
As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas. 
This is the only engine in pandas that supports writing in the xls format. 
Install openpyxl and write to an xlsx file instead. 
You can set the option io.excel.xls.writer to 'xlwt' to silence this warning. 
While this option is deprecated and will also raise a warning, it can be globally set and the warning suppressed.
  writer = pd.ExcelWriter('Output.xls')
  • The sample code in your question sends a list of files to the function read_sheets(), so this function needs to be changed to expect this list rather than a single file.
  • The code needs to loop over input files, then over the sheets in each file.

Here is a modification of your code which does what I think you are asking (with a different argument for os.chdir() to match my test environment):

import os
import pandas as pd
#set src directory
#os.chdir('C:/Users/hhh/Desktop/python/Concat')
os.chdir('./Concat')
def read_sheets(files):
    result = []
    for filename in files:
        sheets = pd.read_excel(filename, sheet_name=None)
        for name, sheet in sheets.items():
            sheet['Sheetname'] = name
            sheet['Row'] = sheet.index
            result.append(sheet)
    return pd.concat(result, ignore_index=True)
folder_path = '.'
files = [file for file in os.listdir(folder_path) if file.endswith(".xls")] 
dfCombined = read_sheets(files)
writer = pd.ExcelWriter('Output.xls')
dfCombined.to_excel(writer, index=None, sheet_name='Combined')
writer.save()
writer.close()

The sample output looks like this:

Name    Address             Date    City        State           Zip     Sheetname   Row
Alice   1 Main St           11      Nome        Alaska          11111   Sheet1      0
Bob     1 Main St           12      Providence  Rhode Island    22222   Sheet1      1
Candace 1 Main St           13      Denver      Colorado        33333   Sheet1      2
Dirk    1 Main St           14      Wilmington  Delaware        44444   Sheet1      3
Edward  1 Marvin Gardens    11      Nome        Alaska          11111   Sheet2      0
Fran    1 Marvin Gardens    12      Providence  Rhode Island    22222   Sheet2      1
George  1 Marvin Gardens    13      Denver      Colorado        33333   Sheet2      2
Hannah  1 Marvin Gardens    14      Wilmington  Delaware        44444   Sheet2      3
Irvin   1 St Marks Place    11      Nome        Alaska          11111   Sheet3      0
Jasmine 1 St Marks Place    12      Providence  Rhode Island    22222   Sheet3      1
Kirk    1 St Marks Place    13      Denver      Colorado        33333   Sheet3      2
Lana    1 St Marks Place    14      Wilmington  Delaware        44444   Sheet3      3
Alice   2 Main St           11      Nome        Alaska          11111   Sheet1      0
Bob     2 Main St           12      Providence  Rhode Island    22222   Sheet1      1
Candace 2 Main St           13      Denver      Colorado        33333   Sheet1      2
Dirk    2 Main St           14      Wilmington  Delaware        44444   Sheet1      3
Edward  2 Marvin Gardens    11      Nome        Alaska          11111   Sheet2      0
Fran    2 Marvin Gardens    12      Providence  Rhode Island    22222   Sheet2      1
George  2 Marvin Gardens    13      Denver      Colorado        33333   Sheet2      2
Hannah  2 Marvin Gardens    14      Wilmington  Delaware        44444   Sheet2      3
Irvin   2 St Marks Place    11      Nome        Alaska          11111   Sheet3      0
Jasmine 2 St Marks Place    12      Providence  Rhode Island    22222   Sheet3      1
Kirk    2 St Marks Place    13      Denver      Colorado        33333   Sheet3      2
Lana    2 St Marks Place    14      Wilmington  Delaware        44444   Sheet3      3
Alice   3 Main St           11      Nome        Alaska          11111   Sheet1      0
Bob     3 Main St           12      Providence  Rhode Island    22222   Sheet1      1
Candace 3 Main St           13      Denver      Colorado        33333   Sheet1      2
Dirk    3 Main St           14      Wilmington  Delaware        44444   Sheet1      3
Edward  3 Marvin Gardens    11      Nome        Alaska          11111   Sheet2      0
Fran    3 Marvin Gardens    12      Providence  Rhode Island    22222   Sheet2      1
George  3 Marvin Gardens    13      Denver      Colorado        33333   Sheet2      2
Hannah  3 Marvin Gardens    14      Wilmington  Delaware        44444   Sheet2      3
Irvin   3 St Marks Place    11      Nome        Alaska          11111   Sheet3      0
Jasmine 3 St Marks Place    12      Providence  Rhode Island    22222   Sheet3      1
Kirk    3 St Marks Place    13      Denver      Colorado        33333   Sheet3      2
Lana    3 St Marks Place    14      Wilmington  Delaware        44444   Sheet3      3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM