Multiple files with different columns using pandas

Question

I have a large number of Excel files with different columns.

For Example:

File 1:

Name | sale | Tips
-------------
sam  |  9   | 7
cham |  2   | 2

File 2:

Name | sale | Items
-------------------
mini |  6    | Tshirt
Lary |  3    | Hat

Output:

Name | sale | Items
-------------------
sam  |  9   | NaN
cham |  2   | NaN
mini |  6   | Tshirt
Lary |  3   | Hat

I have 500 files to combine into one data set.

The code below works to an extent, but only when all the columns are the same.

import pandas as pd
import glob,os
import numpy as np


inputFile = 'C:/Users/Desktop/test'

all_workbooks =glob.glob(os.path.join(inputFile,'*.xlsx'))

column_list = []
for files in all_workbooks:
    
    data= pd.read_excel(files,header =0,sheet_name='sheet1')
    column_list.append(data)
    stack_np = np.vstack(column_list)
    newData = pd.DataFrame(stack_np,columns=['Name','Sale'])

print(newData)

This code works if I have the same columns in all the files.

Can anyone help me with a solution, if I have unordered columns?

Answer 1

You need to collect the dataframes in a list and concatenate them once, after the loop:

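all_dfs = []
wanted_columns = ['Name', 'sale', 'Items']
for file in all_workbooks:
    data = pd.read_excel(file, header=0, sheet_name='sheet1')
    # reindex keeps only the wanted columns and fills any that are missing
    # from this file with NaN (data[wanted_columns] would raise a KeyError);
    # skip this line to keep all columns
    data = data.reindex(columns=wanted_columns)
    all_dfs.append(data)

# concatenate once, after the loop; ignore_index gives a fresh 0..n-1 index
master_df = pd.concat(all_dfs, ignore_index=True)
del all_dfs, data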
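
To see how pd.concat behaves when the columns differ, here is a minimal sketch that runs without any Excel files. The two small frames (file1 and file2 are made-up names) stand in for the workbooks from the question; everything else is plain pandas. It assumes the default outer join of pd.concat, which takes the union of the columns and fills the gaps with NaN.

```python
import pandas as pd

# In-memory stand-ins for the two workbooks shown in the question.
file1 = pd.DataFrame({"Name": ["sam", "cham"], "sale": [9, 2], "Tips": [7, 2]})
file2 = pd.DataFrame({"Name": ["mini", "Lary"], "sale": [6, 3], "Items": ["Tshirt", "Hat"]})

# Outer join (the default): the result has the union of all columns,
# and cells a file did not provide become NaN.
combined = pd.concat([file1, file2], ignore_index=True)

# Keep only the columns of interest, in a fixed order.
wanted_columns = ["Name", "sale", "Items"]
master_df = combined.reindex(columns=wanted_columns)

print(master_df)
```

Rows from file1 end up with NaN in Items, which matches the output asked for in the question. With 500 files it may be cheaper to drop unwanted columns per file before concatenating, as in the loop above.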
