I have a large number of Excel files with different columns.
For example:
File 1:
Name | sale | Tips
-------------
sam | 9 | 7
cham | 2 | 2
File 2:
Name | sale | Items
-------------------
mini | 6 | Tshirt
Lary | 3 | Hat
Output:
Name | sale | Items
--------------------
sam | 9 | NaN
cham | 2 | NaN
mini | 6 | Tshirt
Lary | 3 | Hat
I have 500 files to combine into one data set.
The code below works to an extent, but only when all the files have the same columns.
import pandas as pd
import glob, os
import numpy as np

inputFile = 'C:/Users/Desktop/test'
all_workbooks = glob.glob(os.path.join(inputFile, '*.xlsx'))
column_list = []
for files in all_workbooks:
    data = pd.read_excel(files, header=0, sheet_name='sheet1')
    column_list.append(data)
stack_np = np.vstack(column_list)
newData = pd.DataFrame(stack_np, columns=['Name', 'Sale'])
print(newData)
This code works if all the files have the same columns.
Can anyone help me with a solution for when the columns are unordered or differ between files?
You need to collect the dataframes in a list and concatenate them after the loop. pd.concat aligns frames by column name rather than position, so unordered columns are handled automatically:
all_dfs = []
wanted_columns = ['Name', 'sale', 'Items']
for files in all_workbooks:
    data = pd.read_excel(files, header=0, sheet_name='sheet1')
    # reindex keeps only the wanted columns and fills any that are
    # missing from this file with NaN; skip this line to keep all columns
    data = data.reindex(columns=wanted_columns)
    all_dfs.append(data)
master_df = pd.concat(all_dfs, ignore_index=True)
del all_dfs, data
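To see why this works without touching any Excel files, here is a minimal sketch using two in-memory frames that mirror your File 1 and File 2: pd.concat matches columns by name and fills gaps with NaN, exactly the output you asked for.

```python
import pandas as pd

# Frames with partially overlapping columns, mirroring File 1 and File 2
df1 = pd.DataFrame({'Name': ['sam', 'cham'], 'sale': [9, 2], 'Tips': [7, 2]})
df2 = pd.DataFrame({'Name': ['mini', 'Lary'], 'sale': [6, 3],
                    'Items': ['Tshirt', 'Hat']})

# concat aligns by column name; columns absent from a frame become NaN
master = pd.concat([df1, df2], ignore_index=True)

# keep only the columns from the desired output, in order
master = master[['Name', 'sale', 'Items']]
print(master)
```

The rows from df1 get NaN in the Items column, and the Tips column is dropped by the final selection, matching the Output table in the question.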