Combining multiple files with different columns using pandas

Question

I have a large number of Excel files with different columns.

For example:

File 1:

Name | sale | Tips
-------------
sam  |  9   | 7
cham |  2   | 2

File 2:

Name | sale | Items
-------------------
mini |  6    | Tshirt
Lary |  3    | Hat

Desired output:

Name |  sale | Items
--------------------
sam  |  9    | NaN
cham |  2    | NaN
mini |  6    | Tshirt
Lary |  3    | Hat

I have 500 files to combine into one dataset.

This code works to an extent, but only when all the files have the same columns.

import pandas as pd
import glob,os
import numpy as np


inputFile = 'C:/Users/Desktop/test'

all_workbooks =glob.glob(os.path.join(inputFile,'*.xlsx'))

column_list = []
for files in all_workbooks:
    
    data= pd.read_excel(files,header =0,sheet_name='sheet1')
    column_list.append(data)
    stack_np = np.vstack(column_list)
    newData = pd.DataFrame(stack_np,columns=['Name','Sale'])

print(newData)

This code works if I have the same columns in all the files.

Can anyone help me with a solution for files whose columns differ or appear in a different order?

Answer 1

You need to collect the dataframes in a list and concatenate them once, after the loop:

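all_dfs = []
wanted_columns = ['Name', 'sale', 'Items']
for file in all_workbooks:
    data = pd.read_excel(file, header=0, sheet_name='sheet1')
    # reindex keeps only the wanted columns and adds any that this file
    # lacks as NaN; data[wanted_columns] would raise a KeyError instead.
    # Skip this line to keep every column.
    data = data.reindex(columns=wanted_columns)
    all_dfs.append(data)

# Concatenate once, after the loop; columns are aligned by name
master_df = pd.concat(all_dfs, ignore_index=True)
del all_dfs, data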
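
To see why concatenating works even when the columns differ, here is a minimal sketch that uses two small in-memory frames in place of the Excel files (the names file1, file2 and combined are made up for illustration; the values are the ones from the question). pd.concat aligns frames by column name and fills the gaps with NaN, so one option is to concatenate everything first and pick the wanted columns afterwards:

```python
import pandas as pd

# Hypothetical stand-ins for two workbooks; in practice each frame
# would come from pd.read_excel as in the loop above.
file1 = pd.DataFrame({"Name": ["sam", "cham"], "sale": [9, 2], "Tips": [7, 2]})
file2 = pd.DataFrame({"Name": ["mini", "Lary"], "sale": [6, 3],
                      "Items": ["Tshirt", "Hat"]})

wanted_columns = ["Name", "sale", "Items"]

# concat takes the union of all columns and aligns rows by column name;
# a column missing from one frame is filled with NaN for that frame's rows.
combined = pd.concat([file1, file2], ignore_index=True, sort=False)

# Keep only the columns of interest, in a fixed order.
combined = combined.reindex(columns=wanted_columns)

print(combined)
```

Whether to select columns per file (less memory with 500 files) or once at the end (simpler) is a judgment call; both should give the same table here.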

Solution 1
1 2020-08-31 04:25:39