
Multiple files with different columns using pandas

I have a large number of Excel files with different columns.

For example:

File 1:

Name | sale | Tips
-------------
sam  |  9   | 7
cham |  2   | 2

File 2:

Name | sale | Items
-------------------
mini |  6    | Tshirt
Lary |  3    | Hat

Output:

Name |  sale | Items
--------------------
sam  |  9    | NaN
cham |  2    | NaN
mini |  6    | Tshirt
Lary |  3    | Hat

I have 500 files to combine into one dataset.

This code works to an extent, but only when all the files have the same columns.

import pandas as pd
import glob,os
import numpy as np


inputFile = 'C:/Users/Desktop/test'

all_workbooks =glob.glob(os.path.join(inputFile,'*.xlsx'))

column_list = []
for files in all_workbooks:
    
    data= pd.read_excel(files,header =0,sheet_name='sheet1')
    column_list.append(data)
    stack_np = np.vstack(column_list)
    newData = pd.DataFrame(stack_np,columns=['Name','Sale'])

print(newData)

This code works if I have the same columns in all the files.

Can anyone help me with a solution for when the columns differ or are in a different order?

You need to collect the dataframes in a list and concatenate them once, after the loop. pd.concat aligns columns by name, so a column missing from a file is filled with NaN.

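all_dfs = []
wanted_columns = ['Name', 'sale', 'Items']
for file in all_workbooks:
    data = pd.read_excel(file, header=0, sheet_name='sheet1')
    # reindex keeps only the wanted columns and adds any missing one as NaN;
    # data[wanted_columns] would raise a KeyError when a column is absent.
    # Skip this line to keep all columns.
    data = data.reindex(columns=wanted_columns)
    all_dfs.append(data)

# ignore_index=True renumbers the rows instead of repeating each file's index
master_df = pd.concat(all_dfs, ignore_index=True)
del all_dfs, data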
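
To see the column alignment without touching any Excel files, here is a minimal sketch that uses two small in-memory dataframes as stand-ins for File 1 and File 2 from the question. The `combine` helper and the variable names `file1`, `file2` and `all_columns_df` are made up for illustration; only `reindex` and `pd.concat` come from pandas.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for File 1 and File 2 from the question.
file1 = pd.DataFrame({"Name": ["sam", "cham"], "sale": [9, 2], "Tips": [7, 2]})
file2 = pd.DataFrame({"Name": ["mini", "Lary"], "sale": [6, 3],
                      "Items": ["Tshirt", "Hat"]})

wanted_columns = ["Name", "sale", "Items"]


def combine(frames, columns=None):
    """Stack frames by column name; optionally restrict to `columns`."""
    if columns is not None:
        # reindex drops unwanted columns and adds missing ones filled with NaN,
        # so a file without "Items" does not raise a KeyError.
        frames = [df.reindex(columns=columns) for df in frames]
    # Columns are matched by name, not position; gaps become NaN.
    return pd.concat(frames, ignore_index=True, sort=False)


master_df = combine([file1, file2], wanted_columns)   # Name, sale, Items
all_columns_df = combine([file1, file2])              # union of all columns
print(master_df)
```

With `wanted_columns` the result has the shape asked for in the question (the `Items` cells for sam and cham are NaN). Without it, the result keeps every column that appears in any file, here `Tips` as well.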
