Import multiple Excel files into Python pandas and concatenate them into one dataframe

Question

I would like to read several Excel files from a directory into pandas and concatenate them into one big dataframe, but I have not been able to figure it out. I need some help with the for loop and with building the concatenated dataframe. Here is what I have so far:

import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs: 
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)

Answer 1

As mentioned in the comments, one error you are making is that you are looping over an empty list.
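
To make that point concrete, here is a minimal sketch of the question's loop rewritten so that it iterates over the file names rather than over the empty `dfs` list. The helper name `read_all` and its `reader` parameter are illustrative assumptions, not part of the original code; `reader` defaults to `pd.read_excel` and can be swapped for another pandas reader.

```python
import glob
import os

import pandas as pd


def read_all(pattern, reader=pd.read_excel, **kwargs):
    """Read every file matching `pattern` and stack the rows.

    `read_all` and `reader` are hypothetical names used for illustration.
    """
    # sorted() makes the row order reproducible; glob order is arbitrary
    filenames = sorted(glob.glob(pattern))
    if not filenames:
        raise FileNotFoundError("no files match " + pattern)
    # loop over the file names, not over the (initially empty) result list
    dfs = [reader(name, **kwargs) for name in filenames]
    # concat is a pandas function, not a method of a Python list
    return pd.concat(dfs, ignore_index=True)


# Usage with the question's directory and sheet name:
# path = r'C:\DRO\DCL_rawdata_files\excelfiles'
# big = read_all(os.path.join(path, "*.xlsx"), sheet_name='Sheet1')
```

Collecting the frames in a list and concatenating once also avoids copying the growing result on every iteration.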

Here is how I would do it, using an example of 5 identical Excel files that are appended one after another.

(1) Imports:

import os
import pandas as pd

(2) List files:

path = os.getcwd()
files = os.listdir(path)
files

Output:

['.DS_Store',
 '.ipynb_checkpoints',
 '.localized',
 'Screen Shot 2013-12-28 at 7.15.45 PM.png',
 'test1 2.xls',
 'test1 3.xls',
 'test1 4.xls',
 'test1 5.xls',
 'test1.xls',
 'Untitled0.ipynb',
 'Werewolf Modelling',
 '~$Random Numbers.xlsx']

(3) Pick out 'xls' files:

files_xls = [f for f in files if f[-3:] == 'xls']
files_xls

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
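
The slice test `f[-3:] == 'xls'` works for this listing, but it skips `.xlsx` files and would accept any name that merely ends in those three letters. Below is a hedged sketch of a stricter filter. The function name `is_excel_file` and the sample name `report.XLSX` are made up for illustration; the other names come from the listing above. It also skips names starting with `~$`, which Excel typically creates as temporary lock files while a workbook is open.

```python
import os

EXCEL_EXTENSIONS = {".xls", ".xlsx"}


def is_excel_file(name):
    """Return True for names that look like real Excel workbooks."""
    # skip hidden files and '~$...' temporary lock files
    if name.startswith("~$") or name.startswith("."):
        return False
    # compare the real extension, case-insensitively
    ext = os.path.splitext(name)[1].lower()
    return ext in EXCEL_EXTENSIONS


files = ['.DS_Store', 'test1 2.xls', 'test1.xls', 'Untitled0.ipynb',
         '~$Random Numbers.xlsx', 'report.XLSX']
files_excel = [f for f in files if is_excel_file(f)]
print(files_excel)  # ['test1 2.xls', 'test1.xls', 'report.XLSX']
```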

(4) Initialize empty dataframe:

df = pd.DataFrame()

(5) Loop over list of files to append to empty dataframe:

for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data])

(6) Enjoy your new dataframe. :-)

df

Output:

  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10

Answer 2

This works with Python 2.x.

Be in the directory where the Excel files are.

See http://pbpython.com/excel-file-combine.html

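import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_data = pd.concat([all_data, df], ignore_index=True)

# now save the data frame
# the with-block saves and closes the file (writer.save() is gone in pandas 2.0+)
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, sheet_name='sheet1')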

Answer 3

This can be done in this way:

import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("/path/to/directory/*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_data = pd.concat([all_data, df], ignore_index=True)

all_data.to_csv("new_combined_file.csv")

Answer 4

#shortcut

import pandas as pd 
from glob import glob

dfs=[]
for f in glob("data/*.xlsx"):
    dfs.append(pd.read_excel(f))
df=pd.concat(dfs, ignore_index=True)

Answer 5

You can use a list comprehension inside concat:

import os
import pandas as pd

path = '/path/to/directory/'
filenames = [file for file in os.listdir(path) if file.endswith('.xlsx')]

df = pd.concat([pd.read_excel(path + file) for file in filenames], ignore_index=True)

With ignore_index=True, the index of df will be labeled 0, …, n - 1.
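
As a small illustration of that option, the sketch below concatenates two made-up in-memory frames (standing in for files read from disk) with and without `ignore_index`. Without it, each frame keeps its own row labels, which is why the output in Answer 1 repeats 0 to 9.

```python
import pandas as pd

# two small stand-in frames with made-up data
a = pd.DataFrame({"Result": ["a", "b"], "Sample": [1, 2]})
b = pd.DataFrame({"Result": ["c", "d"], "Sample": [3, 4]})

kept = pd.concat([a, b])                      # original row labels are kept
fresh = pd.concat([a, b], ignore_index=True)  # rows are relabelled 0..n-1

print(kept.index.tolist())   # [0, 1, 0, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3]
```

Duplicate labels are harmless for many tasks, but label-based lookups such as `kept.loc[0]` then return more than one row.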

Answer 6

I have multiple Excel files, and every file has a common id (every Excel sheet has an id column). I tried the following ways, but I am not getting the correct data frame based on the id.

import pandas as pd
import os

path=os.getcwd()
path
files=os.listdir(path)
fil_xlsx=[f for f in files if f[-4:]=='xlsx']

df=pd.DataFrame()

for f in fil_xlsx:
    data=pd.read_excel(f,'Sheet1')
    df=df.append(data)

I am getting an empty data frame this way.

df=pd.DataFrame()
for f in fil_xlsx:
    data=pd.read_excel(f,'Sheet1')
    all1=pd.concat([data,df],ignore_index=True,join="inner")
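
One plausible reason for the empty result, offered as an assumption since the poster's files are not shown: `join="inner"` keeps only the columns common to every input, and the initial `pd.DataFrame()` has no columns, so none survive. In addition, `all1` is overwritten on every iteration and `df` is never updated. The sketch below uses made-up frames with a shared `id` column to show the effect, a row-wise fix, and a merge on `id` in case the goal is to line rows up by id rather than stack them.

```python
from functools import reduce

import pandas as pd

# made-up stand-ins for two sheets that share an 'id' column
s1 = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
s2 = pd.DataFrame({"id": [1, 2], "score": [10, 20]})

# what the second snippet does: inner-join against a frame with no columns
lost = pd.concat([s1, pd.DataFrame()], ignore_index=True, join="inner")
print(lost.shape[1])  # number of columns left after the inner join

# stacking rows: collect the frames first, then concatenate once
stacked = pd.concat([s1, s2], ignore_index=True)

# lining rows up by id instead: merge the frames pairwise on 'id'
merged = reduce(lambda left, right: left.merge(right, on="id"), [s1, s2])
print(merged)
```

Which of the two results is the "correct" one depends on whether each file holds more rows of the same table or more columns about the same ids.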

Answer 7

There is an even neater way to do that.

# import libraries
import glob
import pandas as pd

# get the absolute path of all Excel files 
allExcelFiles = glob.glob("/path/to/Excel/files/*.xlsx")

# read all Excel files at once
df = pd.concat(pd.read_excel(excelFile) for excelFile in allExcelFiles)

Answer 8

import pandas as pd

import os

os.chdir('...')

#read first file for column names

fdf= pd.read_excel("first_file.xlsx", sheet_name="sheet_name")

#create counter to segregate the different file's data

fdf["counter"]=1

nm= list(fdf)

c=2

#read first 1000 files

for i in os.listdir():

  print(c)

  if c<1001:

    if "xlsx" in i:

      df= pd.read_excel(i, sheet_name="sheet_name")

      df["counter"]=c

      if list(df)==nm:

        fdf=fdf.append(df)

        c+=1

      else:

        print("headers name not match")

    else:

      print("not xlsx")


fdf=fdf.reset_index(drop=True)

#relax
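
The same idea (check that the headers match, then label each row with where it came from) can be sketched more compactly. The function `stack_with_source`, the `source` column, and the file names below are hypothetical; the frames are made-up stand-ins for `pd.read_excel` results so the snippet runs without any files.

```python
import pandas as pd


def stack_with_source(frames_by_name):
    """Stack frames that share the first frame's columns; label rows by source.

    `frames_by_name` maps a file name to the DataFrame read from it.
    """
    names = list(frames_by_name)
    expected = list(frames_by_name[names[0]].columns)
    kept = []
    for name in names:
        frame = frames_by_name[name]
        if list(frame.columns) != expected:
            print("header names do not match:", name)
            continue
        # assign() returns a copy with the new column; the input is untouched
        kept.append(frame.assign(source=name))
    return pd.concat(kept, ignore_index=True)


# made-up frames standing in for pd.read_excel(...) results
frames = {
    "jan.xlsx": pd.DataFrame({"id": [1], "amount": [5.0]}),
    "feb.xlsx": pd.DataFrame({"id": [2], "amount": [7.5]}),
    "bad.xlsx": pd.DataFrame({"key": [3]}),
}
combined = stack_with_source(frames)
print(combined)
```

Storing the file name instead of a running counter makes it easier to trace a row back to its workbook later.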

Answer 9

import pandas as pd
import os

files = [file for file in os.listdir('./Salesfolder')]
all_months_data = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Salesfolder/" + file)
    # accumulate into the same variable so earlier files are not lost
    all_months_data = pd.concat([all_months_data, df])
all_months_data.to_csv("all_data.csv", index=False)

You can read all your files from a folder (Salesfolder in my case; substitute your local path). The code above uses read_csv; for .xls files use pd.read_excel instead. Iterating over the files, read each one into a data frame and concatenate it onto an initially empty data frame. I have also exported the combined data for all months into a single CSV file.


Answer 1: score 92, accepted, 2014-01-03 16:33:41
Answer 2: score 6, 2018-02-10 04:25:32
Answer 3: score 1, 2021-04-11 14:55:00
Answer 4: score 1, 2022-03-23 11:17:54
Answer 5: score 1, 2022-07-02 16:36:07
Answer 6: score 0, 2020-06-14 08:05:18
Answer 7: score 0, 2022-07-27 07:55:03
Answer 8: score -1, 2019-06-27 13:21:31
Answer 9: score -1, 2020-05-25 15:35:08
