Import multiple excel files into python pandas and concatenate them into one dataframe

I would like to read several Excel files from a directory into pandas and concatenate them into one big dataframe, but I have not been able to figure it out. I need some help with the for loop and with building the concatenated dataframe. Here is what I have so far:

import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs: 
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)

As mentioned in the comments, one error you are making is that you are looping over an empty list (dfs) instead of over the list of file names.
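
A minimal corrected sketch of the loop from the question, assuming every workbook has the same columns: iterate over the file names rather than the empty list, collect the frames, and concatenate once at the end. The function name read_all_excel and its reader parameter are illustrative and not part of the original post.

```python
import glob
import os

import pandas as pd


def read_all_excel(folder, pattern="*.xlsx", sheet_name=0, reader=pd.read_excel):
    """Read every matching workbook in `folder` and stack the rows."""
    # Loop over the file names, not over the (still empty) list of frames.
    filenames = sorted(glob.glob(os.path.join(folder, pattern)))
    # pd.read_excel needs an Excel engine (for example openpyxl) for .xlsx files.
    dfs = [reader(name, sheet_name=sheet_name) for name in filenames]
    if not dfs:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    return pd.concat(dfs, ignore_index=True)
```

With the setup from the question, the call would look like read_all_excel(r'C:\DRO\DCL_rawdata_files\excelfiles', sheet_name='Sheet1').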

Here is how I would do it, using an example of 5 identical Excel files that are appended one after another.

(1) Imports:

import os
import pandas as pd

(2) List files:

path = os.getcwd()
files = os.listdir(path)
files

Output:

['.DS_Store',
 '.ipynb_checkpoints',
 '.localized',
 'Screen Shot 2013-12-28 at 7.15.45 PM.png',
 'test1 2.xls',
 'test1 3.xls',
 'test1 4.xls',
 'test1 5.xls',
 'test1.xls',
 'Untitled0.ipynb',
 'Werewolf Modelling',
 '~$Random Numbers.xlsx']

(3) Pick out 'xls' files:

files_xls = [f for f in files if f[-3:] == 'xls']
files_xls

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
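
The slice test above works for this listing, but a directory can also contain Office lock files such as '~$Random Numbers.xlsx', which Excel creates while a workbook is open and which cannot be read as workbooks. A slightly more defensive filter might look like the sketch below; the helper name excel_files is made up for illustration.

```python
def excel_files(names, extensions=(".xls", ".xlsx")):
    """Keep Excel workbooks, skipping Office lock files such as '~$Book.xlsx'."""
    return [
        name
        for name in names
        # str.endswith accepts a tuple, so both extensions are checked at once.
        if name.lower().endswith(extensions) and not name.startswith("~$")
    ]
```

Passing the directory listing from step (2) to excel_files would keep the five test1 workbooks and drop the lock file.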

(4) Initialize empty dataframe:

df = pd.DataFrame()

(5) Loop over list of files to append to empty dataframe:

for f in files_xls:
    data = pd.read_excel(f, sheet_name='Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, data])

(6) Enjoy your new dataframe. :-)

df

Output:

  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10

This works with Python 2.x.

Run it from the directory where the Excel files are.

See http://pbpython.com/excel-file-combine.html

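import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    all_data = pd.concat([all_data, df], ignore_index=True)

# now save the data frame
# (the with block closes the writer; writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, sheet_name='sheet1')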
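
One thing to watch with this approach: because output.xlsx is written into the same directory, a second run of glob.glob("*.xlsx") would pick it up as an input. A small sketch of one way to avoid that, assuming the output file name is known in advance; the helper name input_workbooks is hypothetical.

```python
import glob
import os


def input_workbooks(folder=".", output_name="output.xlsx"):
    """List *.xlsx files in `folder`, leaving out the combined output file."""
    paths = sorted(glob.glob(os.path.join(folder, "*.xlsx")))
    return [p for p in paths if os.path.basename(p) != output_name]
```

Writing the combined file to a different directory would be another way to get the same effect.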

This can be done in this way:

import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("/path/to/directory/*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    all_data = pd.concat([all_data, df], ignore_index=True)

all_data.to_csv("new_combined_file.csv")

Shortcut:

import pandas as pd 
from glob import glob

dfs=[]
for f in glob("data/*.xlsx"):
    dfs.append(pd.read_excel(f))
df=pd.concat(dfs, ignore_index=True)

You can use a list comprehension inside concat:

import os
import pandas as pd

path = '/path/to/directory/'
filenames = [file for file in os.listdir(path) if file.endswith('.xlsx')]

df = pd.concat([pd.read_excel(os.path.join(path, file)) for file in filenames], ignore_index=True)

With ignore_index=True, the index of df will be labeled 0, …, n - 1.
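
To make the effect of ignore_index concrete, here is a small self-contained illustration with two in-memory frames standing in for two Excel files (the sample values are made up):

```python
import pandas as pd

a = pd.DataFrame({"Result": ["a", "b"], "Sample": [1, 2]})
b = pd.DataFrame({"Result": ["c", "d"], "Sample": [3, 4]})

# Default: each frame keeps its own row labels, so labels repeat.
kept = pd.concat([a, b])

# ignore_index=True discards the old labels and numbers the rows afresh.
renumbered = pd.concat([a, b], ignore_index=True)

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```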

I have multiple Excel files, and every file has a common id (every Excel sheet has an id column). I tried the following approaches, but I am not getting the correct data frame based on the id.

import pandas as pd
import os

path=os.getcwd()
path
files=os.listdir(path)
fil_xlsx=[f for f in files if f[-4:]=='xlsx']

df=pd.DataFrame()

for f in fil_xlsx:
    data=pd.read_excel(f,'Sheet1')
    df=df.append(data)

I am getting an empty data frame this way.

df=pd.DataFrame()
for f in fil_xlsx:
    data=pd.read_excel(f,'Sheet1')
    all1=pd.concat([data,df],ignore_index=True,join="inner")
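
If the goal here is to line the files up side by side on their shared id column rather than to stack rows (that is an assumption, since the post does not say), pd.merge is the usual tool instead of pd.concat. A rough sketch with a hypothetical helper merge_on_id and made-up sample columns:

```python
from functools import reduce

import pandas as pd


def merge_on_id(frames, key="id"):
    """Join a sequence of frames pairwise on `key`, keeping ids present in all."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how="inner"),
        frames,
    )


# Two in-memory frames standing in for two Excel sheets.
first = pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]})
second = pd.DataFrame({"id": [2, 3, 4], "b": ["x", "y", "z"]})
merged = merge_on_id([first, second])
```

Here merged holds only the ids found in both frames (2 and 3), with the columns id, a and b. Using how="outer" instead would keep every id and fill the gaps with NaN.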

There is an even neater way to do that.

# import libraries
import glob
import pandas as pd

# get the absolute path of all Excel files 
allExcelFiles = glob.glob("/path/to/Excel/files/*.xlsx")

# read all Excel files at once
df = pd.concat(pd.read_excel(excelFile) for excelFile in allExcelFiles)

import os
import pandas as pd

os.chdir('...')

# read first file for column names
fdf = pd.read_excel("first_file.xlsx", sheet_name="sheet_name")

# create counter to segregate the different files' data
fdf["counter"] = 1
nm = list(fdf)
c = 2

# read first 1000 files
for i in os.listdir():
    print(c)
    if c < 1001:
        if "xlsx" in i:
            df = pd.read_excel(i, sheet_name="sheet_name")
            df["counter"] = c
            if list(df) == nm:
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                fdf = pd.concat([fdf, df])
                c += 1
            else:
                print("headers name not match")
        else:
            print("not xlsx")

fdf = fdf.reset_index(drop=True)

# relax
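
A compact variation on the same idea, shown as a sketch only: instead of a numeric counter, tag each row with the name of the file it came from, and skip frames whose headers differ from the first one. The function name concat_matching and the source column are illustrative choices; the input is assumed to be a dict mapping file names to frames that have already been read.

```python
import pandas as pd


def concat_matching(frames_by_name):
    """Stack frames that share the first frame's headers, tagging each row's origin."""
    expected = None
    kept = []
    for name, df in frames_by_name.items():
        cols = list(df.columns)
        if expected is None:
            expected = cols  # the first frame defines the expected headers
        if cols != expected:
            print(f"skipping {name}: headers do not match")
            continue
        kept.append(df.assign(source=name))
    return pd.concat(kept, ignore_index=True)
```

The dict could be built with something like {f: pd.read_excel(f) for f in files}, where files is whatever list of paths is being combined.
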

import pandas as pd
import os

files = [file for file in os.listdir('./Salesfolder')]
all_months_data = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Salesfolder/" + file)
    # reassign to the same variable so the rows accumulate across files
    all_months_data = pd.concat([all_months_data, df])
all_months_data.to_csv("all_data.csv", index=False)

You can read all the files in a folder (Salesfolder in my case); adjust the path for your local setup. The code above reads CSV files, so use pd.read_excel instead of pd.read_csv for .xls files. By iterating over the files, you can read each one and concatenate it onto an initially empty data frame. I have also exported the combined data for all months into a single CSV file.
