
Read multiple Excel files with different sheet names in pandas

To read files from a directory, try the following:

import os
import pandas as pd
path=os.getcwd()
files=os.listdir(path)
files

['wind-diciembre.xls', 'stat_noviembre.xls', 'stat_marzo.xls', 'wind-noviembre.xls', 'wind-enero.xls', 'stat_octubre.xls', 'wind-septiembre.xls', 'stat_septiembre.xls', 'wind-febrero.xls', 'wind-marzo.xls', 'wind-julio.xls', 'wind-octubre.xls', 'stat_diciembre.xls', 'stat_julio.xls', 'wind-junio.xls', 'stat_abril.xls', 'stat_enero.xls', 'stat_junio.xls', 'stat_agosto.xls', 'stat_febrero.xls', 'wind-abril.xls', 'wind-agosto.xls']

where:

stat_enero

     Fecha  HR  PreciAcu  RadSolar     T  Presion  Tmax  HRmax  \
01/01/2011  37         0       162  18.5        0  31.2     86   
02/01/2011  70         0        58  12.0        0  14.6     95   
03/01/2011  62         0       188  15.3        0  24.9     86   
04/01/2011  69         0       181  17.0        0  29.2     97 
     .
     .
     .

          Presionmax  RadSolarmax  Tmin  HRmin  Presionmin  
    0            0          774  12.3      9           0  
    1            0          314   9.2     52           0  
    2            0          713   8.3     32           0  
    3            0          730   7.7     26           0
    .
    .
    .

and

 wind-enero

            Fecha  MagV  MagMax  Rachas  MagRes  DirRes DirWind
01/08/2011 00:00   4.3    14.1    17.9     1.0   281.3     ONO
02/08/2011 00:00   4.2    15.7    20.6     1.5    28.3     NNE
03/08/2011 00:00   4.6    23.3    25.6     2.9    49.2     ENE
04/08/2011 00:00   4.8    17.9    23.0     2.0    30.5     NNE
    .
    .
    .

The next step is to read and parse the files and add them to a dataframe. For now I do the following:

for f in files:
    data = pd.ExcelFile(f)
    data1 = data.sheet_names
    print(data1)
    [u'diciembre']
    [u'Hoja1']
    [u'Hoja1']
    [u'noviembre']
    [u'enero']
    [u'Hoja1']
    [u'septiembre']
    [u'Hoja1']
    [u'febrero']
    [u'marzo']
    [u'julio']
        .
        .
        .

for sheet in data1:
    data2=data.parse(sheet)
data2
                Fecha  MagV  MagMax  Rachas  MagRes  DirRes DirWind
01/08/2011 00:00   4.3    14.1    17.9     1.0   281.3     ONO
02/08/2011 00:00   4.2    15.7    20.6     1.5    28.3     NNE
03/08/2011 00:00   4.6    23.3    25.6     2.9    49.2     ENE
04/08/2011 00:00   4.8    17.9    23.0     2.0    30.5     NNE
05/08/2011 00:00   6.0    22.5    26.3     4.4    68.7     ENE
06/08/2011 00:00   4.9    23.8    23.0     3.3    57.3     ENE
07/08/2011 00:00   3.4    12.9    20.2     1.6   104.0     ESE
08/08/2011 00:00   4.0    20.5    22.4     2.6    79.1     ENE
09/08/2011 00:00   4.1    22.4    25.8     2.9    74.1     ENE
10/08/2011 00:00   4.6    18.4    24.0     2.3    52.1     ENE
11/08/2011 00:00   5.0    22.3    27.8     3.3    65.0     ENE
12/08/2011 00:00   5.4    24.9    25.6     4.1    78.7     ENE
13/08/2011 00:00   5.3    26.0    31.7     4.5    79.7     ENE
14/08/2011 00:00   5.9    31.7    29.2     4.5    59.5     ENE 
15/08/2011 00:00   6.3    23.0    25.1     4.6    70.8     ENE
16/08/2011 00:00   6.3    19.5    30.8     4.8    64.0     ENE
17/08/2011 00:00   5.2    21.2    25.3     3.9    57.5     ENE
18/08/2011 00:00   5.0    22.3    23.7     2.6    59.4     ENE
19/08/2011 00:00   4.4    21.6    27.5     2.4    57.0     ENE

The above output shows only part of the files. How can I parse all of the files and add them to a dataframe?

First off, it appears you have a few different datasets in these files. You may want them all in one dataframe, but for now I am going to assume you want them separated (for example, all of the wind*.xls files in one dataframe and all of the stat*.xls files in another). You could parse the data using read_excel and then concatenate the results, using the timestamp as the index, as follows:

import glob, os
import pandas as pd

runDir = "Path to files"

if os.getcwd() != runDir:
    os.chdir(runDir)

files = glob.glob("wind*.xls")

# collect one dataframe per sheet, then concatenate them all at once
frames = []

for each in files:
    sheets = pd.ExcelFile(each).sheet_names

    for sheet in sheets:
        frames.append(pd.read_excel(each, sheet_name=sheet, index_col='Fecha'))

df = pd.concat(frames)

You now have a time-indexed dataframe! If you really want all of the data in one dataframe (from all of the file types), you can adjust the glob to include all of the files using something like glob.glob('*.xls'). I would warn from personal experience that it may be easier to read in each type of data separately and then merge them after you have done some error checking/munging.
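The "read each type separately, then merge" advice can be sketched as follows. This is only an illustration, not part of the original answer: merge_by_date is a hypothetical helper, the two small frames stand in for the wind and stat dataframes that the loop above would produce, and their values are made up for the example. It also assumes both frames are indexed by a parsed Fecha date.

```python
import pandas as pd


def merge_by_date(wind, stat):
    """Join two date-indexed frames on their shared index (hypothetical helper)."""
    # how="outer" keeps dates that appear in only one of the two frames
    return wind.join(stat, how="outer")


# Illustrative stand-ins for the wind and stat dataframes (made-up values)
wind = pd.DataFrame(
    {"Fecha": pd.to_datetime(["2011-01-01", "2011-01-02"]), "MagV": [4.3, 4.2]}
).set_index("Fecha")
stat = pd.DataFrame(
    {"Fecha": pd.to_datetime(["2011-01-02", "2011-01-03"]), "HR": [70, 62]}
).set_index("Fecha")

merged = merge_by_date(wind, stat)
print(merged)
```

Dates present in only one frame end up with NaN in the other frame's columns, which is a convenient place to start the error checking mentioned above.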

The solution below is just a minor tweak on @DavidHagan's answer above.

This one adds a column identifying the file number (F0, F1, etc.) and another identifying the sheet number within each file (S0, S1, etc.), so that we know where each row came from.

import glob, os
import pandas as pd

runDir = r'c:\blah\blah'

if os.getcwd() != runDir:
    os.chdir(runDir)

files = glob.glob(r'*.*xls*')

#list that collects the dataframe of every sheet
frames = []

#fno is 0, 1, 2, ... (for each file)
for fno, each in enumerate(files):

    sheets = pd.ExcelFile(each).sheet_names

    # sno is 0, 1, 2, ... (for each sheet)
    for sno, sheet in enumerate(sheets):

        FileNo = 'F' + str(fno) #F0, F1, F2, etc.
        SheetNo = 'S' + str(sno) #S0, S1, S2, etc.

        # print(FileNo, SheetNo, each, sheet) #debug info

        #header = None if you don't want header or take this out.
        #dfxl is dataframe of each xl sheet

        dfxl = pd.read_excel(each, sheet_name=sheet, header=None)

        #add column of FileNo and SheetNo to the dataframe
        dfxl['FileNo'] = FileNo
        dfxl['SheetNo'] = SheetNo

        #now add the current xl sheet to the collected frames
        frames.append(dfxl)

#main dataframe with the rows of every file and sheet
df = pd.concat(frames)
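As a variation, pd.read_excel accepts sheet_name=None and then returns a dict mapping each sheet name to its dataframe, so the sheet names never need to be known in advance. The sketch below tags rows with the real file and sheet names instead of F0/S0 codes. tag_sheets is a hypothetical helper, and the dict is a hand-built stand-in (with made-up values) for what read_excel would return for one file.

```python
import pandas as pd


def tag_sheets(sheets, file_label):
    """Concatenate {sheet name: DataFrame} and record each row's origin.

    `sheets` has the shape returned by pd.read_excel(path, sheet_name=None).
    """
    tagged = [
        frame.assign(File=file_label, Sheet=sheet_name)
        for sheet_name, frame in sheets.items()
    ]
    return pd.concat(tagged, ignore_index=True)


# Stand-in for pd.read_excel("wind-enero.xls", sheet_name=None); values are illustrative
sheets = {
    "enero": pd.DataFrame({"MagV": [4.3, 4.2]}),
    "Hoja1": pd.DataFrame({"MagV": [4.6]}),
}

combined = tag_sheets(sheets, "wind-enero.xls")
print(combined)
```

With real files, the same helper could be called once per file and the results concatenated again.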

After doing the above, i.e. reading multiple XL files and sheets into a single dataframe (df), you can do the following to get a sample row from each file/sheet combination. The samples will be available in the dataframe dfs1.

#get unique FileNo and SheetNo combinations in dft2
dft2 = df[['FileNo', 'SheetNo']].drop_duplicates()

#list to collect a sample from each of the read files/sheets
samples = []

#loop through each FileNo and SheetNo combination
for row in dft2.itertuples(index=False):

    #get a sample from each file/sheet to view
    dfts = df[(df.FileNo == row.FileNo) & (df.SheetNo == row.SheetNo)].sample(1)

    #keep the 1 sample row
    samples.append(dfts)

#dfs1 has a sample row from each xl sheet and file
dfs1 = pd.concat(samples, ignore_index=True)

dfs1.to_clipboard()
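On pandas 1.1 or later, the same per-sheet sampling can likely be written without an explicit loop by using groupby(...).sample. A minimal sketch, assuming a combined dataframe with FileNo and SheetNo columns as built above; the small frame here is an illustrative stand-in with made-up values:

```python
import pandas as pd

# Illustrative stand-in for the combined dataframe (made-up values)
df = pd.DataFrame(
    {
        "MagV": [4.3, 4.2, 4.6, 4.8, 6.0],
        "FileNo": ["F0", "F0", "F1", "F1", "F1"],
        "SheetNo": ["S0", "S0", "S0", "S1", "S1"],
    }
)

# One random row per (FileNo, SheetNo) pair; random_state makes the draw repeatable
dfs1 = df.groupby(["FileNo", "SheetNo"]).sample(n=1, random_state=0)
print(dfs1)
```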

