Using olefile to open Excel files from a directory

Good day,

I am attempting to open multiple Excel (.xls) files and put them in one data frame. I am using glob.glob() to access the files here:

all_files = glob.glob(r'D:\Anaconda Hub\ARK analysis\Ark analysis\data\year2021\february\**.xls')

The sample output is a list, like so:

['D:\\Anaconda Hub\\ARK analysis\\Ark analysis\\data\\year2021\\february\\ARK_Trade_02012021_0619PM_EST_601875e069e08.xls',
 'D:\\Anaconda Hub\\ARK analysis\\Ark analysis\\data\\year2021\\february\\ARK_Trade_02022021_0645PM_EST_6019df308ae5e.xls',
 'D:\\Anaconda Hub\\ARK analysis\\Ark analysis\\data\\year2021\\february\\ARK_Trade_02032021_0829PM_EST_601b2da2185c6.xls',
 'D:\\Anaconda Hub\\ARK analysis\\Ark analysis\\data\\year2021\\february\\ARK_Trade_02042021_0637PM_EST_601c72b88257f.xls',
 'D:\\Anaconda Hub\\ARK analysis\\Ark analysis\\data\\year2021\\february\\ARK_Trade_02052021_0646PM_EST_601dd4dc308c5.xls',
 'D:\\Anaconda Hub\\ARK analysis\\Ark analysis\\data\\year2021\\february\\ARK_Trade_02082021_0629PM_EST_6021c739595b0.xls'..]

I am using the olefile package. Here is my code:

import os
import glob
import olefile as ol
import pandas as pd

# using olefile to iterate over the Excel files and extract each one so it is readable
with open(all_files,'r') as file:
    if file.endswith('.xls'):
        ole = ol.OleFileIO(file)
        if ole.exists('Workbook'):
            d = ole.openstream('Workbook')
            df = pd.read_excel(d, engine='xlrd', header=3, skiprows=3)
            print(df.head())

However, I get this error:

TypeError: expected str, bytes or os.PathLike object, not list

I do not understand why I am getting this error. I am iterating over the list to select a string and pass it through the rest of the steps... Help would be appreciated to do this correctly and get the Excel files into a single data frame. Thanks in advance.

I believe you are working with a legacy format/version of Microsoft Excel?

The error message TypeError: expected str, bytes or os.PathLike object, not list is pretty informative in this case. Your code has the line with open(all_files,'r') as file: , which passes the entire list to open(). open() accepts a single path, so the code never actually iterates over the list; you need a loop that handles one file at a time.
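To see the difference in isolation, here is a minimal sketch that reproduces the error and then opens the files one by one. It uses throwaway temporary files; the names a.xls and b.xls are placeholders for illustration, not your data.

```python
import glob
import os
import tempfile

# Create a temporary directory with two empty placeholder files (illustration only).
tmp_dir = tempfile.mkdtemp()
for name in ("a.xls", "b.xls"):
    open(os.path.join(tmp_dir, name), "wb").close()

all_files = sorted(glob.glob(os.path.join(tmp_dir, "*.xls")))

# Passing the whole list to open() raises the TypeError from the question.
error_message = None
try:
    open(all_files, "rb")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Looping over the list hands open() one path (a str) at a time.
opened = []
for path in all_files:
    with open(path, "rb") as fh:
        opened.append(os.path.basename(fh.name))
print(opened)
```

The same idea applies to olefile.OleFileIO(): it should receive one path per call, not the list returned by glob.glob().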

Try the following code:

import os
import glob
import olefile
import pandas as pd

all_files = glob.glob('excelfiles/*.xls')

for file in all_files:
    with olefile.OleFileIO(file) as ole:     # Since olefile v0.46
        if ole.exists('Workbook'):
            d = ole.openstream('Workbook')
            df = pd.read_excel(d, engine='xlrd', header=3, skiprows=3)
            print(df.head())

Output I get from the files shared:

   ARKG  2021-02-01  Sell  ... PACIFIC BIOSCIENCES OF CALIFORNIA INC  210508  0.0645
0  ARKK  2021-02-01   Buy  ...                 FATE THERAPEUTICS INC  154509  0.0608
1  ARKK  2021-02-01   Buy  ...                            PACCAR INC  263029  0.1024
2  ARKK  2021-02-01   Buy  ...                          TERADYNE INC  295371  0.1465
3  ARKK  2021-02-01   Buy  ...                 BEAM THERAPEUTICS INC   58218  0.0241
4  ARKK  2021-02-01  Sell  ...         REGENERON PHARMACEUTICALS INC    5130  0.0111

[5 rows x 8 columns]
   ARKG  2021-02-03  Sell  ...  TWIST BIOSCIENCE CORP   97415  0.1615
0  ARKK  2021-02-03   Buy  ...  SPOTIFY TECHNOLOGY SA  385932  0.4980
1  ARKK  2021-02-03   Buy  ...             PACCAR INC  318474  0.1231
2  ARKK  2021-02-03   Buy  ...  FATE THERAPEUTICS INC   98059  0.0394
3  ARKK  2021-02-03   Buy  ...           TERADYNE INC  104809  0.0524
4  ARKK  2021-02-03  Sell  ...               ROKU INC   53551  0.0924

[5 rows x 8 columns]
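The loop above prints the head of each file but does not yet build the single data frame the question asks for. One possible way to do that, as a rough sketch: collect each frame in a list and join them with pd.concat. The helper names read_workbook and combine are hypothetical, and the reader is passed in as a parameter so the combining step can be tried with any function that turns a path into a DataFrame.

```python
import glob

import pandas as pd


def read_workbook(path):
    """Hypothetical helper: read one legacy .xls file the same way as the loop above."""
    import olefile  # imported here so combine() can be tried without olefile installed

    with olefile.OleFileIO(path) as ole:
        if not ole.exists('Workbook'):
            return None  # not a legacy Excel workbook, skip it
        stream = ole.openstream('Workbook')
        return pd.read_excel(stream, engine='xlrd', header=3, skiprows=3)


def combine(pattern, reader=read_workbook):
    """Read every file matching pattern and stack the results into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = reader(path)
        if df is not None:
            frames.append(df)
    # pd.concat raises ValueError on an empty list, so return an empty frame instead.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the files from this answer, the call would be combined = combine('excelfiles/*.xls'). ignore_index=True renumbers the rows from 0 so the per-file indexes do not repeat.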
