
Looping over a directory of large local Excel files from a remote server causes repeated access to the same files

I am using a vendor-supplied Jupyter environment hosted on a remote server; the project files are stored locally.

I have a bunch of Excel files that I read data from, and I use the vendor API to get other fields.

I am running into an issue where, if I loop with os.listdir(), I keep accessing the same files. I suspect that the vendor application periodically takes a snapshot of my project directory to sync it, and that if I am in the middle of reading a large Excel file when that happens, the file iterator gets reset to the new snapshot and I end up reading the same files over and over.


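import os
import pandas as pd

for file in os.listdir(path):
    print(file)
    full_file_name = os.path.join(path, file)
    try:
        with pd.ExcelFile(full_file_name) as file_read:
            print(file_read)
            ## Code to read data from different tabs
    except Exception as exc:  # the original post omits its handler; this is a placeholder
        print(exc)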


Output:

Portfolio positions 3.xlsx
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 4.xlsx
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 4.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CAF12908>
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 4.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CAF12908>
Portfolio positions 5.xlsx
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
...
etc

I can't say why you're experiencing this problem, but an easy solution would be to read the file names into a list first and build a set from it, so that you only iterate over unique file names.

files = set(os.listdir(path))
for filename in files:
    print(filename)
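One caveat with the snippet above is that a set has no defined order, so the files may be visited in a different order on each run. Below is a minimal sketch of the same idea that sorts the unique names and keeps only .xlsx files. The helper name unique_excel_names, its suffix parameter, and the demo file names are assumptions made for illustration; they are not part of the original question.

```python
import os
import tempfile


def unique_excel_names(directory, suffix=".xlsx"):
    """Return the de-duplicated, sorted names in directory that end with suffix."""
    names = set(os.listdir(directory))  # one listing, duplicates collapsed
    return sorted(name for name in names if name.lower().endswith(suffix))


# Demo on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("Portfolio positions 4.xlsx", "Portfolio positions 3.xlsx", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    ordered = unique_excel_names(demo_dir)

print(ordered)
```

Sorting costs little for a directory of this size and makes repeated runs easier to compare.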

"in the meantime I am in midst of accessing data from a large excel file, the file iterator gets reset to the new snapshot and I end up reading the same files over and over."

I don't think this is what's happening to you. My understanding of Python is that os.listdir() is called once, when the for loop starts, and returns an ordinary list. That said, I can't explain the behavior you're seeing, so I recommend guarding against it anyway.
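The claim that os.listdir() is evaluated only once can be checked with a small experiment. The sketch below, which uses a temporary directory and made-up file names (both assumptions for illustration), adds a new file to the directory on every pass of the loop and records which names the loop actually visits. It says nothing about how the vendor's sync tool behaves; it only shows what plain Python does.

```python
import os
import tempfile


def names_seen_while_directory_changes(directory):
    """Iterate os.listdir(directory) and create a new file on every pass."""
    seen = []
    for name in os.listdir(directory):  # evaluated once; returns a plain list
        seen.append(name)
        # Simulate an external sync dropping a new file into the directory.
        open(os.path.join(directory, "added_during_" + name), "w").close()
    return seen


with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.xlsx", "b.xlsx"):
        open(os.path.join(demo_dir, name), "w").close()
    seen = names_seen_while_directory_changes(demo_dir)
    after = os.listdir(demo_dir)  # a fresh listing does see the new files

print(sorted(seen), len(after))
```

The loop visits only the two files that existed when it started, even though the directory holds four files by the end.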

Try assembling the file names into a list first, and only then processing them.

full_file_names = []
for _file in os.listdir(path):
    print(_file)
    full_file_names.append(os.path.join(path, _file))

for full_file in full_file_names:
    try:
        ...
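What goes inside the try block is left open above. As one possible shape for that second loop, here is a sketch that calls a handler once per path and records failures instead of stopping. The names process_all, handler, and demo_handler are hypothetical; in the real notebook the handler would presumably open the workbook with pd.ExcelFile and read its tabs.

```python
def process_all(full_file_names, handler):
    """Call handler(path) once for each path; collect successes and failures."""
    done, failed = [], {}
    for full_file in full_file_names:
        try:
            handler(full_file)
        except Exception as exc:  # keep going, but remember what went wrong
            failed[full_file] = exc
        else:
            done.append(full_file)
    return done, failed


def demo_handler(full_file):
    """Stand-in for the real Excel-reading code (hypothetical)."""
    if full_file.endswith("bad.xlsx"):
        raise ValueError("could not read " + full_file)


done, failed = process_all(["a.xlsx", "bad.xlsx", "c.xlsx"], demo_handler)
print(done, list(failed))
```

Because the list of paths is built before any file is opened, nothing that happens to the directory during the slow reads can change which files are visited.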

Also, try not to use file as a variable name: in Python 2 it shadows the built-in file type. (Python 3 no longer has that built-in, but a more specific name is still clearer.)
