Looping over a directory of large local Excel files from a remote server causes repeated access to the same files

I am using a vendor-supplied Jupyter environment hosted on a remote server; the project files are stored locally.

I have a bunch of Excel files that I read data from, and I use the vendor API to get other fields.

I am running into an issue where, if I use os.listdir() to loop, I keep accessing the same files. My guess is that the vendor application periodically takes a snapshot of my project directory to sync it. If I am in the middle of reading a large Excel file when that happens, the file iterator seems to get reset to the new snapshot, and I end up reading the same files over and over.


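import os
import pandas as pd

for file in os.listdir(path):
    print(file)
    full_file_name = os.path.join(path, file)
    try:
        with pd.ExcelFile(full_file_name) as file_read:
            print(file_read)
            ## Code to read data from different tabs
    except Exception as exc:
        # Error handling was not shown in the original post; placeholder only
        print(exc)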

Output:

Portfolio positions 3.xlsx
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 4.xlsx
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 4.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CAF12908>
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
Portfolio positions 4.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CAF12908>
Portfolio positions 5.xlsx
Portfolio positions 3.xlsx
<pandas.io.excel.ExcelFile object at 0x000001C8CB10BCF8>
...
etc

I can't say why you're experiencing this problem, but an easy workaround would be to read the file names into a list first and build a set from it, so that you only iterate over unique file names. Note that a set does not preserve the order of the listing.

files = set(os.listdir(path))
for filename in files:
    print(filename)
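
If a stable processing order matters, the same idea can be sketched with a sorted list built from the set. This is only an illustration: the helper name unique_excel_files and the .xlsx filter are assumptions, not something from the question.

```python
import os

def unique_excel_files(path):
    """Return the sorted, de-duplicated .xlsx file names found in path."""
    names = set(os.listdir(path))  # os.listdir() is called exactly once here
    # Hypothetical filter: keep only names ending in .xlsx (case-insensitive)
    return sorted(name for name in names if name.lower().endswith(".xlsx"))
```

It could then be used as `for filename in unique_excel_files(path): print(filename)`.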

"... in the meantime I am in midst of accessing data from a large excel file, the file iterator gets reset to the new snapshot and I end up reading the same files over and over."

I don't think this is what's happening to you. My understanding of Python is that os.listdir() is called only once, before the loop starts, and returns a complete list of names. That said, I can't explain the behavior you're seeing, so I recommend guarding against it anyway.
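
A small experiment along these lines can check that claim. It is a sketch using a throwaway temporary directory and made-up file names, not the vendor environment: a file is added while the loop runs, and the loop should still visit only the names that existed when os.listdir() was called.

```python
import os
import tempfile

def count_loop_iterations():
    """Add a file mid-loop and report which names the loop visited."""
    with tempfile.TemporaryDirectory() as folder:
        # Made-up file names, created empty just for the experiment
        for name in ("a.xlsx", "b.xlsx"):
            open(os.path.join(folder, name), "w").close()

        names = os.listdir(folder)  # evaluated once; returns a plain list
        visited = []
        for name in names:
            visited.append(name)
            # Simulate the directory changing while the loop is running
            open(os.path.join(folder, "added_later.xlsx"), "w").close()
        return type(names), sorted(visited)

names_type, visited = count_loop_iterations()
print(names_type, visited)
```

If the loop visits only the two original names, a directory change during iteration is not enough on its own to make a plain for loop over os.listdir() revisit files.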

Try assembling the full file names into a list first and then processing them in a separate loop.

full_file_names = []
for _file in os.listdir(path):
    print(_file)
    full_file_names.append(os.path.join(path, _file))

for full_file in full_file_names:
    try:
        ...
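
One way the second loop might be fleshed out is shown below, as a sketch only. The function name process_each_once and the handler callback are hypothetical; handler stands in for whatever reads the workbook (for example, the pd.ExcelFile code from the question). The processed set is an extra guard so that no path is handled twice, even if the list somehow contained repeats.

```python
import os

def process_each_once(path, handler):
    """Call handler(full_path) at most once per file; return paths that failed."""
    # Build the complete list of paths before doing any slow work
    full_file_names = [os.path.join(path, name) for name in sorted(os.listdir(path))]
    processed = set()
    failed = []
    for full_file in full_file_names:
        if full_file in processed:
            continue  # guard: never handle the same path twice
        processed.add(full_file)
        try:
            handler(full_file)  # hypothetical callback supplied by the caller
        except Exception as exc:  # deliberately broad in this sketch
            print(f"Skipping {full_file}: {exc}")
            failed.append(full_file)
    return failed
```

Collecting the failures instead of stopping at the first one lets a single unreadable workbook be inspected afterwards without rerunning the whole directory.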

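Also, try not to use file as a variable name. It masks a built-in in Python 2; in Python 3 the name is no longer a built-in, but a more descriptive name is still clearer.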
