
Code slow when creating treeview of 100000+ items

Below is a snippet of code I was working on a few months back, but it is only now that I need it. I believe the main part of it is code I amended from an SO post, but I lost the URL. Either way, I had forgotten how slow it is when hundreds of thousands of files are involved, so I am looking into ways of making it faster.

I've tried moving parts of the code around and omitting certain sections, but performance either stays the same or gets worse, which leads me to believe the issue is in the os.listdir call. From what I have read, os.listdir is the fastest option here as it doesn't perform as many system calls as scandir or walk, but its performance is still poor with folders exceeding 100000 files, as shown below.

14387 files in 2794 folders processed in 5.88s
14387 files in 2794 folders processed in 3.224s
14387 files in 2794 folders processed in 5.847s


110016 files in 21440 folders processed in 22.732s
110016 files in 21440 folders processed in 22.603s
110016 files in 21440 folders processed in 41.055s


249714 files in 35707 folders processed in 66.452s
249714 files in 35707 folders processed in 49.154s
249714 files in 35707 folders processed in 88.43s
249714 files in 35707 folders processed in 48.942s

I am currently looking into another way of indexing the file/folder locations: a static text file that would be prepopulated on the server every hour with the latest folder contents. Before I give up on the code below, I thought I would ask whether it can be made faster or whether it is already operating at its limit.
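For illustration, such an index could be as simple as one entry per line. This is a minimal sketch, assuming hypothetical helper names (build_index, load_index) and a tab-separated "kind, relative path" format; it is not the script actually in use on the server:

```python
import os


def build_index(start_path, index_file):
    """Walk start_path once and write one 'kind<TAB>relative path' line per entry."""
    with open(index_file, "w", encoding="utf-8") as out:
        for current, dirs, files in os.walk(start_path):
            rel = os.path.relpath(current, start_path)
            for d in dirs:
                # 'F' marks a folder, matching the [F] label used in the tree
                out.write("F\t" + os.path.normpath(os.path.join(rel, d)) + "\n")
            for f in files:
                # 'f' marks a file, matching the [f] label used in the tree
                out.write("f\t" + os.path.normpath(os.path.join(rel, f)) + "\n")


def load_index(index_file):
    """Read the index back as a list of (kind, relative_path) tuples."""
    with open(index_file, encoding="utf-8") as src:
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in src]
```

The client would then only read one text file instead of touching every folder, at the cost of the listing being up to an hour stale.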

import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import time

time_start = time.time()

iid = 1  # IID of tree item. 0 is top level parent
count_folders = 0  # Number of folders in parent
count_files = 0  # Number of files in parent
compare_check = {}  # Build the dictionary with IID key and folder/file paths in list

root = tk.Tk()
root.geometry('850x450')

style = ttk.Style(root)

v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)


def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1  # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1  # for testing

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]})  # Update dictionary with current item


parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
start_path = os.path.expanduser(r"K:/DMC Processed - 02072017")  # Path for test
start_dir_entries = os.listdir(start_path)
new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)

time_end = time.time()
time_total = round(time_end - time_start, 3)  # for testing. Simple start to end timer result

ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0)  # for testing

print(f"{count_files} files in {count_folders} folders processed in {time_total}s")  # for testing

root.mainloop()

Since you nicely set it up with timing, I thought it'd be a fun challenge to give this a try.

I tried rewriting it to use os.walk, but I suspected your os.path.isdir() call would be incredibly slow, so I switched that out for os.scandir. Turns out that's the fastest way I could find.
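To compare the two approaches without tkinter in the way, here is a minimal sketch that only counts entries. The function names (count_listdir, count_scandir, timed) are made up for this example. os.scandir yields DirEntry objects that usually already know whether they are directories, so entry.is_dir() typically avoids the extra system call that os.path.isdir makes for every name:

```python
import os
import time


def count_listdir(path):
    """Original approach: os.listdir plus one os.path.isdir call per entry."""
    files = folders = 0
    for name in os.listdir(path):
        item_path = os.path.join(path, name)
        if os.path.isdir(item_path):
            folders += 1
            try:
                sub_files, sub_folders = count_listdir(item_path)
            except PermissionError:
                continue  # unreadable folder: counted, but not descended into
            files += sub_files
            folders += sub_folders
        else:
            files += 1
    return files, folders


def count_scandir(path):
    """Same traversal, but using the DirEntry objects from os.scandir."""
    files = folders = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                folders += 1
                try:
                    sub_files, sub_folders = count_scandir(entry.path)
                except PermissionError:
                    continue  # unreadable folder: counted, but not descended into
                files += sub_files
                folders += sub_folders
            else:
                files += 1
    return files, folders


def timed(func, path):
    """Return (result, elapsed seconds) for one call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start
```

Calling timed(count_listdir, start_path) and timed(count_scandir, start_path) on the same folder gives a GUI-free comparison; the actual numbers will depend on the drive and on file system caching.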

Benchmarks:

original: 697665 files in 76729 folders processed in 106.079s
os.scandir: 697665 files in 76729 folders processed in 23.152s
os.walk: 697665 files in 76731 folders processed in 32.869s

Using the third-party scandir module didn't seem to make much difference; it seems Python has optimised os quite nicely now (os.scandir has been part of the standard library since Python 3.5).

Here's your code with the other functions:

import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import scandir  # third-party package; only needed for scandir.walk in new_folder_walk
import time

time_start = time.time()

iid = 1  # IID of tree item. 0 is top level parent
count_folders = 0  # Number of folders in parent
count_files = 0  # Number of files in parent
compare_check = {}  # Build the dictionary with IID key and folder/file paths in list

root = tk.Tk()
root.geometry('850x450')

style = ttk.Style(root)

v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)


def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1  # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1  # for testing

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]})  # Update dictionary with current item


def new_folder_scandir(parent_path, parent_iid):
    global iid, count_folders, count_files
    for name in os.scandir(parent_path):
        if name.is_dir():
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name.name}')  # name is a DirEntry, so use its .name
            try:
                new_folder_scandir(parent_path=name.path, parent_iid=subdir_iid)
                count_folders += 1  # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name.name}')  # name is a DirEntry, so use its .name
            count_files += 1  # for testing

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name.name]})  # Update dictionary with current item (store the name string, not the DirEntry)



def new_folder_walk(path):
    global count_folders, count_files

    def hex_thing(parent_path, name):
        global iid

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]})  # Update dictionary with current item

    tree_items = {path: tree.insert(parent='', index='0', text='All Documents', open=True)}
    for root, dirs, files in scandir.walk(path):
        for dir in dirs:
            path = os.path.join(root, dir)
            count_folders += 1
            tree_items[path] = tree.insert(parent=tree_items[root], index='end', text=f'[F] {dir}')
            hex_thing(root, dir)

        for file in files:
            path = os.path.join(root, file)
            count_files += 1
            tree.insert(parent=tree_items[root], index='end', text=f'[f] {file}')
            hex_thing(root, file)


start_path = os.path.expanduser(r"C:/Program Files")  # Path for test

# 0 = original, 1 = scandir, 2 = walk
run = 1

if run == 0:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    start_dir_entries = os.listdir(start_path)
    new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)
elif run == 1:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    new_folder_scandir(parent_path=start_path, parent_iid=parent_iid)
elif run == 2:
    new_folder_walk(start_path)

time_end = time.time()
time_total = round(time_end - time_start, 3)  # for testing. Simple start to end timer result

ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0)  # for testing

print(f"{count_files} files in {count_folders} folders processed in {time_total}s")  # for testing

root.mainloop()

For the record, I'm actually surprised that os.walk is slower than os.scandir, even when iterating through every file.
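Separately, the hex-prefix block that is repeated in each function could probably be collapsed into a single format call. This is a minimal sketch, assuming the hypothetical helper names item_key and record_item (neither is in the code above). Converting the counter to hex and back with int(hex_iid, 16) leaves it unchanged, so that step is dropped here:

```python
compare_check = {}  # maps 'I'-prefixed hex keys to [parent_path, trimmed_path, name]


def item_key(iid):
    """Format a counter as 'I' plus upper-case hex, zero-padded to at least three digits."""
    return f"I{iid:03X}"


def record_item(iid, parent_path, name):
    """Store one entry under its key; the [14:] slice is kept as in the original code."""
    compare_check[item_key(iid)] = [parent_path, parent_path[14:], name]
```

For example, item_key(2) gives 'I002' and item_key(255) gives 'I0FF', the same strings the if/elif chain builds. I have not measured whether this changes the timings.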
