
Best practices when matching a large number of files against a large number of regex strings

I have a directory with several thousand files. I want to sort them into directories based on file name, but many of the file names are very similar.

My thinking is that I'm going to have to write up a bunch of regex strings and then do some sort of looping. This is my question:

Is one of these two options better than the other? Do I loop over all my files and, for each file, check it against my regexes, keeping track of how many match? Or do I do the opposite and loop over the regexes and touch each file?

I had thought to do it in Python, as that's my strongest language, but I'm open to other ideas.

This is some code I use for a program of mine, which I have modified for your purposes. It takes a directory (sort_dir), goes over every file there, creates directories based on the filenames, then moves the files into those directories. Since you have not provided any information as to where or how you want to sort your files, you will have to add that part where I have mentioned:

import os
import shutil

def sort_files(sort_dir):
    """Move every file in sort_dir into a sub-folder of sort_dir."""
    for f in os.listdir(sort_dir):
        src = os.path.join(sort_dir, f)
        if not os.path.isfile(src):
            continue

        # This is the folder name to be created; what do you want it to be?
        # Placeholder: the file name without its extension, so the folder
        # path does not collide with the file itself.
        destinationPath = os.path.join(sort_dir, os.path.splitext(f)[0])
        if os.path.isfile(destinationPath):
            continue  # a file already has that name (e.g. no extension)

        os.makedirs(destinationPath, exist_ok=True)

        try:
            shutil.move(src, os.path.join(destinationPath, f))
        except OSError as err:
            # Report the failure instead of retrying forever.
            print(f"could not move {src}: {err}")
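
As for which loop goes on the outside: both orders run the same number of match attempts (files times patterns), so the choice is mostly about convenience. Looping over the files once and testing each name against pre-compiled patterns makes it easy to see how many rules match a given file. Below is a minimal sketch of that idea. The patterns, the folder names, and the helpers classify and pick_folder are hypothetical, made up for illustration; swap in your own rules.

```python
import re

# Hypothetical rules: (compiled pattern, folder name). Replace with your own.
# Order matters when only the first match is used.
RULES = [
    (re.compile(r"^invoice_\d+"), "invoices"),
    (re.compile(r"^report_.*\.pdf$"), "reports"),
    (re.compile(r"\.pdf$"), "pdfs"),
]

def classify(filename, rules=RULES):
    """Return the folder names of every rule whose pattern matches filename."""
    return [folder for pattern, folder in rules if pattern.search(filename)]

def pick_folder(filename, rules=RULES, default="unsorted"):
    """Return the first matching folder, or default when nothing matches."""
    matches = classify(filename, rules)
    return matches[0] if matches else default
```

The length of the list returned by classify tells you how many rules matched, which is a quick way to spot file names that are ambiguous under your rules. The result of pick_folder could be joined onto sort_dir to build the destination folder in sort_files.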
