How to read files from two folders and avoid duplicates in Python

I have the following folders, from which I read SQL files and save their contents as variables:

++folder1
  -1.sql
  -2.sql
  -3.sql
++folder2
  -2.sql

The following code works well for a single folder. How can I modify it to read from two folders instead of one, with the rule that if a file exists in folder2, the file with the same name in folder1 is not read?

import os

folder1 = '../folder1/'
for filename in os.listdir(folder1):
    path = os.path.join(folder1, filename)
    if os.path.isdir(path):
        continue
    with open(path, 'r') as myfile:
        data = myfile.read()
    query_name = filename.replace(".sql", "")
    exec(query_name + " = data")

You can try something like the following:

import os

# folder2 comes first so that its files take precedence over folder1
folders = ['../folder2/', '../folder1/']
checked = set()
for folder in folders:
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)
        if os.path.isdir(path):
            continue
        # skip names already read from an earlier folder
        if filename in checked:
            continue
        checked.add(filename)
        with open(path, 'r') as myfile:
            data = myfile.read()
        query_name = filename.replace(".sql", "")
        exec(query_name + " = data")

The answer is simple: make two listdir calls, then skip the files in folder1 that are also in folder2.

One way to do this is with set operations: the set difference a - b is the set of all elements of a that are not also in b, which is exactly what you want.

import os

folder1 = '../folder1/'
folder2 = '../folder2/'

files1 = set(os.listdir(folder1))
files2 = set(os.listdir(folder2))
files1 -= files2  # drop folder1 names that also exist in folder2

paths1 = [os.path.join(folder1, file) for file in files1]
paths2 = [os.path.join(folder2, file) for file in files2]
for path in paths1 + paths2:
    if os.path.isdir(path):
        continue
    # etc.
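
To make the set difference concrete, here is a small sketch that uses the file names from the question in place of real os.listdir results. The literal sets and the name only_in_folder1 are assumptions for illustration only:

```python
# File names from the question, standing in for os.listdir() results.
files1 = {"1.sql", "2.sql", "3.sql"}  # contents of folder1
files2 = {"2.sql"}                    # contents of folder2

# Keep only the folder1 names that folder2 does not also provide.
only_in_folder1 = files1 - files2

print(sorted(only_in_folder1))           # ['1.sql', '3.sql']
print(sorted(only_in_folder1 | files2))  # ['1.sql', '2.sql', '3.sql']
```

Each file name ends up in exactly one of the two sets, so 2.sql is read only from folder2.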

As a side note, dynamically creating a bunch of variables like this is almost always a very bad idea, and doing it with exec instead of globals or setattr is an even worse idea. It is usually much better to store everything in, e.g., a dict, which also accepts keys such as 1 that are not valid variable names. For example:

queries = {}
for path in paths1 + paths2:
    if os.path.isdir(path):
        continue
    name = os.path.splitext(os.path.basename(path))[0]
    with open(path) as f:
        queries[name] = f.read()
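
Putting the pieces together, one possible sketch wraps the dict approach in a function. The name load_queries and its folders parameter are hypothetical and do not come from the answers above. Folders are given in priority order, so a file name found in an earlier folder is not read again from a later one. Skipping files that do not end in .sql is an added assumption.

```python
import os


def load_queries(folders):
    """Read *.sql files from folders, given in priority order.

    Hypothetical helper: a file name found in an earlier folder
    shadows the same name in any later folder.
    """
    queries = {}
    seen = set()
    for folder in folders:
        for filename in sorted(os.listdir(folder)):
            path = os.path.join(folder, filename)
            # Assumption: ignore subdirectories and non-SQL files.
            if os.path.isdir(path) or not filename.endswith(".sql"):
                continue
            # Skip names already loaded from a higher-priority folder.
            if filename in seen:
                continue
            seen.add(filename)
            name = os.path.splitext(filename)[0]
            with open(path) as f:
                queries[name] = f.read()
    return queries
```

With the layout from the question, calling load_queries(['../folder2/', '../folder1/']) would return a dict with the keys '1', '2' and '3', where the value for '2' comes from folder2.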
