How to read files from two folders and avoid duplicates in Python
I have the following folders, from which I read SQL files and save their contents as variables:
++folder1
-1.sql
-2.sql
-3.sql
++folder2
-2.sql
The following code does the job well for a single folder. How can I modify it to read from two folders instead of one, with the rule that if a file exists in folder2, the file with the same name in folder1 is not read?
import os

folder1 = '../folder1/'
for filename in os.listdir(folder1):
    path = os.path.join(folder1, filename)
    # Skip subdirectories; only regular files are read.
    if os.path.isdir(path):
        continue
    with open(folder1 + filename, 'r') as myfile:
        data = myfile.read()
    # Use the file name without ".sql" as the variable name.
    query_name = filename.replace(".sql", "")
    exec (query_name + " = data")
You can try something like the following:
import os

# folder2 is listed first so that its files take priority over folder1's.
folders = ['../folder2/', '../folder1/']
checked = []
for folder in folders:
    for filename in os.listdir(folder):
        # Only read a file name the first time it is seen.
        if filename not in checked:
            checked.append(filename)
            path = os.path.join(folder, filename)
            if os.path.isdir(path):
                continue
            with open(folder + filename, 'r') as myfile:
                data = myfile.read()
            query_name = filename.replace(".sql", "")
            exec (query_name + " = data")
The answer to this is simple: make two listdir calls, then skip over the files in folder1 that are also in folder2.
One way to do this is with set operations: the set difference a - b means all elements in a that are not also in b, which is exactly what you want.
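import os

folder1 = '../folder1/'
folder2 = '../folder2/'

files1 = set(os.listdir(folder1))
files2 = set(os.listdir(folder2))
# Drop from folder1's names everything that also exists in folder2.
files1 -= files2
paths1 = [os.path.join(folder1, file) for file in files1]
paths2 = [os.path.join(folder2, file) for file in files2]
for path in paths1 + paths2:
    if os.path.isdir(path):
        continue
    # etc. (read the file as in the original loop)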
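To make the set-difference step concrete, here is a minimal sketch that uses the file names from the question as literal sets. No real folders are read, and the name only_in_folder1 is purely illustrative.

```python
# File names from the question, written out as sets for illustration.
files1 = {"1.sql", "2.sql", "3.sql"}  # what os.listdir(folder1) would contain
files2 = {"2.sql"}                    # what os.listdir(folder2) would contain

# Set difference: names in folder1 that are not also in folder2.
only_in_folder1 = files1 - files2

print(sorted(only_in_folder1))  # ['1.sql', '3.sql']
print(sorted(files2))           # ['2.sql']
```

So 2.sql is read only from folder2, while 1.sql and 3.sql still come from folder1.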
As a side note, dynamically creating a bunch of variables like this is almost always a very bad idea, and doing it with exec instead of globals or setattr is an even worse idea. It's usually much better to store everything in, e.g., a dict. For example:
queries = {}
for path in paths1 + paths2:
    if os.path.isdir(path):
        continue
    name = os.path.splitext(os.path.basename(path))[0]
    with open(path) as f:
        queries[name] = f.read()
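Putting the pieces together, one possible way to package this is a small helper that takes the folders in priority order and returns the dict. This is only a sketch: the function name load_queries is hypothetical, and the check that skips anything not ending in .sql is an assumption added here, not something the question asked for.

```python
import os


def load_queries(folders):
    """Read .sql files from `folders` into a dict keyed by base file name.

    `folders` is in priority order: once a name has been loaded, files with
    the same name in later folders are skipped.
    """
    queries = {}
    for folder in folders:
        for filename in sorted(os.listdir(folder)):
            path = os.path.join(folder, filename)
            name, ext = os.path.splitext(filename)
            # Skip subdirectories, non-SQL files (an assumption of this
            # sketch), and names already loaded from a higher-priority folder.
            if os.path.isdir(path) or ext != ".sql" or name in queries:
                continue
            with open(path) as f:
                queries[name] = f.read()
    return queries
```

Calling load_queries(['../folder2/', '../folder1/']) would give folder2 priority, so queries['2'] would hold the contents of folder2's 2.sql. Because the dict itself records which names have been seen, no separate set difference or checked list is needed.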