How to gather all xlsx files from the working directory, except the ones that are open
import os
import re
from os import listdir
from os.path import isfile, join
my_path = os.getcwd()
files = [f for f in listdir(my_path) if isfile(join(my_path, f))]
pattern = re.compile('xlsx$')  # xlsx files
pattern_not = re.compile('^~')  # the ones that are open start with ~
files = [x for x in files if (pattern.search(x) and (not pattern_not.search(x)))]
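I wrote this code, which gathers all the files in my working directory and then filters them down to the xlsx files, excluding the ones that are open.
My question is: is there any way to write this more cleanly/compactly, so that I don't have to specify two different patterns (in my case, pattern and pattern_not)?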
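Your regex solution doesn't work: to exclude the open files, you need to derive the original file names from the lock-file names. All you exclude is the lock files themselves from the list of all xlsx files in the directory.
This may be a first step in the right direction, but look closely at the last, problematic step; you will have to solve that somehow: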
# Excel/Word/PowerPoint create a lock file by prepending ~$ to the name of a file you open.
# The exact lock-file name depends on the length of the original file name.
# Depending on the original name you get
# ~$name.xlsx from name.xlsx
# ~$1name.xlsx from 1name.xlsx
# ~$12name.xlsx from 12name.xlsx
# ~$23name.xlsx from 123name.xlsx
# ~$34name.xlsx from 1234name.xlsx
# files lists all *.xlsx NOT starting with ~$
files = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
# these are only the lock files starting with ~$
lock = ["~$1test.xlsx", "~$12test.xlsx", "~$23test.xlsx", "~$34test.xlsx", "~$test.xlsx"]
for lockFile in lock:
    lockBase = lockFile[2:]  # remove the ~$
    # endswith() also covers the exact match x == lockBase
    nonOpen = [x for x in files if not x.endswith(lockBase)]
    isOpen = [x for x in files if x.endswith(lockBase)]
    print("Lockfile:", lockFile)
    print("Is open:", isOpen)
    print("Non open:", nonOpen)
Output:
Lockfile: ~$1test.xlsx
Is open: ['1test.xlsx']
Non open: ['test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']
Lockfile: ~$12test.xlsx
Is open: ['12test.xlsx']
Non open: ['test.xlsx', '1test.xlsx', '123test.xlsx', '1234test.xlsx']
Lockfile: ~$23test.xlsx
Is open: ['123test.xlsx']
Non open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '1234test.xlsx']
Lockfile: ~$34test.xlsx
Is open: ['1234test.xlsx']
Non open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx']
# problematic: all the other files end in this pattern, so you would have
# to make the test quite a bit smarter to avoid this:
Lockfile: ~$test.xlsx
Is open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']
Non open: []  # all of them end in test.xlsx, that's a problem ...
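One possible way around that last step is to reverse the lookup: instead of asking which files end in a lock file's suffix, compute the lock-file name each file would get and check whether that lock file exists. This is only a sketch under a loud assumption: the naming rule in `expected_lock_name` is fitted solely to the five examples in the comments above (and capping the dropped characters at two is a further guess), not taken from any Office documentation. `expected_lock_name`, `split_open`, and `scan_directory` are hypothetical helpers, not library functions.

```python
import os


def expected_lock_name(filename):
    # HYPOTHETICAL rule, fitted only to the five examples above: prepend "~$"
    # and, once the name without its extension is longer than 6 characters,
    # drop one leading character per extra character. The cap at two dropped
    # characters is a guess. Verify this against real lock files first.
    stem, ext = os.path.splitext(filename)
    drop = min(2, max(0, len(stem) - 6))
    return "~$" + stem[drop:] + ext


def split_open(files, lock_files):
    # Look up each file's *expected* lock name instead of matching lock names
    # against file-name suffixes, so "~$test.xlsx" only marks "test.xlsx".
    locks = set(lock_files)
    is_open = [f for f in files if expected_lock_name(f) in locks]
    non_open = [f for f in files if expected_lock_name(f) not in locks]
    return is_open, non_open


def scan_directory(directory):
    # Collect the *.xlsx regular files in `directory`, separate the lock files
    # (names starting with "~$") from the rest, then classify the rest.
    names = [n for n in os.listdir(directory)
             if n.endswith(".xlsx") and os.path.isfile(os.path.join(directory, n))]
    lock_files = [n for n in names if n.startswith("~$")]
    files = [n for n in names if not n.startswith("~$")]
    return split_open(files, lock_files)


files = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
# The case that broke the endswith() version: only test.xlsx is open.
print(split_open(files, ["~$test.xlsx"]))
# -> (['test.xlsx'], ['1test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx'])
```

With real files you would call `scan_directory(os.getcwd())` instead of passing hard-coded lists. If two files can map to the same lock name (for example `23test.xlsx` and `123test.xlsx` would both map to `~$23test.xlsx` under this rule), the lock file alone cannot tell them apart, and this sketch marks both as open.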
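I replaced your pattern with
^[^~]+\.xlsx$
and removed your pattern_not. This regex should only match files that do not start with ~ and that end with .xlsx (but it also won't match files that contain a ~ somewhere in the middle of the name).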
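A minimal sketch of that single-pattern version applied to a directory listing. The second pattern, `^(?!~).*\.xlsx$`, is not from the answer: it swaps in a negative lookahead so that only a *leading* ~ is rejected and names with a ~ in the middle are still accepted. `xlsx_files` is a hypothetical helper name. Like the original question, this only drops the lock files; it does not remove the open originals.

```python
import os
import re

# Pattern from the answer: no "~" anywhere in the name, ending in ".xlsx".
XLSX_NO_TILDE = re.compile(r"^[^~]+\.xlsx$")

# Alternative (not from the answer): the negative lookahead (?!~) rejects
# only a *leading* "~", so a name like "a~b.xlsx" still matches.
XLSX_NO_LEADING_TILDE = re.compile(r"^(?!~).*\.xlsx$")


def xlsx_files(directory, pattern):
    # Return the names of regular files in `directory` that match `pattern`.
    return [
        name
        for name in os.listdir(directory)
        if pattern.match(name) and os.path.isfile(os.path.join(directory, name))
    ]


names = ["report.xlsx", "~$report.xlsx", "a~b.xlsx", "notes.txt"]
print([n for n in names if XLSX_NO_TILDE.match(n)])          # ['report.xlsx']
print([n for n in names if XLSX_NO_LEADING_TILDE.match(n)])  # ['report.xlsx', 'a~b.xlsx']
print(xlsx_files(os.getcwd(), XLSX_NO_TILDE))
```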