
How to gather all xlsx files from the working directory, except the ones that are open

import os
import re
from os import listdir
from os.path import isfile, join

my_path = os.getcwd()
# keep regular files only (no sub-directories)
files = [f for f in listdir(my_path) if isfile(join(my_path, f))]
pattern = re.compile(r'xlsx$')  # xlsx files
pattern_not = re.compile(r'^~')  # the ones that are open start with ~
files = [x for x in files if pattern.search(x) and not pattern_not.search(x)]

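I wrote this piece of code. It gathers all the files in my working directory and then keeps only the xlsx files, excluding the ones that are open.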

My question is: is there a way to write this more cleanly and compactly, so that I do not have to specify two different patterns (in my case pattern and pattern_not)?

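Your regex solution does not work: you need to derive the original file names from the lock-file names you find in order to exclude them. As written, you only exclude the lock files themselves from the list of all xlsx files in the directory.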

This might be a first step in the right direction, but check the last, problematic case carefully; you will have to resolve it somehow:

# Excel/Word/PowerPoint create a lock file by prepending ~$ to the name of a file that you open.
# The complete lock-file name differs depending on the length of the original file name.
# Depending on the original name you get
#   ~$name.xlsx    from   name.xlsx
#   ~$1name.xlsx   from   1name.xlsx
#   ~$12name.xlsx  from   12name.xlsx
#   ~$23name.xlsx  from   123name.xlsx
#   ~$34name.xlsx  from   1234name.xlsx

import re

# file lists all *.xlsx NOT starting with ~$
file = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx", "1234test.xlsx"]
# these are only the lockfiles starting with ~$
lock = ["~$1test.xlsx", "~$12test.xlsx", "~$23test.xlsx", "~$34test.xlsx","~$test.xlsx"]

for lockFile in lock:
    lockBase = lockFile[2:]  # remove the ~$
    # endswith() is also true for an exact match, so one test is enough
    nonOpen = [x for x in file if not x.endswith(lockBase)]
    isOpen = [x for x in file if x.endswith(lockBase)]

    print("Lockfile:", lockFile)
    print("Is open:", isOpen)
    print("Non open", nonOpen)

Output:

Lockfile: ~$1test.xlsx
Is open: ['1test.xlsx']
Non open ['test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']

Lockfile: ~$12test.xlsx
Is open: ['12test.xlsx']
Non open ['test.xlsx', '1test.xlsx', '123test.xlsx', '1234test.xlsx']

Lockfile: ~$23test.xlsx
Is open: ['123test.xlsx']
Non open ['test.xlsx', '1test.xlsx', '12test.xlsx', '1234test.xlsx']

Lockfile: ~$34test.xlsx
Is open: ['1234test.xlsx']
Non open ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx']

# problematic - every other file name ends with this pattern, so you would
# have to make the test quite a bit smarter to avoid this:
Lockfile: ~$test.xlsx
Is open: ['test.xlsx', '1test.xlsx', '12test.xlsx', '123test.xlsx', '1234test.xlsx']
Non open []   # all end with test.xlsx - that's a problem ...
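One possible way around the problematic last case is to go the other direction: instead of stripping ~$ from each lock file and comparing suffixes, compute the lock-file name each workbook would get and check whether that exact name exists. The sketch below is only an illustration. The helper names lock_name and split_open_closed are hypothetical, and the length thresholds (prepend for names up to 6 characters before the extension, replace the first character at 7, replace the first two at 8 or more) are an assumption generalized from the five examples in the comments above, not a documented rule.

```python
import os

def lock_name(filename):
    """Guess the lock-file name for `filename`.

    ASSUMPTION: the thresholds are inferred from the examples above
    (name, 1name, 12name, 123name, 1234name), not from documentation.
    """
    stem, _ext = os.path.splitext(filename)
    if len(stem) <= 6:
        return "~$" + filename        # ~$ is simply prepended
    if len(stem) == 7:
        return "~$" + filename[1:]    # first character is replaced
    return "~$" + filename[2:]        # first two characters are replaced

def split_open_closed(names):
    """Split the xlsx names into (open, closed) by exact lock-name lookup."""
    present = set(names)
    xlsx = [n for n in names
            if n.endswith(".xlsx") and not n.startswith("~$")]
    open_files = [n for n in xlsx if lock_name(n) in present]
    closed_files = [n for n in xlsx if lock_name(n) not in present]
    return open_files, closed_files

# Same sample names as above, with only two lock files present.
names = ["test.xlsx", "1test.xlsx", "12test.xlsx", "123test.xlsx",
         "1234test.xlsx", "~$test.xlsx", "~$23test.xlsx"]
open_files, closed_files = split_open_closed(names)
print("Is open:", open_files)
print("Non open", closed_files)
```

With this lookup, ~$test.xlsx marks only test.xlsx as open. Some ambiguity remains for long names: two workbooks that differ only in the replaced leading characters would map to the same lock-file name, and both would be reported as open.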

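I would replace your pattern with

^[^~]+\.xlsx$

and drop your pattern_not. This regex should match only files that do not start with ~ and that end in .xlsx (note that it also fails to match when the file name contains a ~ anywhere before the extension).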
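As a rough sketch of how that single pattern could be applied, the snippet below filters a hard-coded list of names. The list and the function name xlsx_without_tilde are made up for illustration. Keep in mind that this only drops the lock files themselves; as the other answer points out, it does not tell you which workbooks those lock files belong to.

```python
import re

# One pattern instead of two: no "~" anywhere before the ".xlsx" suffix.
XLSX_NO_TILDE = re.compile(r'^[^~]+\.xlsx$')

def xlsx_without_tilde(names):
    """Return the names that match the single combined pattern."""
    return [n for n in names if XLSX_NO_TILDE.search(n)]

# Hypothetical directory listing, used only to show the behaviour.
sample = ["report.xlsx", "~$report.xlsx", "notes.txt", "a~b.xlsx"]
result = xlsx_without_tilde(sample)
print(result)  # ['report.xlsx']
```

To run it against the working directory, pass os.listdir(os.getcwd()) instead of the sample list.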
