Batch file rename: zero padding time with regex?

I have a whole set of files (10,000+) that include the date and time in the filename. The problem is that the date and time are not zero-padded, which causes problems with sorting.

The filenames are in the format: output 5-11-2018 9h0m.xml
What I would like is the format: output 05-11-2018 09h00m.xml

I've searched for different solutions, but most seem to split the strings and then recombine them. That seems pretty cumbersome, since in my case the day, month, hour and minute would each need to be separated, padded and then recombined.

I thought a regex might give me a better solution, but I can't quite figure it out.

I've edited my original code based on Wiktor Stribiżew's suggestion that you can't use a regex in the replacement and should use groups instead:

import os
import glob
import re

old_format = 'output [1-9]-11-2018 [1-2]?[1-9]h[0-9]m.xml'
# No trailing backslash: a raw string cannot end in one, and os.path.join adds the separator
dir = r'D:\Gebruikers\<user>\Documents\datatest'

old_pattern = re.compile(r'([1-9])-11-2018 ([1-2][1-9])h([0-9])m')

filelist = glob.glob(os.path.join(dir, old_format))
for file in filelist:
    print(file)
    newfile = re.sub(old_pattern, r'0\1-11-2018 \2h0\3m', file)
    os.rename(file, newfile)

But this still doesn't work quite as I would like, since it doesn't change hours under 10. What else could I try?

You can pad the numbers in your file names with .zfill(2), using a lambda expression passed as the replacement argument to re.sub.

Also, fix the regex pattern to allow 1 or 2 digits: (3[01]|[12][0-9]|0?[1-9]) for the day, (2[0-3]|[10]?\d) for the hour (24h), and ([0-5]?[0-9]) for the minutes:

old_pattern = re.compile(r'\b(3[01]|[12][0-9]|0?[1-9])-11-2018 (2[0-3]|[10]?\d)h([0-5]?[0-9])m')

See the regex demo.

Then use:

for file in filelist:
    newfile = re.sub(old_pattern, lambda x: '{}-11-2018 {}h{}m'.format(x.group(1).zfill(2), x.group(2).zfill(2), x.group(3).zfill(2)), file)
    os.rename(file, newfile)
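
Putting the pattern and the replacement function together, a minimal sketch might look like the following. The helper names pad_name and rename_all and the dry_run flag are illustrative and not part of the original answer. The sketch applies the substitution to the base name only, so nothing in the directory path can be altered by accident, and it only prints the planned renames unless dry_run is set to False.

```python
import glob
import os
import re

# Pattern from the answer: day, the fixed "-11-2018", then hour and minute.
OLD_PATTERN = re.compile(
    r'\b(3[01]|[12][0-9]|0?[1-9])-11-2018 (2[0-3]|[10]?\d)h([0-5]?[0-9])m'
)


def pad_name(name):
    """Return name with day, hour and minute zero-padded to two digits."""
    return OLD_PATTERN.sub(
        lambda m: '{}-11-2018 {}h{}m'.format(
            m.group(1).zfill(2), m.group(2).zfill(2), m.group(3).zfill(2)
        ),
        name,
    )


def rename_all(directory, dry_run=True):
    """Rename every 'output *.xml' file in directory (hypothetical helper)."""
    for path in glob.glob(os.path.join(directory, 'output *.xml')):
        head, tail = os.path.split(path)
        new_path = os.path.join(head, pad_name(tail))
        if new_path != path:
            print('{} -> {}'.format(path, new_path))
            if not dry_run:
                os.rename(path, new_path)
```

Names that are already padded come back from pad_name unchanged, so running the loop a second time should be harmless.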

See the Python re.sub docs:

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
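
To see the quoted behaviour in isolation, here is a small sketch that is independent of the renaming task: the callback is invoked once per match, and whatever it returns replaces that match. The function name pad_two and the sample string are made up for illustration.

```python
import re


def pad_two(match):
    """Called by re.sub once per match; returns the replacement text."""
    return match.group(0).zfill(2)


# \b\d\b matches a digit standing alone, so 11 and 2018 are left untouched.
result = re.sub(r'\b\d\b', pad_two, '5-11-2018 9:0')
print(result)
```

Note that the word-boundary trick only works here because the sample uses ':' as a separator; in the real filenames the digits are followed by the letters h and m, which is why the answer's pattern spells out the surrounding text instead.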

I suggest going more generic with old_pattern for simplicity, assuming the only thing wrong with your filenames is the unpadded digits:

A single-digit field that needs converting can appear in any position while the other fields already have two digits, so listing every combination explicitly would take a long regex. I suggest a much simpler pattern for selecting the files to rename. It assumes that the directory contains only files of this type, since it matches more broadly in exchange for being easier to write and to read at a glance: it finds any single-digit field (one or more) in the filename, i.e. non-digit, digit, non-digit:

# A regular expression, not a glob pattern: match it with re.search rather than glob.glob
old_format = r'output.*\D\d\D.*\.xml'

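The re.sub call that fixes the names could then be the following. The lookarounds keep the neighbouring characters out of the match, so that adjacent fields such as 9h0m are both padded:

newfile = re.sub(r'(?<!\d)\d(?=[hm-])', lambda x: x.group().zfill(2), file)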
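
A self-contained sketch of this generic approach is shown below. It is written with a lookbehind and a lookahead so that the characters around each digit are not consumed by the match, which lets adjacent fields such as 9h0m both be padded. The names SINGLE_DIGIT, pad_fields and files_to_rename are hypothetical, and the sketch works on base names from os.listdir rather than full paths, assuming all relevant files start with "output" and end with ".xml".

```python
import os
import re

# A lone digit: not preceded by a digit, and followed by '-', 'h' or 'm'.
SINGLE_DIGIT = re.compile(r'(?<!\d)\d(?=[hm-])')


def pad_fields(name):
    """Zero-pad every single-digit day, month, hour or minute field."""
    return SINGLE_DIGIT.sub(lambda m: m.group().zfill(2), name)


def files_to_rename(directory):
    """Yield (old, new) base names that would change (hypothetical helper)."""
    for name in sorted(os.listdir(directory)):
        if name.startswith('output') and name.endswith('.xml'):
            new_name = pad_fields(name)
            if new_name != name:
                yield name, new_name
```

Each pair yielded by files_to_rename could then be passed to os.rename after joining both names with the directory.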

This would also match non-ASCII Unicode digits unless the appropriate re module flag (re.ASCII) is set.
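
The difference is easy to check in Python 3, where str patterns are Unicode-aware by default and the re.ASCII flag restricts \d to the characters 0-9. The sample string, which uses an Arabic-Indic digit, is purely illustrative.

```python
import re

sample = '\u0663h'  # ARABIC-INDIC DIGIT THREE followed by 'h'

# Default for str patterns in Python 3: \d matches any Unicode decimal digit.
unicode_match = re.search(r'\d', sample)

# With re.ASCII, \d only matches the characters 0-9.
ascii_match = re.search(r'\d', sample, flags=re.ASCII)

print(unicode_match is not None, ascii_match is None)
```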

If the year (2018 in the example) might be given as just '18', it would need special handling. That could be a separate case, and it would also mean adding a space to the character set in the re.sub pattern (i.e. [-hm ]).