
Remove every 10-digit number

I have a huge collection of files that I am trying to rename in bulk. The filename patterns are fairly consistent, but there are a few irregularities that my basic regex knowledge can't handle.

The filenames usually go like this: 1050327473 {913EDD51} 1st Filename [2nd Edition].txt

I could remove the strings between {} and [], along with a few other special characters, with this piece of code:

new_file_name = re.sub(r'{.+?}', '', filename)
new_file_name = re.sub(r'\[.+?]', '', new_file_name)
new_file_name = ((new_file_name.split(" .pdf", 1)[0]) + '.pdf').translate({ord(i):None for i in '/\\:*?"<>|_'})

and it successfully outputs this:

1050327473 1st Filename

However, some of the original filenames deviate from this pattern, and I still have to remove the 10-digit number. A few of the other patterns look like this:

785723041X, 4844004976 {2C5ACB07} 1st Filename.txt
0383948600 {6A7528B5} 2nd Filename.txt
3263031418, 7966530910, 8070331430 {DCBAD13B} 3rd Filename.txt

The expected output is:

1st Filename.txt
2nd Filename.txt
3rd Filename.txt

Now, I could remove every digit, but the filename would also lose a meaningful part and become st Filename.txt. Slicing the string with something like [10:] would not work either, because the length of the numeric prefix varies.

I thought the most logical thing would be to remove every 10-digit sequence, but some of these sequences end with an X in place of the 10th digit, like 785723041X. Also, if the 10-digit sequence is followed by a comma, that should be removed too.

What would be the best approach to solve this problem? Is it doable with regex only?

With a specific regex pattern:

import re

filenames = ['785723041X, 4844004976 {2C5ACB07} 1st Filename.txt',
             '0383948600 {6A7528B5} 2nd Filename.txt',
             '3263031418, 7966530910, 8070331430 {DCBAD13B} 3rd Filename.txt']

pat = re.compile(r'\{[^{}]+\}|\[[^\[\]]+\]|\b\d{9}[\dX],?')
filenames = [pat.sub('', f).strip() for f in filenames]
print(filenames)

The output:

['1st Filename.txt', '2nd Filename.txt', '3rd Filename.txt']
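One case the pattern does not guard against: nothing is required after the tenth character, so a longer digit run (a 13-digit number, say) would lose its first ten digits. The following is only a sketch, assuming such names can occur in the collection. The helper name clean_name, the (?!\w) lookahead, and the whitespace collapsing are additions for illustration, not part of the pattern above.

```python
import re

# Same three alternatives, plus a negative lookahead so that the ID must
# end right after its tenth character (assumption: longer runs are kept).
PAT = re.compile(r'\{[^{}]+\}|\[[^\[\]]+\]|\b\d{9}[\dX](?!\w),?')

def clean_name(name):
    """Strip {..}, [..] and 10-character IDs from the stem of a filename."""
    stem, dot, ext = name.rpartition('.')
    if not dot:                      # no extension: treat it all as the stem
        stem, ext = name, ''
    stem = PAT.sub('', stem)
    stem = ' '.join(stem.split())    # collapse leftover runs of spaces
    return stem + dot + ext

print(clean_name('1050327473 {913EDD51} 1st Filename [2nd Edition].txt'))
print(clean_name('9781234567897 Some Title.txt'))
```

The first call prints 1st Filename.txt; the second leaves the 13-digit prefix untouched. Working on the stem only also avoids a stray space before the extension when a bracketed part sits at the end of the name.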

Regex details:

  • ..|..|.. - alternation group (matches any one of several alternative sub-patterns)
  • \{[^{}]+\} - match {} and everything enclosed between them (the character class [^{}]+ ensures the enclosed text contains no braces itself)
  • \[[^\[\]]+\] - match [] and everything enclosed between them (the brackets are escaped inside the character class; an unescaped [^[]]+ would close the class at the first ] and fail to match bracketed text)
  • \b\d{9}[\dX],? - match a 9-digit sequence followed by either a 10th digit or an X, plus an optional trailing comma
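The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace outside character classes and allows inline comments. This is a sketch of the same three alternatives; the name PAT_VERBOSE is chosen here for illustration.

```python
import re

PAT_VERBOSE = re.compile(r'''
    \{ [^{}]+ \}          # braces and everything between them
  | \[ [^\[\]]+ \]        # square brackets and everything between them
  | \b \d{9} [\dX] ,?     # 9 digits, a 10th digit or X, optional comma
''', re.VERBOSE)

print(PAT_VERBOSE.sub('', '0383948600 {6A7528B5} 2nd Filename.txt').strip())
```

This prints 2nd Filename.txt, the same result as the compact pattern.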
