Regex to extract a string after a date in Python
Given these two types of strings:
1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
How can I use a regex to get the 111040 part of the string? It always has 6 digits.
My approach is: "Take the 6-digit code after the YYYY_MM_DD_HH_MM_SS_ part", but any other approach is also welcome.
EDIT: The last part, _0CM.csv.zip, is susceptible to change.
Thanks in advance.
You wanted a regex, so here it is:
[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})
[0-9]{4}: matches the 4 digits of the year; this is our starting anchor.
(?:_[0-9]{2}){5}: next come five two-digit numbers (month, day, hour, minute, second), so we can group them all and ignore them.
([0-9]{6}): captures the 6 digits following the previous expression.
The desired number is in capture group 1 of this regex:
import re
regex = r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})'
# group(1) holds the 6-digit code captured right after the timestamp
print(re.search(regex, '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip').group(1))
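Note that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on a non-matching string. A minimal sketch of a guarded version, using the same pattern; the helper name extract_code and the 'Test' input are assumptions added for illustration:

```python
import re

# Same pattern as above, compiled once for reuse.
DATE_CODE = re.compile(r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})')

def extract_code(filename):
    """Return the 6-digit code after the timestamp, or None if absent.

    Hypothetical helper, not part of the original answer.
    """
    match = DATE_CODE.search(filename)
    return match.group(1) if match else None

names = [
    '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip',
    '1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip',
    'Test',  # no timestamp, so no match
]
codes = [extract_code(name) for name in names]
print(codes)  # ['111040', '111040', None]
```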
How about this pattern? It works if you match the lines one at a time:
import re
# Raw string so that \d reaches the regex engine without an invalid-escape warning
pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')
print(pattern.findall("1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"))
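Because the pattern has one capture group, findall returns a list of the captured strings, and an empty list when nothing matches. A rough sketch of the line-by-line use, assuming the filenames arrive as one multi-line block of text (the text variable is made up for illustration):

```python
import re

pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')

# Hypothetical input: both filenames from the question in one block of text.
text = """1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"""

codes = []
for line in text.splitlines():
    found = pattern.findall(line)  # list of captured groups, [] if no match
    if found:
        codes.append(found[0])
print(codes)  # ['111040', '111040']
```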
This returns '' for any string where no match is found.
import re
strings = [
"1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
"1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
'Test'
]
pattern = re.compile(r'_(\d{6})_')
# Search each string once; fall back to '' when nothing matches
matches = (pattern.search(string) for string in strings)
digits = [match.group(1) if match else '' for match in matches]
print(digits)
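One caveat with this shorter pattern: it is not tied to the timestamp and it requires an underscore after the six digits, while the question says the trailing part can change. The sketch below compares it with a date-anchored pattern on two made-up filenames; both filenames and the helper first_group are assumptions for illustration, not data from the question.

```python
import re

loose = re.compile(r'_(\d{6})_')
anchored = re.compile(r'\d{4}(?:_\d{2}){5}_(\d{6})')

def first_group(pattern, text):
    """Return capture group 1 of the first match, or '' if none (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(1) if match else ''

# Hypothetical filenames, not taken from the question.
no_trailing_underscore = '1625212673449-2021_07_02_07_55_05_111040.csv.zip'
extra_six_digits = 'export_123456_2021_07_02_07_55_05_111040_0CM.csv.zip'

print(first_group(loose, no_trailing_underscore))     # '' (no underscore after the code)
print(first_group(anchored, no_trailing_underscore))  # '111040'
print(first_group(loose, extra_six_digits))           # '123456' (an earlier 6-digit run)
print(first_group(anchored, extra_six_digits))        # '111040'
```

If the filenames always look like the two in the question, both patterns give the same result; the anchored one is just more tolerant of changes around the code.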