
Regex to extract a string after a date in Python

I have these two types of string:

1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip

1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip

How can I use a regex to get the 111040 part of the string? It always has 6 digits.

My approach is: "take the 6-digit code after the YYYY_MM_DD_HH_MM_SS_ part", but any other approach is also welcome.

EDIT: The last part, _0CM.csv.zip, is subject to change.

Thanks in advance.

You wanted a regex, so here it is:

[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})
  • [0-9]{4} : matches the 4 digits of the year; this is our starting anchor
  • (?:_[0-9]{2}){5} : matches the 5 two-digit numbers that follow (month, day, hour, minute, second); they sit in a non-capturing group, so they are matched but not returned
  • ([0-9]{6}) : captures the 6 digits that follow the previous expression

The desired number is in capture group 1 of this regex:

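import re
# Raw string, so the pattern reaches the regex engine unchanged
regex = r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})'
# group(1) holds the 6-digit code
print(re.search(regex, '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip').group(1))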
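Note that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on a filename that lacks the timestamp. A minimal sketch of a guarded version follows; the names DATE_CODE and extract_code are hypothetical, chosen only for illustration:

```python
import re

# Same pattern as above, compiled once for reuse.
DATE_CODE = re.compile(r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})')


def extract_code(filename):
    """Return the 6-digit code after the timestamp, or None if it is absent."""
    match = DATE_CODE.search(filename)
    return match.group(1) if match else None


names = [
    "1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "Test",  # no timestamp, so no code can be extracted
]

for name in names:
    print(extract_code(name))
```

This should print 111040 for both filenames from the question and None for the last entry.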

How about this pattern? It works if you match the strings one line at a time:

import re
# Raw string keeps the \d escapes intact
pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')
print(pattern.findall("1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"))
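Because the pattern has a single capture group, findall returns a list of the captured codes rather than the full matches. If the filenames arrive as one multi-line block of text, a sketch like the following applies the pattern line by line; the variable names listing and codes are assumptions made for this example:

```python
import re

pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')

# Both sample filenames from the question, one per line.
listing = """1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"""

codes = []
for line in listing.splitlines():
    # findall yields the contents of the capture group for each match
    codes.extend(pattern.findall(line))

print(codes)  # ['111040', '111040']
```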

This will return '' if an appropriate match isn't found.

import re

strings = [
    "1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    'Test'
]

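# Raw string keeps \d intact; the code is the 6 digits between two underscores
pattern = re.compile(r'_(\d{6})_')

# Search each string once; fall back to '' when there is no match
matches = (pattern.search(string) for string in strings)
digits = [match.group(1) if match else '' for match in matches]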

print(digits)
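One caveat: this shorter pattern requires an underscore right after the six digits, and the question's EDIT says the trailing part may change. The sketch below compares it with a variant that only requires that no seventh digit follows. The filename is hypothetical (it is not from the question), and the names loose and anchored are illustrative:

```python
import re

loose = re.compile(r'_(\d{6})_')          # pattern from the answer above
anchored = re.compile(r'_(\d{6})(?!\d)')  # assumption: any non-digit may follow

# Hypothetical filename whose suffix starts with '.' instead of '_'
name = "1625212673449-2021_07_02_07_55_05_111040.csv.zip"

print(loose.search(name))              # None: there is no trailing underscore
print(anchored.search(name).group(1))  # 111040
```

The variant still finds 111040 in the two filenames from the question, since an underscore is not a digit.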
