Regex: Match all characters in between an underscore and a period
I have a set of file names from which I need to extract the dates. The file names look like:
['1 120836_1_20210101.csv',
'1 120836_1_20210108.csv',
'1 120836_20210101.csv',
'1 120836_20210108.csv',
'10 120836_1_20210312.csv',
'10 120836_20210312.csv',
'11 120836_1_20210319.csv',
'11 120836_20210319.csv',
'12 120836_1_20210326.csv',
...
]
As an example, I would need to extract 20210101 from the first item in the list above.
Here is my code, but it is not working. I'm not totally familiar with regex.
import re
dates = []
for file in files:
    dates.extend(re.findall("(?<=_)\d{}(?=\d*\.)", file))
You weren't that far off, but there were a few issues:
You extend dates by the result of the .findall, but you only expect to find one match per file name and are constructing all of dates in one go, so that would be a lot simpler with a re.search in a list comprehension.
This is what you were after:
import re
files = [
'1 120836_1_20210101.csv',
'1 120836_1_20210108.csv',
'1 120836_20210101.csv',
'1 120836_20210108.csv',
'10 120836_1_20210312.csv',
'10 120836_20210312.csv',
'11 120836_1_20210319.csv',
'11 120836_20210319.csv',
'12 120836_1_20210326.csv'
]
dates = [re.search(r"(?<=_)\d+(?=\.)", fn).group(0) for fn in files]
print(dates)
Output:
['20210101', '20210108', '20210101', '20210108', '20210312', '20210312', '20210319', '20210319', '20210326']
It keeps the lookbehind for an underscore, and changes the lookahead to look for a period. It just matches all digits (at least one, with +) in between the two.
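To see why the lookarounds pick the date rather than the lone 1 in names like '1 120836_1_20210101.csv', here is a minimal sketch. The helper name extract_date and the None fallback for names that do not match are illustrative assumptions, not part of the answer above.

```python
import re

# Same pattern as above: digits directly preceded by "_" and directly followed by "."
DATE_RE = re.compile(r"(?<=_)\d+(?=\.)")

def extract_date(filename):
    """Return the first run of digits sitting directly between "_" and ".", or None."""
    match = DATE_RE.search(filename)
    return match.group(0) if match else None

# The "1" after the first underscore is followed by "_", not ".",
# so the lookahead rejects it and the search moves on to the date.
print(extract_date('1 120836_1_20210101.csv'))  # 20210101
print(extract_date('1 120836_20210108.csv'))    # 20210108
print(extract_date('no date here.csv'))         # None
```

Returning None instead of calling .group(0) unconditionally avoids an AttributeError if a file name without a date ever ends up in the list.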
Note that the r in front of the string avoids having to double up the backslashes in the regex; the backslashes in \d and \. are still required to indicate a digit and a literal period.
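As a quick check of both points, the sketch below compares the raw string with its doubled-backslash equivalent and shows what changes if the backslash before the period is dropped. The variable names and the test string "a_123x" are made up for illustration.

```python
import re

# A raw string keeps backslashes as written, so these two spell the same pattern.
raw_pattern = r"(?<=_)\d+(?=\.)"
escaped_pattern = "(?<=_)\\d+(?=\\.)"
print(raw_pattern == escaped_pattern)  # True

# Without the backslash, "." matches any character, not just a literal period,
# so digits followed by "x" are accepted as well.
print(re.search(r"(?<=_)\d+(?=.)", "a_123x").group(0))  # 123

# With the escaped period, the same input no longer matches.
print(re.search(raw_pattern, "a_123x"))  # None
```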