Regex python - find match items on list that have the same digit between the second character "_" to character "."

I have the following list:

list_paths = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
    # ...
]

I want to run a for loop that matches the string containing a specific number, namely the number between the second occurrence of "_" and ".tif.tif". For example, when the number is 1, the string to match is "imgs/foldeer/img_ABC_21389_1.tif.tif"; for number 2, it is "imgs/foldeer/img_GHC_561321_2.tif.tif".

For that, I wanted to use a regular expression. Based on this answer, I tested the following regex on Regex101:

[^\r\n_]+\.[^\r\n_]+\_([0-9])

But this doesn't match anything, and it also doesn't guarantee an exact match on the number: if the number is 1, it might also select items with number 10.

My end goal is to match the items in the list that have the requested number between the 2nd occurrence of "_" and the first occurrence of ".tif", using a regular expression. I am looking for help with the regex.

EDIT: The output should be the whole path and not only the number.

I'll show you something that works and is just as ugly as regex, which I hate:

data = ["imgs/foldeer/img_ABC_21389_1.tif.tif",
"imgs/foldeer/img_ABC_21389_1.tif.tif",
"imgs/foldeer/img_ABC_15431_10.tif.tif",
"imgs/foldeer/img_GHC_561321_2.tif.tif",
"imgs_foldeer/img_BCL_871125_21.tif.tif"]

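numbers = [int(x.rsplit("_", 1)[-1].split(".")[0]) for x in data]
  • The first split cuts at the last "_" and yields, for example, ["imgs/foldeer/img_ABC_21389", "1.tif.tif"]; cutting from the right keeps this working when the directory name itself contains "_", as in "imgs_foldeer/..."
  • take the last element ("1.tif.tif")
  • split again, this time on the dot, take the first element (that's your number as a string), and cast it to int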

Please keep in mind that this only works for the format you provided; the solution has no flexibility at all (then again, regex doesn't give you any either).
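
Since the question asks for the whole path rather than the number, the same splitting idea can be turned into a filter. This is a minimal sketch, assuming the number always sits between the last "_" and the first "." that follows it; the helper names `trailing_number` and `paths_with_number` are made up for this illustration.

```python
def trailing_number(path):
    """Return the integer between the last "_" and the next "." in path."""
    tail = path.rsplit("_", 1)[-1]      # e.g. "10.tif.tif"
    return int(tail.split(".", 1)[0])   # e.g. 10


def paths_with_number(paths, number):
    """Keep only the paths whose trailing number equals `number` exactly."""
    return [p for p in paths if trailing_number(p) == number]


data = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]

print(paths_with_number(data, 1))
# ['imgs/foldeer/img_ABC_21389_1.tif.tif']
```

Because the comparison is done on integers, asking for 1 cannot accidentally return the paths numbered 10 or 21.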

Mostly without regex, if that is allowed: slice off everything up to the last "_", then take the first run of digits.

import re

s = 'imgs/foldeer/img_ABC_15431_10.tif.tif'
# everything after the last "_", here "10.tif.tif"
last = s[s.rindex('_') + 1:]
# first run of digits in that tail
print(re.findall(r'\d+', last)[0])

Gives:

10

[0-9]*(?=\.tif\.tif)

This regex uses a lookahead to capture the last run of digits, the one directly before ".tif.tif" (the number you're looking for).
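
One possible way to use that lookahead for filtering whole paths, as a sketch only: it assumes every path ends in ".tif.tif", and it uses [0-9]+ instead of [0-9]* so that an empty match is impossible. The name `digits_before_ext` is hypothetical.

```python
import re

# One or more digits that are directly followed by ".tif.tif"
digits_before_ext = re.compile(r"[0-9]+(?=\.tif\.tif)")

paths = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]

number = 1
matches = []
for p in paths:
    m = digits_before_ext.search(p)
    # Compare the whole digit run, so "1" does not match "10" or "21"
    if m and m.group() == str(number):
        matches.append(p)

print(matches)
# ['imgs/foldeer/img_ABC_21389_1.tif.tif']
```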

Try this:

import re

s = '''imgs/foldeer/img_ABC_21389_1.tif.tif
imgs/foldeer/img_ABC_15431_10.tif.tif
imgs/foldeer/img_GHC_561321_2.tif.tif
imgs_foldeer/img_BCL_871125_21.tif.tif'''

# raw f-strings, so that \. reaches the regex engine as an escaped dot
number = 1
res1 = re.findall(rf".*_{number}\.tif.*", s)

number = 21
res21 = re.findall(rf".*_{number}\.tif.*", s)

print(res1)
print(res21)

Results:

['imgs/foldeer/img_ABC_21389_1.tif.tif']
['imgs_foldeer/img_BCL_871125_21.tif.tif']
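
If the paths are already in a list, as in the question, the same pattern can be applied per item. This is a sketch under the assumption that the number is always preceded by "_" and followed by ".tif"; `find_paths` is a hypothetical helper name, and re.fullmatch is used so that the pattern has to cover the entire path.

```python
import re

list_paths = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
]


def find_paths(paths, number):
    # The "_" before and the "\.tif" after the number pin it down exactly
    pattern = re.compile(rf".*_{number}\.tif.*")
    return [p for p in paths if pattern.fullmatch(p)]


print(find_paths(list_paths, 1))
print(find_paths(list_paths, 21))
```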

Your pattern [^\r\n_]+\.[^\r\n_]+\_([0-9]) does not match anything, because it requires an underscore \_ (note that you don't have to escape it) after a dot, and that does not occur in the example data.

It then tries to match a digit, but the available digits only occur before any of the dots.

In your question, the numbers that you are referring to come after the 3rd occurrence of _ in the file name, not the 2nd.
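
A quick check of both points, as an illustration only: the original pattern finds nothing, and splitting the file name on "_" shows that the number is in the fourth piece, that is, the part after the third underscore. The variable names are chosen for this sketch.

```python
import re

path = "imgs/foldeer/img_ABC_21389_1.tif.tif"

# The pattern from the question: it needs "_" plus a digit somewhere after a dot
original = r"[^\r\n_]+\.[^\r\n_]+\_([0-9])"
print(re.search(original, path))
# None

file_name = path.rsplit("/", 1)[-1]   # "img_ABC_21389_1.tif.tif"
pieces = file_name.split("_")
print(pieces)
# ['img', 'ABC', '21389', '1.tif.tif']
```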


To get the matching path(s), you could use a pattern like the following and replace the \d+ with a variable holding the number you want to find:

^\S*?/(?:[^\s_/]+_){3}\d+\.tif\b[^\s/]*$

Explanation

  • \S*? Match optional non-whitespace characters, as few as possible
  • / Match / literally
  • (?:[^\s_/]+_){3} Match 3 times: 1+ characters other than whitespace, _ or /, followed by _
  • \d+ Match 1+ digits
  • \.tif\b[^\s/]* Match .tif followed by a word boundary and optional characters other than whitespace or /
  • $ End of string
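
The same pattern can also be written with re.VERBOSE so that each part carries its own comment. This is only a restatement of the explanation for readability, not a different pattern; the name `PATTERN` is chosen for this sketch.

```python
import re

PATTERN = re.compile(
    r"""
    ^ \S*? /             # optional non-whitespace chars, as few as possible, then /
    (?: [^\s_/]+ _ ){3}  # three times: 1+ chars other than whitespace, _ or /, then _
    \d+                  # 1+ digits (the number)
    \.tif \b             # literal .tif followed by a word boundary
    [^\s/]*              # remaining chars other than whitespace or /
    $                    # end of string
    """,
    re.VERBOSE,
)

print(bool(PATTERN.search("imgs/foldeer/img_ABC_21389_1.tif.tif")))
# True
print(bool(PATTERN.search("imgs_foldeer/img_BCL_871125_21.png.tif")))
# False
```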

See a regex demo and a Python demo.

Example using a list comprehension to return all paths for the given number:

import re

number = 10
pattern = rf"^\S*?/(?:[^\s_/]+_){{3}}{number}\.tif\b[^\s/]*$"

list_paths = [
     "imgs/foldeer/img_ABC_21389_1.tif.tif",
     "imgs/foldeer/img_ABC_15431_10.tif.tif",
     "imgs/foldeer/img_GHC_561321_2.tif.tif",
     "imgs_foldeer/img_BCL_871125_21.tif.tif",
     "imgs_foldeer/img_BCL_871125_21.png.tif"
]

res = [lp for lp in list_paths if re.search(pattern, lp)]
print(res)

Output

['imgs/foldeer/img_ABC_15431_10.tif.tif']
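
For repeated lookups, building the pattern can be wrapped in a small function. This is a sketch only: `paths_for_number` is a hypothetical helper name, and re.escape is added as a precaution in case the value is passed as a string rather than an int.

```python
import re


def paths_for_number(paths, number):
    """Return the paths whose number after the third "_" of the file name equals `number`."""
    pattern = re.compile(
        rf"^\S*?/(?:[^\s_/]+_){{3}}{re.escape(str(number))}\.tif\b[^\s/]*$"
    )
    return [p for p in paths if pattern.search(p)]


list_paths = [
    "imgs/foldeer/img_ABC_21389_1.tif.tif",
    "imgs/foldeer/img_ABC_15431_10.tif.tif",
    "imgs/foldeer/img_GHC_561321_2.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.tif.tif",
    "imgs_foldeer/img_BCL_871125_21.png.tif",
]

for n in (1, 2, 10, 21):
    print(n, paths_for_number(list_paths, n))
```

With the sample list, each of the four numbers returns exactly one path, and the ".png.tif" entry is never returned because the number there is not followed by ".tif".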
