
How to extract an image name from a string using a Python regexp?

Given a text like this:

body = """Some junk texts here.
<img src="/images/15244/somerandomname.jpg" class="news-img">
More texts here"""

How can I extract somerandomname.jpg using a Python regexp?

What I came up with is this:

import re
regex = re.findall('/images/(\d+)/(\w+).jpg', body)

But it returns an empty list.

re.findall returns the entire matches if no capturing groups ( (...) ) are defined in the pattern, or the captured groups if any are defined. Since your pattern has capturing groups, the latter happens.
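As a quick illustration of that rule, the sketch below runs three variants of the pattern against the body from the question. The variable names (no_groups, one_group, two_groups) are made up for this example, and the dot is escaped so that it matches only a literal ".".

```python
import re

body = """Some junk texts here.
<img src="/images/15244/somerandomname.jpg" class="news-img">
More texts here"""

# No capturing groups: findall returns the whole matches as strings.
no_groups = re.findall(r'/images/\d+/\w+\.jpg', body)

# One capturing group: findall returns only what that group captured.
one_group = re.findall(r'/images/\d+/(\w+\.jpg)', body)

# Two capturing groups: findall returns one tuple of groups per match.
two_groups = re.findall(r'/images/(\d+)/(\w+)\.jpg', body)

print(no_groups)   # ['/images/15244/somerandomname.jpg']
print(one_group)   # ['somerandomname.jpg']
print(two_groups)  # [('15244', 'somerandomname')]
```

So wrapping only the file name in a group is enough to get just the name back.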

Remove the capturing groups to get the entire match:

regex = re.findall(r'/images/\d+/\w+\.jpg', body)

Demo: https://ideone.com/n1f9R8

You can use:

# Group 1 is the numeric directory; group 2 is everything up to the closing quote.
regex = re.findall(r'/images/(\d+)/([^"]+)', body)
image_src = regex[0][1]
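Note that regex[0][1] raises an IndexError when nothing matches. A minimal sketch of a safer variant, assuming only the first image is wanted, is shown here; first_image_name is a hypothetical helper name, not part of the original answer.

```python
import re

body = """Some junk texts here.
<img src="/images/15244/somerandomname.jpg" class="news-img">
More texts here"""

def first_image_name(text):
    """Return the file name of the first /images/<digits>/... path, or None."""
    # [^"]+ consumes everything up to the closing double quote of the attribute.
    match = re.search(r'/images/\d+/([^"]+)', text)
    # re.search returns None when nothing matches, so check before using it.
    return match.group(1) if match else None

print(first_image_name(body))             # somerandomname.jpg
print(first_image_name("no image here"))  # None
```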

You only need to group the image name part.

Try this expression; it also works for other three- or four-character extensions, e.g. jpg, png, ttf:

re.findall(r'/images/\d+/(\w+\.\w{3,4})', body)
output: ['somerandomname.jpg']
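One caveat is that \w matches only letters, digits, and underscores, so a name containing a hyphen would not be captured in full. The sketch below is one possible variant that also accepts hyphens; the sample string and its file names are invented for illustration.

```python
import re

# Hypothetical sample with several extensions and a hyphenated name.
sample = '''<img src="/images/1/photo.png">
<img src="/images/22/my-pic.jpeg">
<img src="/images/333/logo.gif">'''

# [\w-]+ also accepts hyphens in the name; \. matches a literal dot;
# \w{3,4} accepts a three- or four-character extension.
pattern = re.compile(r'/images/\d+/([\w-]+\.\w{3,4})')

names = pattern.findall(sample)
print(names)  # ['photo.png', 'my-pic.jpeg', 'logo.gif']
```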

Your code works; since you only want to capture the name, this will do it:

import re
body = """Some junk texts here.
<img src="/images/15244/somerandomname.jpg" class="news-img">
More texts here"""
# Capture only the file name; the escaped dot matches a literal "."
regex = re.findall(r'/images/\d+/(\w+\.jpg)', body)
print(regex)
