
String function in Python to extract between two characters

I have the string below and I want to extract everything from <img... to the closing " after .jpg.

I tried the code below, but instead of stopping at the first " after the start marker, it runs all the way to the last " in the string.

Can anyone help?

start = 'img src="'
end = '"'
print(string[string.find(start) + len(start):string.rfind(end)])

STRING:

 <p><a href="https://news.yahoo.com/us-ambassador-takes-post-united-nations-141833297.html"><img src="http://l1.yimg.com/uu/api/res/1.2/1f8jyGM.NfkxLb_.OgMaIQ--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/http://media.zenfs.com/en_us/News/afp.com/f5bbc19135065fcfff40e6ece9650f4ab225fa97.jpg" width="130" height="86" alt="New US ambassador takes up post at United Nations" align="left" title="New US ambassador takes up post at United Nations" border="0" ></a>US Ambassador Kelly Craft took up her post at the United Nations on Thursday, vowing to defend America's values and interests nine months after the departure of her high-profile predecessor Nikki Haley. Craft, 57, served previously as US ambassador to Canada where she was involved in negotiations on a new US Mexico Canada free trade agreement.<p><br clear="all">

You can use a regex like this, if you are sure the markup will always have the same shape.

<img.*?\.jpg"

You can try it out on Regex101 and tweak it depending on your requirements. Regex is the right tool for this, rather than string find and len and all that.
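To apply that pattern from Python, something like the following should work. This is only a sketch: the `html` value is a shortened, made-up stand-in for the string in the question (the example.com URLs are placeholders), and the second pattern, which captures only the URL, is an extra variation rather than part of the answer above.

```python
import re

# Shortened stand-in for the question's string (placeholder URLs).
html = (
    '<p><a href="https://example.com/story.html">'
    '<img src="http://example.com/photo.jpg" width="130" height="86" '
    'alt="caption"></a>Some text.<p><br clear="all">'
)

# The non-greedy .*? stops at the first .jpg" that follows <img.
match = re.search(r'<img.*?\.jpg"', html)
img_prefix = match.group(0) if match else None
print(img_prefix)

# Variation: capture only the URL inside src="...".
url_match = re.search(r'<img\s+src="([^"]*)"', html)
img_url = url_match.group(1) if url_match else None
print(img_url)
```

Checking for `None` before calling `group()` avoids an AttributeError when the string contains no matching image tag.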

If you don't want to use a regex, you could just use the .split() method.

# Named html rather than str, so the built-in str type is not shadowed
html = """<p><a href="https://news.yahoo.com/us-ambassador-takes-post-united-nations-141833297.html"><img src="http://l1.yimg.com/uu/api/res/1.2/1f8jyGM.NfkxLb_.OgMaIQ--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/http://media.zenfs.com/en_us/News/afp.com/f5bbc19135065fcfff40e6ece9650f4ab225fa97.jpg" width="130" height="86" alt="New US ambassador takes up post at United Nations" align="left" title="New US ambassador takes up post at United Nations" border="0" ></a>US Ambassador Kelly Craft took up her post at the United Nations on Thursday, vowing to defend America's values and interests nine months after the departure of her high-profile predecessor Nikki Haley. Craft, 57, served previously as US ambassador to Canada where she was involved in negotiations on a new US Mexico Canada free trade agreement.<p><br clear="all">"""


# final should be just the URL: the text after 'img src="' and before '" width='
final = html.split('img src="')[1].split('" width=')[0]

print(final)

Output:

http://l1.yimg.com/uu/api/res/1.2/1f8jyGM.NfkxLb_.OgMaIQ--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/http://media.zenfs.com/en_us/News/afp.com/f5bbc19135065fcfff40e6ece9650f4ab225fa97.jpg
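The original find-based attempt can also be repaired directly. The issue there is that rfind searches from the end of the string, so it returns the last quote rather than the first one after the start marker. A minimal sketch, assuming a hypothetical helper named `extract_between` and a shortened placeholder string, passes the start offset to find so the search for the closing quote begins after the marker:

```python
def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end` after it."""
    begin = text.find(start)
    if begin == -1:
        return None  # start marker not present
    begin += len(start)
    # Search forward from `begin`; rfind would return the last quote instead.
    stop = text.find(end, begin)
    if stop == -1:
        return None  # no closing marker after the start
    return text[begin:stop]


# Shortened stand-in for the question's string (placeholder URLs).
html = (
    '<p><a href="https://example.com/story.html">'
    '<img src="http://example.com/photo.jpg" width="130" height="86" '
    'alt="caption"></a>Some text.<p><br clear="all">'
)

url = extract_between(html, 'img src="', '"')
print(url)
```

Unlike the split version, this does not depend on the width attribute coming right after src.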
