
Regex lookbehind to extract string

So I have this ugly string which I'm picking up off the wire:

{"feedtype": "playlist", "base_url": " http://feeds.xhis.com/rteavgen/player/ ", "feed_title": "Single Item Playlist", "feedid": "playlist", "alt_url": " http://www.xhis.com/player/#v=10322367 ", "platform": "iptv", "current_date": "2014-11-14T12:24:39.84167", "full_url": " http://feeds.xhis.com/rteavgen/player/playlist?type=iptv&showId=10343367 ", "shows": [{"itemid": 10332367, "showid": 11544367, "valid_start": "2014-11-13T21:37:39", "ispodcast": 0, "programmeid": 1, "BRINumber": "ih011305791", "duration": 2053247, "id": 10323367, "media:group": [{"rte:server": " http://vod.hds.xhis.com/hds-vod ", "medium": "video", "url": "/2014/1113/20141113-dumbydoozle_cl10344367_10344406_260_/manifest.f4m", "type": "video/mp4", "i

It's sorta JSONy - the string I get isn't always guaranteed to be complete, so I can't parse it. Also, the protocol could change.

Anyway, I'm trying to do this:

  • find "manifest.f4m"
  • extract the string: "/2014/1113/20141113-dumbydoozle_cl10344367_10344406_260_/manifest.f4m"

Once I have the location of manifest.f4m, I'm done.
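The two steps above can also be sketched without a regex, which may be enough if the value is always wrapped in double quotes. This is only a minimal Python sketch under that assumption: `extract_manifest_path` is a hypothetical helper name, and `sample` is a shortened piece of the string above, truncated in the same place.

```python
from typing import Optional

TARGET = "manifest.f4m"


def extract_manifest_path(text: str) -> Optional[str]:
    """Return the quoted value that ends in manifest.f4m, or None if absent."""
    # Step 1: find "manifest.f4m".
    idx = text.find(TARGET)
    if idx == -1:
        return None
    # Step 2: walk back to the double quote that opens the value.
    # Assumption: the value is double-quoted. If no quote is found,
    # rfind returns -1 and the slice starts at the beginning of the text.
    start = text.rfind('"', 0, idx) + 1
    return text[start:idx + len(TARGET)]


# Shortened, deliberately truncated fragment of the string shown above.
sample = ('"medium": "video", "url": "/2014/1113/20141113-dumbydoozle'
          '_cl10344367_10344406_260_/manifest.f4m", "type": "video/mp4", "i')

print(extract_manifest_path(sample))
```

Because it never parses the text as JSON, this keeps working when the string is cut off, as long as the cut comes after manifest.f4m.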


So I'm trying to formulate a regex to do this reliably, but I'm having terrible trouble...

Here's my regex so far:

/(?<=\/)manifest.f4m(?=("|\s))/

It matches "manifest.f4m" (with either a " or a space after it).
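To make that behaviour concrete, here is a small Python check of the pattern above (copied without the /.../ delimiters; Python is just my choice for illustration, and `sample` is a shortened piece of the string). The lookbehind and lookahead are zero-width, so they test the neighbouring characters without adding them to the match.

```python
import re

# The pattern from above, verbatim, minus the /.../ delimiters.
pattern = re.compile(r'(?<=\/)manifest.f4m(?=("|\s))')

# Shortened, deliberately truncated fragment of the string shown above.
sample = ('"medium": "video", "url": "/2014/1113/20141113-dumbydoozle'
          '_cl10344367_10344406_260_/manifest.f4m", "type": "video/mp4", "i')

m = pattern.search(sample)
if m:
    print(m.group(0))  # only the file name; the lookarounds consume nothing
    print(m.start())   # offset where "manifest.f4m" begins in the sample

# Side note: the "." before f4m is unescaped, so it matches any character.
loose = pattern.search("/manifestXf4m ")
print(loose is not None)
```

So the pattern locates the file name, but the path in front of it has to be consumed by the pattern itself if it is to show up in the match.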

I'm a bit stuck with the lookbehind - I want to look back to the first "/" and extract the entire string that is pointed to by "url".

Though maybe there's a much better way of doing all this?

So I came up with this regex:

[-A-Za-z0-9+&@#\/%?=~_|!:,.;]+[-A-Za-z0-9+&@#\/%=~_|]manifest\.f4m(?=("|\s))

It seems to work rather well.

http://regex101.com/r/iT7vG2/2
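For anyone who wants to reproduce this outside regex101, a minimal Python sketch follows. The pattern is the one above, unchanged; `sample` is a shortened piece of the original string, and the choice of Python is an assumption on my part.

```python
import re

# The character-class pattern from above, unchanged.
pattern = re.compile(
    r'[-A-Za-z0-9+&@#\/%?=~_|!:,.;]+[-A-Za-z0-9+&@#\/%=~_|]'
    r'manifest\.f4m(?=("|\s))'
)

# Shortened, deliberately truncated fragment of the string shown above.
sample = ('"medium": "video", "url": "/2014/1113/20141113-dumbydoozle'
          '_cl10344367_10344406_260_/manifest.f4m", "type": "video/mp4", "i')

m = pattern.search(sample)
print(m.group(0) if m else None)
```

Neither character class contains a double quote or whitespace, which is what stops the match from running back past the opening quote of the "url" value.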

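Could you just start from the url: part and use a non-capturing group? I assume that part, at least, will be present. I tested it against your example and it seems to work: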

\b(?:url.+)(/.+manifest\.f4m)
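One thing worth checking when trying this pattern: both `.+` are greedy, so when I trace it against the sample, group 1 starts at the last "/" that still leaves room for `.+manifest\.f4m` and does not hold the whole path. The Python sketch below compares the pattern as posted with a lazy variant. The lazy variant is my own assumption, not part of the answer, and `sample` is a shortened piece of the original string.

```python
import re

# Shortened, deliberately truncated fragment of the string shown above.
sample = ('"medium": "video", "url": "/2014/1113/20141113-dumbydoozle'
          '_cl10344367_10344406_260_/manifest.f4m", "type": "video/mp4", "i')

# The pattern as posted: both .+ are greedy.
greedy = re.compile(r'\b(?:url.+)(/.+manifest\.f4m)')

# Hypothetical lazy variant (not from the answer): .+? stops at the first "/"
# after "url", so group 1 starts at the beginning of the path.
lazy = re.compile(r'\burl.+?(/.+?manifest\.f4m)')

greedy_match = greedy.search(sample)
lazy_match = lazy.search(sample)

print(greedy_match.group(1))  # only the last directory plus the file name
print(lazy_match.group(1))    # the full path
```

In both patterns the `\b` keeps "base_url", "alt_url" and "full_url" from matching, because "_" is a word character and so there is no word boundary before "url" in those keys.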
