
Regex: check existence of patterns between two other patterns across multiple lines in Python

I am trying to check whether certain patterns exist between two other patterns across multiple lines. Namely, in a SIP SDP body I would like to know whether 'a=recvonly', 'a=sendonly' or 'a=inactive' exists between two lines beginning with 'm=' or, if there is no second 'm=' line, between the first 'm=' line and the end of the string ($). For example, between 'm=audio' and 'm=video', or, if no other line beginning with 'm=' exists, up to the end, which is an empty line at the bottom.

Example 1

v=0\r$
o=- 1402066778 5 IN IP4 10.1.1.1\r$
c=IN IP4 10.1.1.1\r$
m=audio 2066 RTP/AVP 0 101\r$
a=rtpmap:0 PCMU/8000\r$
a=rtpmap:101 telephone-event/8000\r$
a=ptime:20\r$
a=inactive\r$
m=video 0 RTP/AVP 109 34\r$
a=inactive\r$
a=rtpmap:109 H264/90000\r$
a=fmtp:109 profile-level-id=42e01f\r$
$

There is a match here!

Example 2

v=0\r$
o=- 1402066778 5 IN IP4 10.1.1.1\r$
c=IN IP4 10.1.1.1\r$
m=audio 2066 RTP/AVP 0 101\r$
a=rtpmap:0 PCMU/8000\r$
a=rtpmap:101 telephone-event/8000\r$
a=ptime:20\r$
m=video 0 RTP/AVP 109 34\r$
a=inactive\r$
a=rtpmap:109 H264/90000\r$
a=fmtp:109 profile-level-id=42e01f\r$
$

There is no match here.

Example 3

v=0\r$
o=- 1402066778 5 IN IP4 10.1.1.1\r$
c=IN IP4 10.130.93.210\r$
m=audio 2066 RTP/AVP 0 101\r$
a=rtpmap:0 PCMU/8000\r$
a=rtpmap:101 telephone-event/8000\r$
a=ptime:20\r$
a=recvonly\r$
$

There is a match here again.

I thought the following should work because '|' is not greedy, but it still finds the pattern in Example 2, where it should not, since the attribute there appears below the m=video line.

re1way = re.compile(r'm=audio.*?(a=recvonly|a=sendonly|a=inactive).*?[(^m=).*|(^$)]')

Where is the flaw in my idea, please?

I'm not quite sure from your question exactly what the parameters are here. But given your examples, and noting that the end of the string is a possible endpoint, let's assume you want to determine whether one of the three "a=" values you cite appears between the first "m=" and either the next "m=" or the end of the string, within a single string object (rather than identifying multiple instances in a single string object).

In this case, I might recommend the following, which uses the '|' special character in a two-tiered solution (this is for explanatory purposes, but you get the idea). I'm sure you could craft a fairly complicated single-line search with some work, but in terms of readability I think this is easier:

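import re

# `example` holds one SDP body as a string.
# Step 1: take the first media section, from the first line starting with "m="
# up to (not including) the next such line, or the end of the string.
a = re.search(r"^m=.*?(?=^m=|\Z)", example, re.DOTALL | re.MULTILINE)
if a:
    # Step 2: look for a direction attribute inside that section only.
    aresb = re.search(r"^a=(recvonly|sendonly|inactive)", a.group(), re.MULTILINE)
    if aresb:
        print("Yes, 'a=' substring found! Matching substring: " + aresb.group())
    else:
        print("No matching 'a=' line found in the first 'm=' section.")
else:
    print("No initial 'm=' found!")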
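
To check the two-step idea against the examples in the question, it can be wrapped in a small function. The following is only a sketch: the function name `first_media_direction` and both compiled patterns are introduced here for illustration, and it assumes the `\r$` shown in the listings marks CRLF line endings.

```python
import re

# Both patterns are a sketch, not part of the original answer's code.
# ^m= (with re.MULTILINE) only matches "m=" at the start of a line;
# \Z matches only at the very end of the string.
SECTION_RE = re.compile(r"^m=.*?(?=^m=|\Z)", re.DOTALL | re.MULTILINE)
DIRECTION_RE = re.compile(r"^a=(recvonly|sendonly|inactive)\r?$", re.MULTILINE)


def first_media_direction(sdp):
    """Return the direction attribute of the first media section, or None.

    Hypothetical helper name, introduced here for illustration.
    """
    section = SECTION_RE.search(sdp)
    if section is None:
        return None  # no line starting with "m=" at all
    direction = DIRECTION_RE.search(section.group())
    return direction.group(1) if direction else None


# Example 2 from the question, rebuilt with CRLF line endings (assumed).
example_2 = "\r\n".join([
    "v=0",
    "o=- 1402066778 5 IN IP4 10.1.1.1",
    "c=IN IP4 10.1.1.1",
    "m=audio 2066 RTP/AVP 0 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=ptime:20",
    "m=video 0 RTP/AVP 109 34",
    "a=inactive",
    "a=rtpmap:109 H264/90000",
    "a=fmtp:109 profile-level-id=42e01f",
    "",
])

print(first_media_direction(example_2))  # None: a=inactive is below m=video
```

Returning the matched value (or None) instead of printing makes the check easy to reuse, for instance in a loop over several SDP bodies.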

I note that because the standard regular expression module (re) doesn't support variable-length lookbehind assertions, trying to use a lookbehind to build a single pattern that handles cases where another 'm=' line appears before the end of the string (e.g. Example 2) will not work. A multi-line solution is best in my opinion.
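
For comparison, a single pattern is possible with a negative lookahead instead of a lookbehind. In the pattern from the question, the square brackets in `[(^m=).*|(^$)]` form a character class that matches one character from that set, so nothing stops the lazy `.*?` from running past the next 'm=' line. The sketch below (not from the original answer; the name `audio_direction` is hypothetical) refuses to consume any character that sits at the start of a line beginning with 'm=', and assumes CRLF or LF line endings.

```python
import re

# "Tempered" dot: (?:(?!^m=).)*? consumes one character at a time, but never
# steps onto the start of a line that begins with "m=". The search therefore
# cannot cross from the m=audio section into the next media section.
AUDIO_DIRECTION_RE = re.compile(
    r"^m=audio(?:(?!^m=).)*?^a=(recvonly|sendonly|inactive)\r?$",
    re.DOTALL | re.MULTILINE,
)


def audio_direction(sdp):
    """Return the direction attribute of the m=audio section, or None.

    Hypothetical helper name, introduced here for illustration.
    """
    match = AUDIO_DIRECTION_RE.search(sdp)
    return match.group(1) if match else None
```

The pattern stays anchored to 'm=audio', as in the question. With a bare `^m=` the search could simply start at the 'm=video' line and report that section's attribute instead. Whether this reads better than the two-step version is a matter of taste.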
