How to match exact “multiple” strings in Python

I want to build a live packet monitor for SOAP/XML traffic. Here is the code:

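import re

from scapy.all import TCP, sniff

# The question's pattern, compiled once instead of on every packet.
pat = re.compile(r'<ResponseCode>(.*?)<|<ResponseRunTime>(.*?)<')

def pack_callback(packet):
    # Skip packets without a TCP layer or with an empty TCP payload.
    if packet.haslayer(TCP) and packet[TCP].payload:
        # Raw payload bytes to text (undecodable bytes are replaced).
        payload = bytes(packet[TCP].payload).decode('utf-8', errors='replace')
        n = pat.findall(payload)
        if n:
            print(n)

# Parentheses matter: in BPF, "and" and "or" have equal precedence and
# associate left to right. store=False stops sniff() from keeping every
# captured packet in memory during a long-running capture.
sniff(filter='tcp and (port 186 or port 86)', prn=pack_callback,
      iface='vmxnet3 Ethernet Adapter', store=False)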

But if I use re.search, I get ('0', None), and when I use re.findall, I get [('0', ''), ('', '1763')].

My question is: how can I get ('0', '1763')? I mean: first match <ResponseCode>(.*?)<, then match <ResponseRunTime>(.*?)< from where that match ended, instead of searching the XML from the beginning every time.
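The behavior described above can be reproduced without any packet capture. This sketch runs the question's pattern over an excerpt of the sample response shown below (so the run time is 1887 rather than 1763); the variable names are illustrative only. The key point is that the | splits the whole regex into two branches, each owning one capture group, so every match leaves the other branch's group empty.

```python
import re

# Excerpt of the sample SOAP response from the question.
payload = ("<ResponseVersion>5</ResponseVersion>"
           "<ResponseCode>0</ResponseCode>"
           "<ResponseMessage>Correct Petition</ResponseMessage>"
           "<ResponseRunTime>1887</ResponseRunTime>")

# The question's pattern: two branches, one capture group per branch.
pat = re.compile(r'<ResponseCode>(.*?)<|<ResponseRunTime>(.*?)<')

# search() stops at the first match; group 2 did not take part, so it is None.
print(pat.search(payload).groups())   # ('0', None)

# findall() returns one tuple per match; a group that did not take part
# shows up as '' in findall's output.
print(pat.findall(payload))           # [('0', ''), ('', '1887')]

# Each tuple has exactly one non-empty slot, so the matches can be flattened.
print(tuple(a or b for a, b in pat.findall(payload)))  # ('0', '1887')
```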

The SOAP response looks like this:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
   <soap:Body>
      <ns3:RetrieveQuotationResponse xmlns>
         <ResponseVersion>5</ResponseVersion>
         <ResponseCode>0</ResponseCode>
         <ResponseMessage>Correct Petition</ResponseMessage>
         <ResponseRunTime>1887</ResponseRunTime>
         <ResponseData>
            <billingDays>2</billingDays>
            <destinationCurrencyValue>0.0</destinationCurrencyValue>
            <dropOffDate>2018-02-23</dropOffDate>
            <dropOffOfficeId>D2</dropOffOfficeId>
            <dropOffOfficeName>Paris</dropOffOfficeName>
            <dropOffTime>09:00</dropOffTime>
            <pickUpDate>2018-02-21</pickUpDate>
            <pickUpOfficeId>D2</pickUpOfficeId>
            <pickUpOfficeName>Paris</pickUpOfficeName>
            <pickUpTime>09:00</pickUpTime>
            <quotationNote>There Are 29 Car Types Availables.</quotationNote>
            <quotationOptions>

The traffic rate is almost 110 packets per second. That's why I want to keep the work done per packet as small as I can; otherwise Python may not be fast enough to process all the packets.
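The "match the first tag, then continue from there" idea can be sketched with the optional pos argument of a compiled pattern's search() method, which starts scanning at a given index instead of at 0. The helper name extract_code_and_runtime is hypothetical, and the sketch assumes ResponseRunTime appears after ResponseCode, as it does in the sample response.

```python
import re

# Compiled once at module level, outside any per-packet callback.
CODE_RE = re.compile(r'<ResponseCode>([^<]*)<')
RUNTIME_RE = re.compile(r'<ResponseRunTime>([^<]*)<')


def extract_code_and_runtime(payload):
    """Hypothetical helper: return (code, runtime), or None if either is missing."""
    m1 = CODE_RE.search(payload)
    if m1 is None:
        return None
    # Resume where the first match ended instead of rescanning from index 0.
    # Assumes ResponseRunTime comes after ResponseCode, as in the sample.
    m2 = RUNTIME_RE.search(payload, m1.end())
    if m2 is None:
        return None
    return m1.group(1), m2.group(1)


sample = ("<ResponseCode>0</ResponseCode>"
          "<ResponseMessage>Correct Petition</ResponseMessage>"
          "<ResponseRunTime>1887</ResponseRunTime>")
print(extract_code_and_runtime(sample))  # ('0', '1887')
```

Compared with a single alternation pattern, this keeps each value in a fixed position of the returned tuple, so no flattening step is needed.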

Thanks.

In general, attempting to deal with XML using regexes is a futile exercise. While regexes may be able to handle simple tasks, the requirements of XML parsing tend to outgrow what regexes can do, resulting in bugs as well as maintenance and readability problems. It's usually better to start with a proper XML parser from the beginning.
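For comparison, a minimal parser-based sketch using the standard library's xml.etree.ElementTree. It assumes the complete XML body has already been extracted from the HTTP payload (a single TCP packet may carry HTTP headers or only part of a response). The ns3 namespace URI below is a placeholder because the real one is not shown in the question, and response_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Minimal well-formed response modeled on the question's sample.
# "urn:example:quotation" is a placeholder namespace URI.
xml_text = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <ns3:RetrieveQuotationResponse xmlns:ns3="urn:example:quotation">
      <ResponseVersion>5</ResponseVersion>
      <ResponseCode>0</ResponseCode>
      <ResponseMessage>Correct Petition</ResponseMessage>
      <ResponseRunTime>1887</ResponseRunTime>
    </ns3:RetrieveQuotationResponse>
  </soap:Body>
</soap:Envelope>"""


def local_name(tag):
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit('}', 1)[-1]


def response_fields(text, wanted=('ResponseCode', 'ResponseRunTime')):
    """Hypothetical helper: return the text of the first element for each wanted name."""
    root = ET.fromstring(text)
    found = {}
    for el in root.iter():
        name = local_name(el.tag)
        if name in wanted and name not in found:
            found[name] = el.text
    return tuple(found.get(name) for name in wanted)


print(response_fields(xml_text))  # ('0', '1887')
```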

That said, there is a simple way to handle this particular case. findall returns tuples when the regex has multiple groups, so the regex should contain at most one group. This could be done with no groups at all by using lookarounds, but it is simpler to move the alternation into the tag name rather than applying it to the entire match. For example:

<Response(?:Code|RunTime)>([^<]*)<
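Applied to an excerpt of the sample response, the pattern above yields a flat list that converts directly into the tuple the question asks for. The sketch also shows one possible form of the group-free lookaround variant mentioned above; that second pattern is an illustration, not part of the original answer. Because Python's re module only accepts fixed-width lookbehinds, each tag gets its own lookbehind.

```python
import re

# The answer's pattern: the alternation sits in a non-capturing group (?:...)
# around the tag name, so the regex has exactly ONE capturing group.
pat = re.compile(r'<Response(?:Code|RunTime)>([^<]*)<')

# Excerpt of the sample SOAP response from the question.
payload = ("<ResponseVersion>5</ResponseVersion>"
           "<ResponseCode>0</ResponseCode>"
           "<ResponseMessage>Correct Petition</ResponseMessage>"
           "<ResponseRunTime>1887</ResponseRunTime>")

# With a single group, findall() returns a flat list of strings.
values = pat.findall(payload)
print(tuple(values))  # ('0', '1887')

# Group-free variant: two fixed-width lookbehinds in an alternation, because
# a single (?<=<ResponseCode>|<ResponseRunTime>) has branches of different
# widths, which re rejects.
lookaround_pat = re.compile(
    r'(?:(?<=<ResponseCode>)|(?<=<ResponseRunTime>))[^<]*(?=<)')
print(lookaround_pat.findall(payload))  # ['0', '1887']
```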
