How to match exact “multiple” strings in Python

I want to build a live packet monitor for SOAP/XML traffic. Here is the code:

import re

from scapy.all import *

def pack_callback(packet):

    if packet["TCP"].payload:  
        payload = str(packet["TCP"].payload)  

        Code = '<ResponseCode>(.*?)<|<ResponseRunTime>(.*?)<'

        pat = re.compile(Code) 
        n = pat.findall(payload)
        if n:
            #print n.groups()
            print n

sniff(filter='tcp and port 186 or port 86',prn=pack_callback,iface='vmxnet3 Ethernet Adapter')

With re.search, the match's groups() gives ('0', None). With re.findall, I get [('0', ''), ('', '1763')].

My question is: how can I get ('0', '1763')? That is, I want to match <ResponseCode>(.*?)< first and then <ResponseRunTime>(.*?)<, rather than searching the XML from the beginning every time.

The SOAP response is like following:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
   <soap:Body>
      <ns3:RetrieveQuotationResponse xmlns>
         <ResponseVersion>5</ResponseVersion>
         <ResponseCode>0</ResponseCode>
         <ResponseMessage>Correct Petition</ResponseMessage>
         <ResponseRunTime>1887</ResponseRunTime>
         <ResponseData>
            <billingDays>2</billingDays>
            <destinationCurrencyValue>0.0</destinationCurrencyValue>
            <dropOffDate>2018-02-23</dropOffDate>
            <dropOffOfficeId>D2</dropOffOfficeId>
            <dropOffOfficeName>Paris</dropOffOfficeName>
            <dropOffTime>09:00</dropOffTime>
            <pickUpDate>2018-02-21</pickUpDate>
            <pickUpOfficeId>D2</pickUpOfficeId>
            <pickUpOfficeName>Paris</pickUpOfficeName>
            <pickUpTime>09:00</pickUpTime>
            <quotationNote>There Are 29 Car Types Availables.</quotationNote>
            <quotationOptions>

The traffic runs at almost 110 packets per second. That is why I want to keep the per-packet work as small as I can; otherwise Python is not fast enough to process all the packets.

Thanks.

In general, attempting to deal with XML using regexes is a futile exercise. While regexes may be able to handle simple tasks, the requirements for XML parsing tend to outgrow regex capabilities, resulting in bugs as well as maintenance & readability problems. It's usually better to start with a proper XML parser from the beginning.
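As a rough illustration of the parser route, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the complete, well-formed response text is already available; a single TCP segment may contain HTTP headers or only a fragment of the XML, so a sniffer would need to reassemble the stream first. The `response_fields` helper and the `urn:example:quotation` namespace URI are made up for this example and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the response in the question.
# The namespace URI bound to ns3 is invented for illustration.
SOAP = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
   <soap:Body>
      <ns3:RetrieveQuotationResponse xmlns:ns3="urn:example:quotation">
         <ResponseVersion>5</ResponseVersion>
         <ResponseCode>0</ResponseCode>
         <ResponseMessage>Correct Petition</ResponseMessage>
         <ResponseRunTime>1887</ResponseRunTime>
      </ns3:RetrieveQuotationResponse>
   </soap:Body>
</soap:Envelope>"""


def response_fields(xml_text, names=("ResponseCode", "ResponseRunTime")):
    """Return the text of the first element found for each name, or None."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        # Drop a '{namespace}' prefix, if present, and compare local names.
        local = elem.tag.rsplit("}", 1)[-1]
        if local in names and local not in found:
            found[local] = elem.text
    return tuple(found.get(name) for name in names)


print(response_fields(SOAP))  # ('0', '1887')
```

Unlike the regex, this returns the values in the order of `names` regardless of where the elements appear, and reports a missing element as None.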

That said, there is a simple way to deal with this particular case. findall returns a list of tuples when the pattern contains more than one group, so the regex should contain at most one group. This could be done with no groups at all by using lookarounds, but it is simpler to move the alternation into the tag name rather than alternating over the entire match. For example:

<Response(?:Code|RunTime)>([^<]*)<
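To see the difference, the sketch below runs both the original two-group pattern and the single-group pattern against a trimmed copy of the sample response. The `extract` helper and the constant names are illustrative only. The pattern is compiled once at module level, which also avoids recompiling it for every packet.

```python
import re

# Trimmed copy of the SOAP response from the question.
SAMPLE = (
    "<ResponseVersion>5</ResponseVersion>"
    "<ResponseCode>0</ResponseCode>"
    "<ResponseMessage>Correct Petition</ResponseMessage>"
    "<ResponseRunTime>1887</ResponseRunTime>"
)

# Original pattern: two groups, so findall yields one 2-tuple per match,
# with an empty string for the group that did not take part in that match.
TWO_GROUPS = re.compile(r"<ResponseCode>(.*?)<|<ResponseRunTime>(.*?)<")

# Suggested pattern: the alternation is confined to the tag name, so there
# is a single group and findall yields a flat list of strings.
ONE_GROUP = re.compile(r"<Response(?:Code|RunTime)>([^<]*)<")


def extract(payload):
    """Return the matched values as a tuple, in document order."""
    return tuple(ONE_GROUP.findall(payload))


print(TWO_GROUPS.findall(SAMPLE))  # [('0', ''), ('', '1887')]
print(extract(SAMPLE))             # ('0', '1887')
```

Note that the values come back in the order they appear in the payload, and a payload containing only one of the two tags produces a one-element tuple, so the callback should check the length before unpacking.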

