How to match exact “multiple” strings in Python

Question

I want to build a live packet monitor for SOAP/XML traffic. Here is the code:

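import re

from scapy.all import *

# Compile the pattern once, outside the callback, instead of once per packet.
pat = re.compile('<ResponseCode>(.*?)<|<ResponseRunTime>(.*?)<')

def pack_callback(packet):

    if packet["TCP"].payload:
        payload = str(packet["TCP"].payload)

        n = pat.findall(payload)
        if n:
            print(n)

# Parentheses are needed: BPF evaluates "and"/"or" left to right, so the
# unparenthesized filter would also capture non-TCP traffic on port 86.
sniff(filter='tcp and (port 186 or port 86)',prn=pack_callback,iface='vmxnet3 Ethernet Adapter')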

With re.search I get ('0', None); with re.findall I get [('0', ''), ('', '1763')].

My question is: how can I get ('0', '1763')? I mean, first match <ResponseCode>(.*?)< and then match <ResponseRunTime>(.*?)<, without searching the XML from the beginning each time.

The SOAP response looks like the following:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
   <soap:Body>
      <ns3:RetrieveQuotationResponse xmlns>
         <ResponseVersion>5</ResponseVersion>
         <ResponseCode>0</ResponseCode>
         <ResponseMessage>Correct Petition</ResponseMessage>
         <ResponseRunTime>1887</ResponseRunTime>
         <ResponseData>
            <billingDays>2</billingDays>
            <destinationCurrencyValue>0.0</destinationCurrencyValue>
            <dropOffDate>2018-02-23</dropOffDate>
            <dropOffOfficeId>D2</dropOffOfficeId>
            <dropOffOfficeName>Paris</dropOffOfficeName>
            <dropOffTime>09:00</dropOffTime>
            <pickUpDate>2018-02-21</pickUpDate>
            <pickUpOfficeId>D2</pickUpOfficeId>
            <pickUpOfficeName>Paris</pickUpOfficeName>
            <pickUpTime>09:00</pickUpTime>
            <quotationNote>There Are 29 Car Types Availables.</quotationNote>
            <quotationOptions>

The traffic rate is almost 110 packets per second. That is why I want to keep the work done per packet as small as I can; otherwise Python may not be fast enough to process all the packets.

Thanks.

Answer 1

In general, attempting to deal with XML using regexes is a futile exercise. While regexes may be able to handle simple tasks, the requirements for XML parsing tend to outgrow regex capabilities, resulting in bugs as well as maintenance & readability problems. It's usually better to start with a proper XML parser from the beginning.
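As a rough illustration of the parser-based approach, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the whole response is available as one complete, well-formed string. The sample document is a trimmed, hypothetical version of the response in the question, with the ns3 prefix dropped because its namespace declaration is cut off in the original.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed version of the SOAP response from the question.
# Assumption: the complete document is available as a single string.
xml_text = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
   <soap:Body>
      <RetrieveQuotationResponse>
         <ResponseVersion>5</ResponseVersion>
         <ResponseCode>0</ResponseCode>
         <ResponseMessage>Correct Petition</ResponseMessage>
         <ResponseRunTime>1887</ResponseRunTime>
      </RetrieveQuotationResponse>
   </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(xml_text)

# './/' searches all descendants; these two elements carry no namespace.
code = root.findtext('.//ResponseCode')
run_time = root.findtext('.//ResponseRunTime')

result = (code, run_time)
print(result)
```

Note that a single TCP payload may hold only part of a message, in which case ET.fromstring raises ParseError; the segments would have to be reassembled first.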

That said, there is a simple way to deal with this particular case. findall returns tuples when the regex contains multiple groups, so the regex should contain at most one group (with a single group, findall returns a plain list of the captured strings). This could be accomplished with no groups at all by using lookarounds, but it is simpler to move the alternation into the tag name rather than applying it to the entire match. For example:

<Response(?:Code|RunTime)>([^<]*)<
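A minimal sketch of how this pattern behaves with findall. The payload string is a hypothetical fragment modelled on the response in the question, and the names PATTERN, payload and values are only illustrative.

```python
import re

# One capturing group; the alternation is confined to the tag name.
PATTERN = re.compile(r'<Response(?:Code|RunTime)>([^<]*)<')

# Hypothetical payload fragment modelled on the question's SOAP response.
payload = (
    '<ResponseVersion>5</ResponseVersion>'
    '<ResponseCode>0</ResponseCode>'
    '<ResponseMessage>Correct Petition</ResponseMessage>'
    '<ResponseRunTime>1887</ResponseRunTime>'
)

# With a single group, findall returns a flat list of captured strings,
# in the order the tags appear in the payload.
values = tuple(PATTERN.findall(payload))
print(values)  # ('0', '1887')
```

The values come back in document order, so this relies on ResponseCode appearing before ResponseRunTime, as it does in the sample response. Compiling the pattern once at module level, rather than inside the packet callback, avoids repeating that work for every packet.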
