
Python REGEX matching a multiline with carriage return

I have the following data:

POST / HTTP/1.1
User-Agent: curl/7.27.0
Host: 127.0.0.1
Accept: */*
Content-Length: 55
Content-Type: application/x-www-form-urlencoded

id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk

or

POST / HTTP/1.1\r\n
User-Agent: curl/7.27.0\r\n
Host: 127.0.0.1\r\n
Accept: */*\r\n
Content-Length: 55\r\n
Content-Type: application/x-www-form-urlencoded\r\n
\r\n
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n

or

POST / HTTP/1.1^M
User-Agent: curl/7.27.0^M
Host: 127.0.0.1^M
Accept: */*^M
Content-Length: 55^M
Content-Type: application/x-www-form-urlencoded^M
^M
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk^M

How can I match only the id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk string? I mean any printable text between two consecutive line endings (\r\n or ^M) and the next line ending (\r\n or ^M). I tried something like:

re.findall(r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, re.MULTILINE|re.DOTALL)

but no match. What am I doing wrong?

I'm not sure why you have > at the beginning of your regex. No line in your data starts with >, so this is what prevents you from getting any matches at all. If you remove it, you get a lot of matches that you do not seem to need.
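To see the effect of the leading >, the attempt from the question can be run with and without it. This is only a sketch: the buffer below is rebuilt from the sample request with real CRLF line endings, and the names buf, with_gt and without_gt are made up for the illustration.

```python
import re

# Sample request from the question, assuming real CRLF line endings.
buf = (
    "POST / HTTP/1.1\r\n"
    "User-Agent: curl/7.27.0\r\n"
    "Host: 127.0.0.1\r\n"
    "Accept: */*\r\n"
    "Content-Length: 55\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n"
)

flags = re.MULTILINE | re.DOTALL

# Original attempt: "^>" demands a line that starts with ">", and there is none.
with_gt = re.findall(r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, flags)

# Same pattern without ">": it now matches, but it also picks up header lines.
without_gt = re.findall(r'^([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, flags)

print(with_gt)          # empty list
print(len(without_gt))  # more than one (line, tail) tuple
```

So dropping > is necessary but not sufficient; the pattern still needs a way to pick out only the line that follows the blank line.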

I would suggest:

(?<![\r\n])(?:\r\n|\r(?!\n)|\n){2}[^\r\n]+

This ensures that exactly two consecutive line breaks (each one \r\n, \r, or \n) come before the line you're trying to match. The negative lookbehind (?<![\r\n]) is what enforces it: it fails the match if there is another newline or carriage return character before the two line breaks. The (?!\n) after the lone \r stops the engine from backtracking and counting a single \r\n pair as two line breaks, which would otherwise make every header line match on CRLF data.

The above regex uses neither ^ / $ nor the dot, so it doesn't need the multiline and dotall flags; you can drop them.
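A quick way to check this kind of pattern in Python is to run it against a buffer with real CRLF line endings. The sketch below is illustrative only: buf, BODY, unguarded and guarded are names invented here. It compares the newline alternation with and without a (?!\n) guard on the lone \r, and wraps the final [^\r\n]+ in a capture group so the leading line breaks are left out of the result.

```python
import re

BODY = "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"

# Sample request from the question, assuming real CRLF line endings.
buf = (
    "POST / HTTP/1.1\r\n"
    "User-Agent: curl/7.27.0\r\n"
    "Host: 127.0.0.1\r\n"
    "Accept: */*\r\n"
    "Content-Length: 55\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n" + BODY + "\r\n"
)

# Without the guard, "\r" and "\n" of ONE CRLF pair can be counted as two breaks.
unguarded = re.compile(r'(?<![\r\n])(?:\r\n|\r|\n){2}([^\r\n]+)')

# With the guard, a lone "\r" only counts when it is not followed by "\n".
guarded = re.compile(r'(?<![\r\n])(?:\r\n|\r(?!\n)|\n){2}([^\r\n]+)')

unguarded_hits = unguarded.findall(buf)
guarded_hits = guarded.findall(buf)

print(len(unguarded_hits))  # header lines are matched as well
print(guarded_hits)         # only the request body

# The guarded pattern also copes with LF-only and CR-only line endings.
lf_hits = guarded.findall(buf.replace("\r\n", "\n"))
cr_hits = guarded.findall(buf.replace("\r\n", "\r"))
```

If only the first body is needed, guarded.search(buf).group(1) does the same job without building a list.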

regex101 demo


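EDIT: Since the \r, \n and ^M in your data may be literal text rather than actual control characters, I would suggest this:

(?<![\r\n])(?:(?:\\r\\n|\^M)?(?:\r\n|\r(?!\n)|\n)){2}((?:(?!\\r\\?n?|\\n|\^M)[^\r\n\x00])+)(?:\\r\\n|\^M)?

Here \\r\\n and \^M match the literal text, the body is returned in capture group 1, and the (?!\n) after the lone \r keeps a single \r\n pair from being counted as two line breaks.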
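As a rough check of the literal-marker pattern, the sketch below builds the sample request in three forms: real CRLF endings, the literal text \r\n before each newline, and the literal text ^M before each newline. The helper build and the names HEADERS, BODY, samples and results are hypothetical, and the pattern used includes a (?!\n) guard on the lone \r so that one CRLF pair is never counted as two line breaks.

```python
import re

BODY = "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
HEADERS = [
    "POST / HTTP/1.1",
    "User-Agent: curl/7.27.0",
    "Host: 127.0.0.1",
    "Accept: */*",
    "Content-Length: 55",
    "Content-Type: application/x-www-form-urlencoded",
]

def build(eol):
    """Join headers, a blank line and the body, ending every line with eol."""
    return "".join(line + eol for line in HEADERS + ["", BODY])

samples = {
    "real CRLF": build("\r\n"),
    "literal backslash text": build("\\r\\n\n"),  # the 4 characters \r\n, then LF
    "literal ^M text": build("^M\n"),             # caret + M, then LF
}

pattern = re.compile(
    r'(?<![\r\n])'                                # no line break right before
    r'(?:(?:\\r\\n|\^M)?(?:\r\n|\r(?!\n)|\n)){2}'  # two line ends, markers optional
    r'((?:(?!\\r\\?n?|\\n|\^M)[^\r\n\x00])+)'      # body, stopping before a marker
    r'(?:\\r\\n|\^M)?'                            # trailing literal marker, if any
)

results = {name: pattern.findall(text) for name, text in samples.items()}
for name, found in results.items():
    print(name, found)
```

Because the pattern has one capture group, findall returns just the body text, without the surrounding markers or line breaks.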

regex101 demo

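Try this, assuming the body always starts with id= :

(?:\^M|[\n\r])+(id=.*?)(?=\^M|[\n\r])

The lazy .*? matters here: the dot matches everything except \n, so a greedy .* would also swallow a trailing \r or a literal ^M into the captured group.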

Check online DEMO
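The difference between a greedy and a lazy quantifier in this pattern is easy to miss, so here is a small Python comparison. It is a sketch under assumptions: the two buffers are shortened versions of the sample request (one with real CRLF, one with a literal ^M before each newline), and lazy, greedy, crlf_buf and caret_buf are names chosen for the example.

```python
import re

BODY = "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
LAST_HEADER = "Content-Type: application/x-www-form-urlencoded"

# Shortened sample: last header line, blank line, body.
crlf_buf = LAST_HEADER + "\r\n\r\n" + BODY + "\r\n"
caret_buf = LAST_HEADER + "^M\n^M\n" + BODY + "^M\n"

lazy = re.compile(r'(?:\^M|[\n\r])+(id=.*?)(?=\^M|[\n\r])')
greedy = re.compile(r'(?:\^M|[\n\r])+(id=.*)(?=\^M|[\n\r])')

# Lazy: stops at the first position where the lookahead succeeds.
lazy_crlf = lazy.search(crlf_buf).group(1)
lazy_caret = lazy.search(caret_buf).group(1)

# Greedy: "." also matches "\r" and the characters "^M", so they leak in.
greedy_crlf = greedy.search(crlf_buf).group(1)
greedy_caret = greedy.search(caret_buf).group(1)

print(repr(lazy_crlf))
print(repr(greedy_crlf))
print(repr(greedy_caret))
```

Note that this approach relies on knowing how the body starts, while the blank-line approach does not.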

