简体   繁体   中英

Why is regex failing to match all desired groups

I have the following string

{class:"table table-striped"},c.a.createElement("thead",{class:"thread-dark"},c.a.createElement("tr",null,c.a.createElement("th",{scope:"col"},"Round 1"),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}))),c.a.createElement("tbody",null,c.a.createElement("tr",null,c.a.createElement("td",null,"Parc des Princes",c.a.createElement("br",null),"Paris"),c.a.createElement("td",{align:"right"},"France ",c.a.createElement("img",{src:"img/RoundFlags/France.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"},c.a.createElement("a",{href:"/fra-kor"},"4 - 0")),c.a.createElement("td",{align:"left"},c.a.createElement("img",{src:"img/RoundFlags/Korea.png",width:"50",hspace:"20"})," Korea"),c.a.createElement("td",null,"Group A")),c.a.createElement("tr",null,c.a.createElement("td",null,"Roazhon Park",c.a.createElement("br",null),"Rennes"),c.a.createElement("td",{align:"right"},"Germany ",c.a.createElement("img",{src:"img/RoundFlags/Germany.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"},c.a.createElement("a",{href:"/deu-chn"},"1 - 0")),c.a.createElement("td",{align:"left"},c.a.createElement("img",{src:"img/RoundFlags/China.png",width:"50",hspace:"20"})," China"),c.a.createElement("td",null,"Group B")),c.a.createElement("tr",null,c.a.createElement("td",null,"Stade Oceane",c.a.createElement("br",null),"Le Havre"),c.a.createElement("td",{align:"right"},"Spain ",c.a.createElement("img",{src:"img/RoundFlags/Spain.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"}

I am trying to retrieve all characters between 2 identical substrings ie all characters between cacreateElement("tr" and cacreateElement("tr" as a list.

My attempt is to use the following regex pattern:

c\.a\.createElement\("tr"(.*?)c\.a\.createElement\("tr"

Instead of getting every in between sequence some are being skipped as shown in image:

In the above you can see the second match 1 group 0 (light blue) is not followed by a group 1 (green) despite there being a following cacreateElement("tr" (match 2 dark blue).

If it helps the regex is available here: https://regex101.com/r/xzbHBU/1/

I tried various lookarounds eg

(?<=c\.a\.createElement\("tr")(.*?)(?!c\.a\.createElement\("tr")

and adding re.DOTALL flag; all of which failed miserably to match as were too restrive.

Can anyone help me write the appropriate regex to retrieve all groups as I described in my expectations?

Python:

import re

s = '{class:"table table-striped"},c.a.createElement("thead",{class:"thread-dark"},c.a.createElement("tr",null,c.a.createElement("th",{scope:"col"},"Round 1"),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}))),c.a.createElement("tbody",null,c.a.createElement("tr",null,c.a.createElement("td",null,"Parc des Princes",c.a.createElement("br",null),"Paris"),c.a.createElement("td",{align:"right"},"France ",c.a.createElement("img",{src:"img/RoundFlags/France.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"},c.a.createElement("a",{href:"/fra-kor"},"4 - 0")),c.a.createElement("td",{align:"left"},c.a.createElement("img",{src:"img/RoundFlags/Korea.png",width:"50",hspace:"20"})," Korea"),c.a.createElement("td",null,"Group A")),c.a.createElement("tr",null,c.a.createElement("td",null,"Roazhon Park",c.a.createElement("br",null),"Rennes"),c.a.createElement("td",{align:"right"},"Germany ",c.a.createElement("img",{src:"img/RoundFlags/Germany.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"},c.a.createElement("a",{href:"/deu-chn"},"1 - 0")),c.a.createElement("td",{align:"left"},c.a.createElement("img",{src:"img/RoundFlags/China.png",width:"50",hspace:"20"})," China"),c.a.createElement("td",null,"Group B")),c.a.createElement("tr",null,c.a.createElement("td",null,"Stade Oceane",c.a.createElement("br",null),"Le Havre"),c.a.createElement("td",{align:"right"},"Spain ",c.a.createElement("img",{src:"img/RoundFlags/Spain.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"}'
p = re.compile(r'c\.a\.createElement\("tr"(.*?),c.a.createElement\("tr"')
matches = p.findall(s)
print(len(matches))

Using lookahead & lookbehind

Ex:

import re

s = '{class:"table table-striped"},c.a.createElement("thead",{class:"thread-dark"},c.a.createElement("tr",null,c.a.createElement("th",{scope:"col"},"Round 1"),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}),c.a.createElement("th",{scope:"col"}))),c.a.createElement("tbody",null,c.a.createElement("tr",null,c.a.createElement("td",null,"Parc des Princes",c.a.createElement("br",null),"Paris"),c.a.createElement("td",{align:"right"},"France ",c.a.createElement("img",{src:"img/RoundFlags/France.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"},c.a.createElement("a",{href:"/fra-kor"},"4 - 0")),c.a.createElement("td",{align:"left"},c.a.createElement("img",{src:"img/RoundFlags/Korea.png",width:"50",hspace:"20"})," Korea"),c.a.createElement("td",null,"Group A")),c.a.createElement("tr",null,c.a.createElement("td",null,"Roazhon Park",c.a.createElement("br",null),"Rennes"),c.a.createElement("td",{align:"right"},"Germany ",c.a.createElement("img",{src:"img/RoundFlags/Germany.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"},c.a.createElement("a",{href:"/deu-chn"},"1 - 0")),c.a.createElement("td",{align:"left"},c.a.createElement("img",{src:"img/RoundFlags/China.png",width:"50",hspace:"20"})," China"),c.a.createElement("td",null,"Group B")),c.a.createElement("tr",null,c.a.createElement("td",null,"Stade Oceane",c.a.createElement("br",null),"Le Havre"),c.a.createElement("td",{align:"right"},"Spain ",c.a.createElement("img",{src:"img/RoundFlags/Spain.png",width:"50",hspace:"20"})),c.a.createElement("td",{className:"align-middle",align:"center"}'
p = re.compile(r'(?<=c\.a\.createElement\("tr")(.*?)(?=,c.a.createElement\("tr")')
matches = p.findall(s)
print(matches)

Output:

3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM