How to avoid HTML blocks with regex

Question

I have to find all the strings surrounded by "[" and "]" using regex, while skipping the ones inside a <table></table> block. For example:

<html>
<body>
<p><table>
   <tbody>
      <tr>
         <td style="border-style: solid; border-width:1px;">
            <span style="font-family: Courier;">[data1]</span>
         </td>
         <td style="border-style: solid; border-width:1px;">
            <span style="font-family: Courier;">[data10]</span>
         </td>
      </tr>
   </tbody>
</table>
</p>
<p>[data3]&nbsp;&nbsp;[data4]&nbsp;&nbsp;[data5]</p>
</body>
</html>

In this case only [data3], [data4] and [data5] should be found. So far I have this: @"(((?<?<span>)(\[[a-zA-Z_0-9]+)](??<\/span>))|((?<.<span>)(\[[a-zA-Z_0-9]+)])|((\[[a-zA-Z_0-9]+)](?!<\/span>)))(?!.*\1)" That finds all the [] blocks that are not surrounded by tags. I tried adding a negative lookahead and lookbehind for the table, but it doesn't work: it still matches the ones inside the table block.

I hope you can help me with this.

Answer 1

The regex below matches a <p>...</p> element whose content starts with "[" and ends with "]".

/<p.*?>\[(.*?)\]<\/p>/g

For the HTML above, it returns <p>[data3]&nbsp;&nbsp;[data4]&nbsp;&nbsp;[data5]</p>.

Once you have that string, apply the regex below to extract each [data] token from it.

/\[(.*?)\]/g

Applied to the string above, this regex returns [data3], [data4] and [data5].
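The two steps can be chained roughly as in the sketch below. It assumes JavaScript, as the /.../g literals suggest, and writes the closing tag as <\/p>. The helper name findBracketedInParagraphs and the shortened sample markup are illustrative only and are not part of the original answer.

```javascript
// Sample markup from the question, shortened to a single table cell.
const html = [
  '<p><table><tbody><tr>',
  '<td><span style="font-family: Courier;">[data1]</span></td>',
  '</tr></tbody></table>',
  '</p>',
  '<p>[data3]&nbsp;&nbsp;[data4]&nbsp;&nbsp;[data5]</p>',
].join('\n');

// Hypothetical helper: collects the [token] strings found in matching <p> elements.
function findBracketedInParagraphs(source) {
  // Step 1: <p> elements whose content starts with "[" and ends with "]".
  const paragraphPattern = /<p.*?>\[(.*?)\]<\/p>/g;
  // Step 2: every [token] inside one matched paragraph.
  const tokenPattern = /\[(.*?)\]/g;

  const tokens = [];
  // With the g flag, match() returns an array of full matches, or null if none.
  for (const paragraph of source.match(paragraphPattern) || []) {
    tokens.push(...(paragraph.match(tokenPattern) || []));
  }
  return tokens;
}

console.log(findBracketedInParagraphs(html));
```

Note that step 1 depends on the layout of the markup: "." does not match line breaks by default, and only paragraphs that begin with "[" and end with "]" are picked up, so differently formatted HTML may need another approach, such as an HTML parser.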

solution1
-1 2020-07-14 09:19:17