I would like to match the last column in the first row from the following HTML: (this is just an example)
<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>
<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>
So what I want to match is: <td> ABC </td> </tr>
I tried toying around with regex101.com but I just can't find a proper way to match the last <td> from the first row only.
What I got so far is the following regex: (<td>).*?(<\/tr>)
However, it matches everything from the first <td> up to </tr>:
<td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>
Is there any way to match only the shortest string between <td> and </tr>? (I found similar questions but can't figure out a solution to this one.)
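A quick JavaScript sketch shows why the lazy quantifier alone is not enough. It assumes the two rows sit on separate lines of one string. The engine returns the leftmost match, and *? only makes the match end as early as possible, not start as late as possible.

```javascript
// Assumed sample input: the question's two rows, one row per line.
const html = [
  "<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>",
  "<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>",
].join("\n");

// The pattern from the question. The lazy .*? stops at the first </tr>,
// but the match still starts at the leftmost <td> the engine finds.
const lazy = /(<td>).*?(<\/tr>)/;
const m = html.match(lazy);

console.log(m[0]); // <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>
```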
Prepend your pattern with "start of string" (^) + "anything but </tr>" ((?:.(?!<\/tr>))*), which consumes characters only while none of them is immediately followed by </tr>. This ensures no </tr> appears before your pattern. Because that prefix is greedy, the engine then backtracks to the last <td> before the first </tr>. The original pattern should then be captured with a group:
^(?:.(?!<\/tr>))*((?:<td>).*?(?:<\/tr>))
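Here is a minimal JavaScript sketch of this approach. It assumes the rows are in one string with one row per line. The wanted cell is read from capture group 1 rather than from the whole match, because the whole match also includes the prefix.

```javascript
// Assumed sample input: the question's two rows, one row per line.
const html = [
  "<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>",
  "<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>",
].join("\n");

// ^ anchors at the start of the string (no m flag). (?:.(?!<\/tr>))* eats
// characters only while no </tr> follows them. Being greedy, it runs to the
// end of what the first row allows, then backtracks to the last <td>.
const rx = /^(?:.(?!<\/tr>))*((?:<td>).*?(?:<\/tr>))/;

const match = rx.exec(html);
const lastCell = match ? match[1] : null; // the wanted text is in group 1
console.log(lastCell); // <td> ABC </td> </tr>
```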
I would use this to capture the required text in a group:
.*(<td>.+<\/td>?.+<\/tr>)
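The result of this pattern depends on line breaks, as this JavaScript sketch shows. It assumes the rows are joined into one string. Because . does not match a newline, the greedy .* can only backtrack within the first line when each row is on its own line. With both rows on one line, it reaches the last row instead.

```javascript
// Assumed sample input: the question's two rows.
const rows = [
  "<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>",
  "<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>",
];

const rx = /.*(<td>.+<\/td>?.+<\/tr>)/;

// One row per line: . stops at the line break, so .* backtracks only
// within the first line and group 1 is that row's last cell.
const multiLine = rows.join("\n").match(rx);
console.log(multiLine[1]); // <td> ABC </td> </tr>

// Both rows on one line: .* now reaches the second row, so group 1
// holds the last cell of the LAST row instead.
const singleLine = rows.join(" ").match(rx);
console.log(singleLine[1]); // <td> GHI </td> </tr>
```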
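Regex 1 is 11 characters, but it only works because exactly 14 characters sit between <td and tr> in the cell you want:
<td.{14}tr>
Regex 2 is 29 characters, but it covers a cell holding any number of word characters:
<td>\s*\w*?\s*<\/td>\s*<\/tr>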
But the real problem is that you want only one match, while this regex, like most others, matches once per row when it is run with the global (g) flag over a multi-line HTML fragment. The solution is simple:
No global flag: once a match is found, the search stops.
/* Regex 1
   Literal: <td
   Any 14 characters (no line terminators)
   Literal: tr>
   NO GLOBAL FLAG: once a match is found, it stops */
const rgx1 = /<td.{14}tr>/;

/* Regex 2
   Literal: <td>
   Zero or more whitespace characters
   Zero or more word characters, lazily
   Zero or more whitespace characters
   Literal: </td>
   Zero or more whitespace characters
   Literal: </tr> */
const rgx2 = /<td>\s*\w*?\s*<\/td>\s*<\/tr>/;

const str = `<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>
<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>`;

const res1 = str.match(rgx1);
const res2 = str.match(rgx2);

console.log('Result 1: ' + res1);
console.log('Result 2: ' + res2);
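To illustrate the point about the global flag, this small sketch runs Regex 2 over the same two rows, which are assumed here to be on separate lines. It runs the pattern with and without g. Without the flag, match() returns only the first match. With it, match() returns every match, one per row.

```javascript
// Assumed sample input: the question's two rows, one row per line.
const str = [
  "<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>",
  "<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>",
].join("\n");

// Regex 2 from above, once without and once with the g flag.
const once = str.match(/<td>\s*\w*?\s*<\/td>\s*<\/tr>/);
const all = str.match(/<td>\s*\w*?\s*<\/td>\s*<\/tr>/g);

console.log(once[0]);    // <td> ABC </td> </tr>
console.log(all.length); // 2
console.log(all[1]);     // <td> GHI </td> </tr>
```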
BTW, there's a typo in the string: DEF >/td> and JKL >/td>
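console.log(
  `<tr> <td> ABC </td> <td> DEF </td> <td> XCC </td> </tr>
<tr> <td> GHI </td> <td> JKL </td> <td> GHI </td> </tr>`
    // \w+ takes a word. The lookahead requires it to be followed by one or more
    // of <, /, >, space, t, d and then r>, as in " </td> </tr>". The result is
    // the text of the first row's last cell (XCC), not the surrounding tags.
    .match(/\w+(?=[</> td]+r>)/)
)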
Be as precise as possible when writing your regexps.
(<td>)[^\<\>]*(<\/td>)\s*(<\/tr>)
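The negated class [^\<\>]* cannot run past a tag, so only a cell directly followed by </tr> can match. This assumes that the contents of the td tag do not contain HTML markup.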
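Here is a short JavaScript sketch of that assumption. The second input, with <b> markup inside a cell, is hypothetical and only shows the failure case.

```javascript
// The pattern from this answer, unchanged.
const rx = /(<td>)[^\<\>]*(<\/td>)\s*(<\/tr>)/;

// Assumed sample input: plain-text cells. Only the last cell is directly
// followed by </tr>, so that is the one that matches.
const plain = "<tr> <td> ABC </td> <td> DEF </td> <td> ABC </td> </tr>";
console.log(plain.match(rx)[0]); // <td> ABC </td> </tr>

// Hypothetical input with markup inside the last cell: [^\<\>]* stops at <b>,
// so the pattern finds no match at all.
const marked = "<tr> <td> ABC </td> <td> <b>DEF</b> </td> </tr>";
console.log(marked.match(rx)); // null
```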