I'm having trouble capturing this data:
<tr>
<td><span class="bodytext"><b>Contact:</b><b></b></span><span style='font-size:10.0pt;font-family:Verdana;
mso-bidi-font-family:Arial'><b> </b>
<span class="bodytext">John Doe</span>
</span></td>
</tr>
<tr>
<td><span class="bodytext">PO Box 2112</span></td>
</tr>
<tr>
<td><span class="bodytext"></span></td>
</tr>
<!--*********************************************************
-->
<tr>
<td><span class="bodytext"></span></td>
</tr>
<tr>
<td><span class="bodytext">JOHAN</span> NSW 9700</td>
</tr>
<tr>
<td><strong>Phone:</strong>
02 9999 9999
</td>
</tr>
Basically, I want to grab everything after "Contact:" and before "Phone:", minus the HTML. However, these two labels may not always be present, so what I really need is everything between the two colons (:) that isn't inside an HTML tag. The number of <span class="bodytext">***data***</span> elements may also vary, so I need some sort of loop to match them.
I'd prefer to use regular expressions, though I could probably also do this with loops and string matching.
Also, I'd like to know the syntax for non-capturing groups in PHP regex.
Any help would be greatly appreciated!
If I understand you correctly, you're only interested in the text between the HTML tags. To ignore the HTML tags, simply strip them first:
$text = preg_replace('/<[^<>]+>/', '', $html);
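Here is a small sketch of that tag-stripping step, run on a fragment modelled on the question's markup. The fragment includes a tag that wraps across two lines and an HTML comment. stripTagsRegex() is a hypothetical helper name used only for illustration.

```php
<?php
// Remove anything that looks like a tag: '<', then one or more characters
// that are neither '<' nor '>', then '>'. A negated character class also
// matches newlines, so tags that wrap across lines are removed as well.
function stripTagsRegex(string $html): string
{
    return preg_replace('/<[^<>]+>/', '', $html);
}

// Sample input (made up for this sketch): a multi-line tag and a comment.
$html = "<td><span style='font-size:10.0pt;\nfont-family:Verdana'><b>Contact:</b></span>"
      . "<!--***\n--> John Doe</td>";

echo stripTagsRegex($html), "\n"; // Contact: John Doe
```

Because [^<>] cannot match '<' or '>', the pattern never runs past the end of one tag into the next, and no s modifier is needed.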
To grab everything between "Contact:" and "Phone:", use:
if (preg_match('/Contact:(.*?)Phone:/s', $text, $regs)) {
    $result = $regs[1];
} else {
    $result = "";
}
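Putting the two steps together, a minimal sketch might look like the following. It assumes PHP 7.4+ (for the arrow function), uses a trimmed copy of the question's markup as input, and extractContact() is a hypothetical name. The last step splits the captured text into non-empty lines, which takes care of the varying number of rows.

```php
<?php
// Pull the non-empty lines between "Contact:" and "Phone:".
function extractContact(string $html): array
{
    // 1. Drop the tags so only text remains.
    $text = preg_replace('/<[^<>]+>/', '', $html);

    // 2. Lazily capture everything between the labels; /s lets '.' match newlines.
    if (!preg_match('/Contact:(.*?)Phone:/s', $text, $m)) {
        return [];
    }

    // 3. Split on any newline sequence, trim each piece, drop empty ones.
    $lines = array_map('trim', preg_split('/\R/', $m[1]));
    return array_values(array_filter($lines, fn ($l) => $l !== ''));
}

$html = <<<'HTML'
<tr>
<td><span class="bodytext"><b>Contact:</b><b></b></span><span style='font-size:10.0pt;font-family:Verdana;
mso-bidi-font-family:Arial'><b> </b>
<span class="bodytext">John Doe</span>
</span></td>
</tr>
<tr>
<td><span class="bodytext">PO Box 2112</span></td>
</tr>
<tr>
<td><span class="bodytext">JOHAN</span> NSW 9700</td>
</tr>
<tr>
<td><strong>Phone:</strong>
02 9999 9999
</td>
</tr>
HTML;

echo implode(' | ', extractContact($html)), "\n"; // John Doe | PO Box 2112 | JOHAN NSW 9700
```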
To grab everything between the first two colons, use:
if (preg_match('/:([^:]*):/', $text, $regs)) {
    $result = $regs[1];
} else {
    $result = "";
}
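Since the labels may be missing, one option is to try the labelled pattern first and only fall back to the colon pattern when it fails. This is a sketch, not a tested recipe: extractBetween() is a hypothetical helper, $text is assumed to be tag-free already, and the "Name:"/"Tel:" input is invented sample data.

```php
<?php
// Prefer the "Contact:" ... "Phone:" span; otherwise take the text
// between the first two colons. Returns '' if neither pattern matches.
function extractBetween(string $text): string
{
    if (preg_match('/Contact:(.*?)Phone:/s', $text, $m)) {
        return trim($m[1]);
    }
    // [^:]* cannot cross a colon, so this stops at the second colon.
    // A negated class also matches newlines, so no /s flag is needed.
    if (preg_match('/:([^:]*):/', $text, $m)) {
        return trim($m[1]);
    }
    return '';
}

echo extractBetween("Contact: John Doe\nPO Box 2112\nPhone: 02 9999 9999"), "\n";
echo extractBetween("Name: Jane Roe\nPO Box 1\nTel: 02 0000 0000"), "\n";
```

Note the trade-off in the fallback: the second call returns "Jane Roe", "PO Box 1" and also "Tel", because the next label sits before the second colon and is captured with the data.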
This sounds like screen scraping. Once you've located the information you need, you could also just run it through strip_tags().
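A minimal sketch of the strip_tags() route, assuming you also want HTML entities such as &amp; decoded and whitespace collapsed to single spaces. contactViaStripTags() and the sample markup are hypothetical.

```php
<?php
// Let PHP's built-in strip_tags() remove the markup, decode entities,
// then capture the text between the two labels.
function contactViaStripTags(string $html): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    if (!preg_match('/Contact:(.*?)Phone:/s', $text, $m)) {
        return '';
    }
    // Collapse runs of whitespace (including newlines) into single spaces.
    return trim(preg_replace('/\s+/', ' ', $m[1]));
}

$html = "<td><b>Contact:</b> <span class=\"bodytext\">John Doe</span></td>\n"
      . "<td><span class=\"bodytext\">Smith &amp; Co</span></td>\n"
      . "<td><strong>Phone:</strong> 02 9999 9999</td>";

echo contactViaStripTags($html), "\n"; // John Doe Smith & Co
```

strip_tags() does not insert any separator where it removes a tag, so cells with no whitespace between them would run together; that is why the sample keeps a newline between the rows.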
The knee-jerk Stack Overflow response to this sort of question seems to be "omg don't use regexes! Use Beautiful Soup instead!!". Personally, I prefer not to pull in external libraries for small tasks like this, and regexes are a good alternative.
A simple way to tackle this is to strip out all the HTML tags first. The s modifier lets . match newlines, so tags that wrap across lines (like the <span style=...> in your markup) are removed too:
$text = preg_replace("/<.*?>/s", "", $text);
Then you can use whatever method you like to extract the text you need.
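If you'd rather work with the <span class="bodytext"> elements directly, preg_match_all() collects however many there are in one call, so no hand-written matching loop is needed. This sketch assumes spans of that class are never nested inside one another, and the sample markup is a simplified, made-up version of the question's.

```php
<?php
$html = <<<'HTML'
<td><span class="bodytext"><b>Contact:</b></span></td>
<td><span class="bodytext">John Doe</span></td>
<td><span class="bodytext">PO Box 2112</span></td>
<td><span class="bodytext"></span></td>
<td><span class="bodytext">JOHAN</span> NSW 9700</td>
HTML;

// Group 1 holds the inner HTML of each bodytext span, in document order.
preg_match_all('/<span class="bodytext">(.*?)<\/span>/s', $html, $matches);

$values = [];
foreach ($matches[1] as $inner) {
    // Strip tags left inside the span (e.g. <b>) and skip empty spans.
    $value = trim(preg_replace('/<[^<>]+>/', '', $inner));
    if ($value !== '') {
        $values[] = $value;
    }
}

echo implode(' | ', $values), "\n"; // Contact: | John Doe | PO Box 2112 | JOHAN
```

Note that "NSW 9700" is lost because it sits outside the span in the original markup. For this particular HTML, stripping all tags first and then matching on the text is the more robust choice.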
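Non-capturing groups look like this: (?:this is matched but not captured). The group still has to match; it just doesn't get its own entry in the matches array.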
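A small sketch comparing an ordinary (...) group with a (?:...) group in preg_match(). The "Phone"/"Tel" alternation is only an illustrative choice.

```php
<?php
$text = 'Phone: 02 9999 9999';

// (Phone|Tel) captures the label, so it takes slot 1 and the number is slot 2.
preg_match('/(Phone|Tel):\s*(.+)/', $text, $capturing);

// (?:Phone|Tel) must still match, but it is not stored, so the number is slot 1.
preg_match('/(?:Phone|Tel):\s*(.+)/', $text, $nonCapturing);

echo count($capturing), "\n";    // 3
echo count($nonCapturing), "\n"; // 2
echo $capturing[2], "\n";        // 02 9999 9999
echo $nonCapturing[1], "\n";     // 02 9999 9999
```

Using (?:...) for alternations you only need for matching keeps the numbering of the groups you actually care about stable.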