Extracting specific values from an HTML table using regular expressions

Question

I have an HTML file that contains this table row:

<tr> 
<td class="color21 right" style="font-size:12px; line-height:1.2;">&nbsp;Location</td>
<td class="color21" style="font-size:12px;">10</td>
<td class="color21" style="font-size:12px;"><img src="../../icons/9.gif" alt="Type" />     </td>
<td class="color21" style="font-size:12px;">3</td>
<td class="color21" style="font-size:12px;">7</td>
<td class="color21" style="font-size:12px;"><img src="../../icons/11.gif" alt="Type" />    </td>
<td class="color21" style="font-size:12px;">3</td>
<td class="color21" style="font-size:12px;">10</td>
<td class="color21" style="font-size:12px;"><img src="../../icons/9.gif" alt="Type" />    </td>
</tr>

I'm retrieving the file contents using file_get_contents.

How can I extract all TD values using preg_match or preg_match_all?

Answer 1

Think twice about whether you really want to use a regex to parse HTML.

That said, you can use this pattern:

<td.+?>(.+?)</td>

The first capture group will contain the contents of each <td>.
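For reference, here is a minimal sketch of how a pattern like this could be passed to preg_match_all. It is not the answerer's exact code. PHP's preg functions need delimiters around the pattern, so `#` is used here, and `[^>]*` stands in for `.+?` so that a bare `<td>` without attributes would also match. The `$html` sample is shortened from the question, and the variable names are illustrative only.

```php
<?php
// Sample row, shortened from the question; in practice $html would come from file_get_contents().
$html = <<<'HTML'
<tr>
<td class="color21 right" style="font-size:12px; line-height:1.2;">&nbsp;Location</td>
<td class="color21" style="font-size:12px;">10</td>
<td class="color21" style="font-size:12px;"><img src="../../icons/9.gif" alt="Type" />     </td>
</tr>
HTML;

// 's' lets '.' match newlines, 'i' ignores tag case; '#' delimiters avoid escaping '/'.
preg_match_all('#<td\b[^>]*>(.*?)</td>#si', $html, $matches);

// $matches[1] holds the first capture group of every match; trim() drops padding spaces.
$values = array_map('trim', $matches[1]);
print_r($values);
```

Note that the captured values are raw markup: the first cell still contains the `&nbsp;` entity and the image cell contains the whole `<img>` tag, so further processing would be needed to get clean text.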

Answer 2

Use a DOM parser such as DOMDocument to parse the HTML content; regexes are not reliable in cases like this.

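    $str = file_get_contents('read.txt');
    $dom = new DOMDocument;
    $dom->loadHTML($str);
    $tds = $dom->getElementsByTagName('td');
    foreach ($tds as $td) {
        // trim() first: the image cells contain only whitespace text,
        // and empty() would also wrongly treat a cell holding "0" as empty
        $text = trim($td->nodeValue);
        if ($text !== '') {
            echo $text . "\n";
        } else {
            // No text in the cell, so print the image attributes instead
            $images = $td->getElementsByTagName('img');
            foreach ($images as $image) {
                echo $image->getAttribute('src') . " ";
                echo $image->getAttribute('alt') . "\n";
            }
        }
    }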
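
To illustrate why a regex can be unreliable here, the following sketch runs both approaches on a hypothetical cell whose `title` attribute contains a literal `>`. The input string and variable names are assumptions made for the example; the pattern is the one from Answer 1 wrapped in `#` delimiters.

```php
<?php
// Hypothetical cell: the title attribute contains a literal '>'.
$html = '<table><tr><td title="a > b">x</td></tr></table>';

// Regex from Answer 1, wrapped in '#' delimiters.
// The lazy '.+?' stops at the '>' inside the attribute value,
// so the rest of the attribute leaks into the capture group.
preg_match('#<td.+?>(.+?)</td>#', $html, $m);
$fromRegex = $m[1];

// DOM approach from Answer 2; parse warnings are collected instead of printed.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$fromDom = $dom->getElementsByTagName('td')->item(0)->nodeValue;

var_dump($fromRegex, $fromDom);
```

The regex capture includes part of the attribute, while the parser returns only the cell text. Nested tables and comments can trip up a regex in similar ways.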

2 answers

Answer 1: 1, 2014-04-15 16:37:22

Answer 2: 1, accepted, 2014-04-15 16:48:19
