Find class name of HTML source using PHP

I'm new to PHP. I want to write code that finds the id contained in the HTML below, i.e. 1123. Can anyone give me some ideas?

<span class="miniprofile-container /companies/1123?miniprofile="
      data-tracking="NUS_CMPY_FOL-nhre"
      data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&amp;fc=2">
    <strong>
        <a href="http://www.linkedin.com/nus-trk?trkact=viewCompanyProfile&pk=biz-overview-public&pp=1&poster=&uid=5674666402166894592&ut=NUS_UNIU_FOLLOW_CMPY&r=&f=0&url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fcompany%2F1123%3Ftrk%3DNUS_CMPY_FOL-nhre&urlhash=7qbc">
        Bank of America
        </a>
    </strong>
</span> has a new Project Manager

Note: I don't need the contents of the span. I need the id that is embedded in the span's class name.

I tried the following:

$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML($html);
$xmlElements = simplexml_import_dom($dom);
$id = $xmlElements->xpath("//span [@class='miniprofile-container /companies/$data_id?miniprofile=']");

...but I don't know how to proceed from here.

Depending on your needs, you could do

$matches = array();
preg_match('|<span class="miniprofile-container /companies/(\d+)\?miniprofile|', $html, $matches);
print_r($matches);

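This is a very simple regular expression, but it can serve as a first suggestion. If you want to go via DomDocument or simplexml, you shouldn't mix the two the way you did in your example. Which approach do you prefer? Then we can narrow it down.

//Edit: Pretty much what @fireeyedboy said, but this is what I just fiddled with: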
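As a minimal sketch, the regular expression above could be wrapped in a small helper that returns `null` when the pattern is absent, so the caller does not have to inspect `$matches` directly. The function name `findCompanyId` is hypothetical, and the pattern assumes the `class` attribute is the first attribute of the span, as in the question's markup.

```php
<?php
/**
 * Hypothetical helper: returns the company id embedded in the span's class
 * attribute, or null when the markup does not contain the expected pattern.
 */
function findCompanyId(string $html): ?int
{
    // Same idea as the pattern above, with \s+ to tolerate extra whitespace.
    $pattern = '|<span\s+class="miniprofile-container\s+/companies/(\d+)\?miniprofile|';
    if (preg_match($pattern, $html, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

$html = '<span class="miniprofile-container /companies/1123?miniprofile=" data-tracking="NUS_CMPY_FOL-nhre">';
var_dump(findCompanyId($html));             // int(1123)
var_dump(findCompanyId('<p>nothing</p>'));  // NULL
```

Returning `null` rather than an empty array keeps the "not found" case explicit for the calling code.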

<?php
$html = <<<EOD
<html><head></head>
<body>
<span class="miniprofile-container /companies/1123?miniprofile="
      data-tracking="NUS_CMPY_FOL-nhre"
      data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&amp;fc=2">
    <strong>
        <a href="#">
        Bank of America
        </a>
    </strong>
</span> has a new Project Manager

</body>
</html>
EOD;

$domDocument = new DOMDocument('1.0', 'UTF-8');
$domDocument->recover = TRUE;
$domDocument->loadHTML($html);

$xPath = new DOMXPath($domDocument);
$relevantElements = $xPath->query('//span[contains(@class, "miniprofile-container")]');
$foundId = NULL;
foreach($relevantElements as $match) {
    $pregMatches = array();
    if (preg_match('|/companies/(\d+)\?miniprofile|', $match->getAttribute('class'), $pregMatches)) {
        if (isset($pregMatches[1])) {
            $foundId = $pregMatches[1];
            break;
        }
    }
}

echo $foundId;

?>
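The loop above stops at the first match. If a page contains several such spans, a variation along the following lines could collect every id instead. This is only a sketch: the second span and its id 4567 are made-up sample data, and `$foundIds` is a name introduced here for illustration.

```php
<?php
// Hypothetical input with two company spans; only the class attributes matter here.
$html = <<<'EOD'
<html><body>
<span class="miniprofile-container /companies/1123?miniprofile=">Bank of America</span>
<span class="miniprofile-container /companies/4567?miniprofile=">Another company</span>
</body></html>
EOD;

$domDocument = new DOMDocument('1.0', 'UTF-8');
$domDocument->loadHTML($html);

$xPath = new DOMXPath($domDocument);
$foundIds = array();
foreach ($xPath->query('//span[contains(@class, "miniprofile-container")]') as $span) {
    // Keep every id whose class attribute matches, rather than breaking early.
    if (preg_match('|/companies/(\d+)\?miniprofile|', $span->getAttribute('class'), $m)) {
        $foundIds[] = $m[1];
    }
}

print_r($foundIds);
```

The ids are kept as strings, exactly as captured by the regular expression; cast them with `(int)` if numeric values are needed.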

This should do what you're after:

$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );

/*
 * the following xpath query will find all class attributes of span elements
 * whose class attribute contains the strings " miniprofile-container " and " /companies/"
 */
$nodes = $xpath->query( "//span[contains(concat(' ', @class, ' '), ' miniprofile-container ') and contains(concat(' ', @class, ' '), ' /companies/')]/@class" );
foreach( $nodes as $node )
{
    // extract the number found between "/companies/" and "?miniprofile" in the node's nodeValue
    // only dump the id when the pattern actually matched, so $matches[ 1 ] is always defined
    if( preg_match( '#/companies/(\d+)\?miniprofile#', $node->nodeValue, $matches ) )
    {
        var_dump( $matches[ 1 ] );
    }
}
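If you would rather avoid the regular expression altogether, the XPath 1.0 string functions `substring-after()` and `substring-before()` can cut the id out of the attribute value directly. The following is a sketch under the assumption that the id is always followed by a `?`. It also uses `libxml_use_internal_errors()` instead of the `@` operator to keep parser warnings quiet. The variable name `$companyId` is made up for this example.

```php
<?php
// Sample markup taken from the question (the link target is shortened to "#").
$html = <<<'EOD'
<html><head></head>
<body>
<span class="miniprofile-container /companies/1123?miniprofile="
      data-tracking="NUS_CMPY_FOL-nhre">
    <strong><a href="#">Bank of America</a></strong>
</span> has a new Project Manager
</body>
</html>
EOD;

// Collect libxml parse warnings internally instead of silencing them with "@".
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// substring-after() keeps everything after "/companies/", and
// substring-before() then cuts at the "?" that starts the query string.
// A node-set passed to a string function is converted using its first node.
$companyId = $xpath->evaluate(
    'substring-before(substring-after('
    . '//span[contains(concat(" ", normalize-space(@class), " "), " miniprofile-container ")]/@class, '
    . '"/companies/"), "?")'
);

// evaluate() yields an empty string here when nothing matched.
echo $companyId === '' ? "no id found\n" : $companyId . "\n";
```

Note that this only returns the id of the first matching span. For several spans, looping over the nodes as in the code above is still the simpler route.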
