Convert a selected HTML Table to JSON

Is it possible to convert just one selected table to JSON when the HTML contains multiple tables?

I have this HTML:

<div class="mon_title">2.11.2015 Montag</div>
    <table class="info" >
    <tr class="info"><th class="info" align="center" colspan="2">Nachrichten zum Tag</th></tr>
    <tr class='info'><td class='info' colspan="2"><b><u></u>   </b>
    ...
    </table>
    <p>
    <table class="mon_list" >

    ...
    </table>

And this PHP code to convert it into JSON:

function save_table_to_json ( $in_file, $out_file ) {
    $html = file_get_contents( $in_file );
    file_put_contents( $out_file, convert_table_to_json( $html ) );
}

function convert_table_to_json ( $html ) {
    $document = new DOMDocument();
    $document->loadHTML( $html );

    $obj = [];
    $jsonObj = [];
    $th = $document->getElementsByTagName('th');
    $td = $document->getElementsByTagName('td');
    $thNum = $th->length;
    $arrLength = $td->length;
    $rowIx = 0;

    for ( $i = 0 ; $i < $arrLength ; $i++){
        $head = $th->item( $i%$thNum )->textContent;
        $content = $td->item( $i )->textContent;
        $obj[ $head ] = $content;
        if( ($i+1) % $thNum === 0){ 
            $jsonObj[++$rowIx] = $obj;
            $obj = [];
        }
    }
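    // Encode the collected rows; file_put_contents() above expects a string.
    return json_encode( $jsonObj );
}

save_table_to_json( 'heute_S.htm', 'heute_S.json' );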

It takes both the table with class=info and the table with class=mon_list and converts them to JSON.

Is there any way to make it take only the table with class=mon_list?

You can use XPath to search for the class, and then create a new DOM document that contains only the result of the XPath query. This is untested, but should get you on the right track.

It's also worth mentioning that you can use foreach to iterate over a node list instead of indexing it with item().

$document = new DOMDocument();
$document->loadHTML( $html );

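// Find every element whose class attribute contains "mon_list".
$xpath = new DOMXPath($document);
$tables = $xpath->query("//*[contains(@class, 'mon_list')]");
// Copy the first match into a fresh document so the lookups below only see that table.
$tableDom = new DOMDocument();
$tableDom->appendChild($tableDom->importNode($tables->item(0), true));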

$obj = [];
$jsonObj = [];
$th = $tableDom->getElementsByTagName('th');
$td = $tableDom->getElementsByTagName('td');
$thNum = $th->length;
$arrLength = $td->length;
$rowIx = 0;

for ( $i = 0 ; $i < $arrLength ; $i++){
    $head = $th->item( $i%$thNum )->textContent;
    $content = $td->item( $i )->textContent;
    $obj[ $head ] = $content;
    if( ($i+1) % $thNum === 0){ 
        $jsonObj[++$rowIx] = $obj;
        $obj = [];
    }
}
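
For completeness, here is a minimal sketch of the same idea wrapped in one function that returns a JSON string and uses foreach over the node lists. The function name table_class_to_json and the sample markup are made up for illustration. It assumes the first header row of the table contains the th cells, that the class name contains no quote characters, and that rows whose cell count differs from the header count can be skipped.

```php
<?php
// Hypothetical helper, not part of the original answer.
function table_class_to_json(string $html, string $class): string
{
    $document = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Match the class as a whole token, so "mon_list" does not match "mon_list_old".
    $query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $table = $xpath->query($query)->item(0);
    if ($table === null) {
        return json_encode([]);
    }

    $headers = [];
    $rows = [];
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $ths = $xpath->query('./th', $tr);
        if ($headers === [] && $ths->length > 0) {
            // First row with th cells supplies the keys.
            foreach ($ths as $th) {
                $headers[] = trim($th->textContent);
            }
            continue;
        }
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Skip rows that do not line up with the header row.
        if ($cells !== [] && count($cells) === count($headers)) {
            $rows[] = array_combine($headers, $cells);
        }
    }
    return json_encode($rows);
}

// Made-up sample input with two tables; only the second one is converted.
$html = '<table class="info"><tr><th>Note</th></tr><tr><td>x</td></tr></table>'
      . '<table class="mon_list"><tr><th>Class</th><th>Room</th></tr>'
      . '<tr><td>5a</td><td>101</td></tr><tr><td>6b</td><td>202</td></tr></table>';
$json = table_class_to_json($html, 'mon_list');
echo $json, "\n";
```

Building each row per tr rather than taking the td index modulo the number of th cells keeps a row with merged or missing cells from shifting every row after it.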

Another, separate approach is to use getAttribute() to check the class name. Someone in a different answer has written a function for doing this:

function getElementsByClass(&$parentNode, $tagName, $className) {
    $nodes=array();

    $childNodeList = $parentNode->getElementsByTagName($tagName);
    for ($i = 0; $i < $childNodeList->length; $i++) {
        $temp = $childNodeList->item($i);
        if (stripos($temp->getAttribute('class'), $className) !== false) {
            $nodes[]=$temp;
        }
    }

    return $nodes;
}
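
Note that stripos() does a case-insensitive substring match, so searching for "mon_list" would also match a class such as "mon_list_old". As a rough sketch, a variant that compares whole class tokens could look like the following. The name getElementsByClassToken and the sample markup are assumptions for illustration, not something from the quoted answer.

```php
<?php
// Hypothetical variant of the helper above; $parentNode is expected to be
// a DOMDocument or DOMElement, since those provide getElementsByTagName().
function getElementsByClassToken($parentNode, string $tagName, string $className): array
{
    $nodes = [];
    foreach ($parentNode->getElementsByTagName($tagName) as $node) {
        // Split the class attribute on whitespace and compare exact tokens.
        $tokens = preg_split('/\s+/', trim($node->getAttribute('class')));
        if (in_array($className, $tokens, true)) {
            $nodes[] = $node;
        }
    }
    return $nodes;
}

// Made-up sample input: only the second table carries the class we want.
$document = new DOMDocument();
$document->loadHTML(
    '<table class="info"><tr><td>a</td></tr></table>'
    . '<table class="mon_list"><tr><td>b</td></tr></table>'
);
$tables = getElementsByClassToken($document, 'table', 'mon_list');
echo count($tables), "\n";
```

The first element of the returned array can then be passed to importNode() in the same way as the XPath result in the earlier answer.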
