
Parse TEI with DOMXPath: get the text of a child tag inside an evaluate loop

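From a string containing TEI files, I generate an index for navigating their blocks by retrieving all div tags. I would also like to get the content of the <head> tag in the current div, if it exists.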

Example TEI file:

    <div type="lib" n="1"><head>LIBER I</head>...
    <div type="pr">...</div>
    <div type="cap" n="1"><head>CAP EX</head><p><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</p></div>
    <div type="cap" n="2"><head>CAP EX</head><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</div>
    </div>

I tried this, but it does not work:

    // Source string:
    $fulltext = '<div type="lib" n="1"><head>LIBER I</head>...<div type="pr">...</div><div type="cap" n="1"><head>CAP EX</head><p><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</p></div><div type="cap" n="2"><head>CAP EX</head><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</div></div>';
    $dom = new DOMDocument();
    @$dom->loadHTML($fulltext);
    $domx = new DOMXPath($dom);
    $entries = $domx->evaluate("//div");
    echo '<ul>';
    foreach ($entries as $entry) {
        $title = '';
        $type = $entry->getAttribute('type');
        $n = $entry->getAttribute('n');
        // Text of the first <head> child of the current div ('' if there is none)
        $head = $domx->evaluate("string(./head[1])", $entry);
        if ($head != '') $title = $head; else $title = $n;
        echo '<li><a href="#' . $type . '-' . $n . '">' . $title . '</a></li>';
    }
    echo '</ul>';

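This line does not work:

    $head = $domx->evaluate("string(./head[1])", $entry);

It returns the error:

    DOMDocument::loadHTML(): htmlParseStartTag: misplaced <head> tag in Entity, line: 3

The purpose of this line is to get, inside the loop, the text of the child head tag (in this example, "LIBER I").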

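Using the @ operator on the load call hides all kinds of problems. If you take it out, you will see that the document produces errors: libxml's HTML parser treats <head> as the HTML document head, reports it as misplaced inside a div and discards it, so string(./head[1]) has nothing to find.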
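To see what the @ hides, a minimal sketch can collect libxml's warnings instead of suppressing them. The fragment is a shortened, hypothetical version of the question's string. On libxml2 builds that report "misplaced <head> tag" (as in the question), the element is dropped, so the XPath lookup comes back empty.

```php
<?php
// Minimal sketch: buffer libxml's warnings instead of hiding them with @.
// The fragment is a shortened, hypothetical version of the question's TEI string.
$fragment = '<div type="lib" n="1"><head>LIBER I</head>...<div type="pr">...</div></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of emitting them
$dom = new DOMDocument();
$dom->loadHTML($fragment);
$warnings = array_map(
    static fn (LibXMLError $e): string => trim($e->message),
    libxml_get_errors()
);
libxml_clear_errors();
libxml_use_internal_errors(false);

foreach ($warnings as $message) {
    echo 'libxml: ', $message, PHP_EOL;
}

// If the HTML parser discarded the misplaced <head>, this is an empty string.
$domx = new DOMXPath($dom);
$headText = $domx->evaluate('string(//div[1]/head[1])');
var_dump($headText);
```

libxml_use_internal_errors() is scoped around the single load so that other code keeps the default error handling.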

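However, if you change the line to

    $dom->loadXML($fulltext);

you get the expected output, because the XML parser keeps <head> as an ordinary child element of the div. Note that loadXML() requires the string to be well-formed XML.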
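A sketch of the whole index loop with only the loader changed could look like this. It assumes the string is well-formed XML with no default namespace. The exception on a failed load and the htmlspecialchars() escaping are additions, not part of the original code.

```php
<?php
// Sketch of the question's index loop using loadXML() instead of loadHTML().
// Assumption: the TEI string is well-formed XML and declares no default namespace.
$fulltext = '<div type="lib" n="1"><head>LIBER I</head>...'
    . '<div type="pr">...</div>'
    . '<div type="cap" n="1"><head>CAP EX</head><p><milestone unit="par" n="1" />...'
    . '<milestone unit="par" n="2" />...</p></div>'
    . '<div type="cap" n="2"><head>CAP EX</head><milestone unit="par" n="1" />...'
    . '<milestone unit="par" n="2" />...</div>'
    . '</div>';

$dom = new DOMDocument();
if (!$dom->loadXML($fulltext)) {
    throw new RuntimeException('The TEI string is not well-formed XML.');
}
$domx = new DOMXPath($dom);

$items = [];
foreach ($domx->query('//div') as $entry) {
    $type = $entry->getAttribute('type');   // '' when the attribute is missing
    $n = $entry->getAttribute('n');
    // Relative path from the current div: text of its first <head> child, or ''.
    $head = $domx->evaluate('string(head[1])', $entry);
    $title = $head !== '' ? $head : $n;
    $items[] = '<li><a href="#' . htmlspecialchars($type . '-' . $n) . '">'
        . htmlspecialchars($title) . '</a></li>';
}

echo '<ul>' . implode('', $items) . '</ul>', PHP_EOL;
```

If the real files declare the TEI namespace (xmlns="http://www.tei-c.org/ns/1.0", as TEI P5 documents normally do), //div matches nothing. In that case, register a prefix with $domx->registerNamespace('tei', 'http://www.tei-c.org/ns/1.0') and query //tei:div and string(tei:head[1]) instead.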

Solved with XMLReader:

    $level = 0;
    $indici_bc = array();     // breadcrumb string for each div/milestone
    $indici_head = array();   // <head> text for each div/milestone ('' if none)
    $bc = array();            // current breadcrumb component per level
    $last_blocco = '';        // name of the previously processed element
    $passed_milestone = false;
    $xml = new XMLReader();
    $xml->open($pathTei);     // $pathTei: path to the TEI file
    //$xml->xml($testo);
    while ($xml->read()) {
        if ($xml->nodeType == XMLReader::END_ELEMENT && $xml->name == 'div') {
            $level--;
            $last_blocco = $xml->name;
            if ($passed_milestone) { $level--; $passed_milestone = false; }
        }
        if ($xml->nodeType == XMLReader::ELEMENT && ($xml->name == 'div' || $xml->name == 'milestone')) {
            $blocco = $xml->name;
            $type = $xml->getAttribute('type');
            $n = $xml->getAttribute('n');
            // isset() cannot wrap a function call; getAttribute() returns null when the attribute is missing
            $unit = $xml->getAttribute('unit') ?? '';

            // Here I get the child node: re-parse the current element's subtree and read its <head>
            $node = new SimpleXMLElement($xml->readOuterXML());
            $head = $node->head ? (string)$node->head : '';

            $indici_head[] = $head;
            if ($last_blocco != 'milestone') $level++;
            if ($blocco == 'div') $bc[$level] = $n; else $bc[($level + 1)] = $n;
            $bc_str = '';
            for ($j = 1; $j < $level; $j++) {
                if ($bc_str != '') $bc_str .= '.';
                $bc_str .= $bc[$j];
            }
            if ($bc_str != '') $bc_str .= '.';
            $bc_str .= $n;

            $last_blocco = $xml->name;
            if ($blocco == 'milestone') $passed_milestone = true;

            $indici_bc[] = $bc_str;
        }
    }
    $xml->close();
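An alternative for the "get the child node" step, not the author's code, is to call XMLReader::expand() on each div. This builds a DOM copy of the current subtree, so there is no need to re-parse readOuterXML() with SimpleXML. The sketch below only collects type, n and head text, leaves out the breadcrumb logic, and uses a shortened, hypothetical input string. For a file, $reader->open($path) replaces $reader->XML($tei).

```php
<?php
// Alternative sketch: stream with XMLReader, expand each <div> into a DOM node,
// and read its first <head> child. The input string is hypothetical.
$tei = '<div type="lib" n="1"><head>LIBER I</head>...<div type="pr">...</div>'
    . '<div type="cap" n="1"><head>CAP EX</head>...</div></div>';

$reader = new XMLReader();
$reader->XML($tei);
$doc = new DOMDocument();   // target document for the expanded nodes
$heads = [];
while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== 'div') {
        continue;
    }
    // Read attributes while the reader is positioned on the <div> start tag.
    $type = $reader->getAttribute('type') ?? '';
    $n = $reader->getAttribute('n') ?? '';
    // expand() copies the current subtree without moving the reader,
    // so nested <div>s are still visited by later read() calls.
    $div = $reader->expand($doc);
    $head = '';
    foreach ($div->childNodes as $child) {
        if ($child instanceof DOMElement && $child->localName === 'head') {
            $head = $child->textContent;
            break;
        }
    }
    $heads[] = [$type, $n, $head];
}
$reader->close();

foreach ($heads as [$type, $n, $head]) {
    echo $type, ' ', $n, ': ', $head !== '' ? $head : '(no head)', PHP_EOL;
}
```

Like readOuterXML(), expand() still materializes each div's whole subtree, including nested divs. For very large outer divs, checking the first child elements with further read() calls would use less memory.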


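Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.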

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM