
PHP DOMDocument: Scraping nested classes


I have an HTML document like the one below, and I want to scrape all categories together with their subcategories. I tried scraping it with DOMXPath: I loaded the HTML and filtered on the class "css-1qaqbbz", but that only gives me the categories. My expected array looks like this:

    [
        'Arsitektur & Desain' => [
            'Buku Bangunan',
            'Buku Codes & Standars',
            // etc...
        ],
    ]

    <div>
        <a class="css-1qaqbbz">Arsitektur &amp; Desain
        </a>
    </div>
    <div class="css-1wode1h">
        <a data-testid="categoryNavigation#1" class="css-1nykm5o">Buku Bangunan</a>
        <a data-testid="categoryNavigation#2" class="css-1nykm5o">Buku Codes &amp; Standars</a>
        <a data-testid="categoryNavigation#3" class="css-1nykm5o">Buku Dekorasi &amp; Ornamen</a>
        <a data-testid="categoryNavigation#4" class="css-1nykm5o">Buku Desain Dapur</a>
        <a data-testid="categoryNavigation#5" class="css-1nykm5o">Buku Desain Kamar</a>
        <a data-testid="categoryNavigation#6" class="css-1nykm5o">Buku Desain Ruang Keluarga</a>
        <a data-testid="categoryNavigation#7" class="css-1nykm5o">Buku Desain Ruang Tamu</a>
        <a data-testid="categoryNavigation#8" class="css-1nykm5o">Buku Desain Rumah</a>
        <a data-testid="categoryNavigation#9" class="css-1nykm5o">Buku Interior &amp; Eksterior</a>
        <a data-testid="categoryNavigation#10" class="css-1nykm5o">Buku Metode &amp; Material Bangunan</a>
        <a data-testid="categoryNavigation#11" class="css-1nykm5o">Buku Taman</a>
    </div>

    <div class="css-1owj1eu" data-testid="catNavigation#2">
        <div>
            <a class="css-1qaqbbz">Buku Hukum</a>
        </div>
        <div class="css-1wode1h">
            <a data-testid="categoryNavigation#1" class="css-1nykm5o">Buku Gender &amp; Hukum</a>
            <a data-testid="categoryNavigation#2" class="css-1nykm5o">Buku Hukum Dagang</a>
            <a data-testid="categoryNavigation#3" class="css-1nykm5o">Buku Hukum Internasional</a>
            <a data-testid="categoryNavigation#4" class="css-1nykm5o">Buku Hukum Perdata</a>
            <a data-testid="categoryNavigation#5" class="css-1nykm5o">Buku Hukum Pidana</a>
            <a data-testid="categoryNavigation#6" class="css-1nykm5o">Buku Kemanusiaan</a>
            <a data-testid="categoryNavigation#7" class="css-1nykm5o">Buku Politik &amp; Hukum</a>
            <a data-testid="categoryNavigation#8" class="css-1nykm5o">Kumpulan Peraturan Perundang-Undangan</a>
            <a data-testid="categoryNavigation#9" class="css-1nykm5o">UUD 1945</a>
        </div>
    </div>

This is the code I'm using to scrape it:

    // $f holds the HTML shown above.
    $dom = new \DOMDocument;
    $dom->loadHTML($f);
    $xpath = new \DOMXPath($dom);
    // Matches only the top-level category links, not the subcategories.
    $results = $xpath->query("//*[@class='css-1qaqbbz']");

    if ($results->length > 0) {
        echo "<pre>";
        $arrCats = [];
        foreach ($results as $key => $value) {
            $arrCats[] = $value->nodeValue;
        }
        // die;
    }
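The loop above only ever collects the category names. A minimal sketch of building the nested array instead, assuming (as in the markup shown, but not guaranteed on the live page) that every category link sits in its own <div> whose immediately following sibling is a <div class="css-1wode1h"> holding that category's subcategory links. The function name scrapeCategories is hypothetical; DOMDocument::loadHTML and DOMXPath::query with a context node as second argument are standard PHP DOM calls.

```php
<?php
// Minimal sketch: build ['Category' => ['Subcategory', ...]] from markup shaped like
// the question's. Assumption: each <a class="css-1qaqbbz"> is wrapped in its own <div>,
// and the very next sibling <div class="css-1wode1h"> holds its subcategory links.
// scrapeCategories is a hypothetical helper name.

function scrapeCategories(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings (HTML5 tags, sloppy markup) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];

    // Every top-level category link.
    foreach ($xpath->query("//a[@class='css-1qaqbbz']") as $category) {
        $name = trim($category->textContent); // drops the stray newline/indentation
        $result[$name] = [];

        // Relative query evaluated from the category link: go up to its <div>, take the
        // immediately following sibling <div> only if it has the subcategory class,
        // then collect its <a> children.
        $subLinks = $xpath->query(
            "../following-sibling::div[1][@class='css-1wode1h']/a",
            $category
        );
        foreach ($subLinks as $sub) {
            $result[$name][] = trim($sub->textContent);
        }
    }

    return $result;
}
```

Called as scrapeCategories($f) on the HTML above, this should give 'Arsitektur & Desain' with its 11 subcategories and 'Buku Hukum' with its 9. Using div[1][...] rather than div[...][1] means a category without a subcategory list gets an empty array instead of silently borrowing the next category's list. Note that loadHTML assumes ISO-8859-1 unless the document declares a charset, which matters if category names contain non-ASCII text.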

Just change your XPath query:

    $results = $xpath->query("//a[starts-with(@class,'css')]");

Output:

    Array
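The query in this answer matches both the category links (css-1qaqbbz) and the subcategory links (css-1nykm5o), so it returns one flat list in document order rather than the nested array the question asks for. A minimal sketch of folding that flat list back into a nested array, assuming each category link is followed by its own subcategory links, as in the markup shown. The function name groupFlatLinks is hypothetical.

```php
<?php
// Minimal sketch: group the flat result of "//a[starts-with(@class,'css')]" into
// ['Category' => ['Subcategory', ...]]. Assumption: in document order, each category
// link (class "css-1qaqbbz") comes before its own subcategory links.
// groupFlatLinks is a hypothetical helper name.

function groupFlatLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Same query as the answer: categories and subcategories interleaved.
    $links = $xpath->query("//a[starts-with(@class,'css')]");

    $result = [];
    $current = null; // category currently being filled
    foreach ($links as $link) {
        $text = trim($link->textContent);
        if ($link->getAttribute('class') === 'css-1qaqbbz') {
            // A new top-level category starts here.
            $current = $text;
            $result[$current] = [];
        } elseif ($current !== null) {
            // Any other css-* link belongs to the most recent category.
            $result[$current][] = $text;
        }
    }

    return $result;
}
```

This relies only on ordering, so it is simple but fragile: on a full page with generated class names, any other link whose class happens to start with "css" would be swept into the current category.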

