
Extract href from HTML Element not working using node.js

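I am trying to scrape data from this website: https://www.gelbeseiten.de/Suche/Fotografen/Berlin , and for some reason I cannot get one specific element, while the other elements work fine. I am using node.js and puppeteer. I need the URL of the element with the class .contains-icon-homepage, and I get an error message when I target it. If I target the .contains-icon-aktualisieren element just above it instead, it works.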

Error message:

UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'getAttribute' of null

HTML:

<article class="mod mod-Treffer" data-teilnehmerid="1057046049">
            <a href="https://www.gelbeseiten.de/gsbiz/64195b36-063f-4401-8536-0d1c71a76326" data-realid="64195b36-063f-4401-8536-0d1c71a76326" data-tnid="1057046049" target="_self"
            >
                
        <div class="mod-hervorhebung">
        
    </div>
        
            <picture class="trefferlisten_logo">
        <source media="(min-width: 768px)" srcset="https://ies.v4all.de/0122/GS/0001/7/5017/30015017_310x190.png" />
        
        <img alt="" data-lazy-src="https://ies.v4all.de/0122/GS/0001/7/5017/30015017_310x190.png"/>
    </picture>
        
        <h2 data-wipe-name="Titel">Rudolph Silke</h2>
        <p class="d-inline-block mod-Treffer--besteBranche">
            Fotografen und Fotostudios
        </p>
        
        <address class="mod mod-AdresseKompakt">
        <p data-wipe-name="Adresse">
            Samariterstr. 33, 
            <span class="nobr">
                10247
                Berlin
            </span>
            (Friedrichshain)
            <span class="mod-AdresseKompakt__entfernung" title="Entfernung ab Suchmittelpunkt">6 km</span>
        </p>

        <p class="mod-AdresseKompakt__phoneNumber" data-hochgestellt-position="end" data-wipe-name="Kontaktdaten">(030) 4 26 66 86</p>

    </address>
                
        <div class="oeffnungszeit_kompakt__zustandsinfo--geschlossen">
            <span>Geschlossen</span>, 
            <span class="nobr">öffnet Samstag um 10:00</span>
        </div>
                
            </a>
                    
                <div class="aktionsleiste_kompakt">
        
        <div class="mod-gsSlider mod-gsSlider--noneOnWhite">
            <span
                    class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="left" data-show="false" data-wipe="{&quot;listener&quot;:&quot;click&quot;,&quot;name&quot;:&quot;Trefferliste: Aktionleiste-button-links&quot;}"></span>
            <span
                    class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="right" data-show="false" data-wipe="{&quot;listener&quot;:&quot;click&quot;,&quot;name&quot;:&quot;Trefferliste: Aktionleiste-button-rechts&quot;}"></span>
            <div class="mod-gsSlider__slider">              
        
            <a
                class="contains-icon-aktualisieren gs-btn"
                rel="noopener"
                href="https://www.gelbeseiten.de/gsbiz/64195b36-063f-4401-8536-0d1c71a76326#aktuelleinformationen"
                data-wipe="{&quot;listener&quot;: &quot;mouseup&quot;, &quot;name&quot;: &quot;Trefferliste Actionbutton Aktualisieren&quot;, &quot;id&quot;: &quot;1057046049&quot;, &quot;synchron&quot;: false}" data-isNeededPromise="false" data-cookieinfo="64195b36-063f-4401-8536-0d1c71a76326=1057046049"
            >Aktualisieren</a>
                            
            <a
                class="contains-icon-homepage gs-btn"
                target="_blank"
                rel=" noopener"
                href="http://www.fotoherz.de"
                data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Webseite-Button&quot;, &quot;id&quot;:&quot;1057046049&quot;}" data-isNeededPromise="false"
            >Webseite</a>
            
            <a
                class="contains-icon-email gs-btn"
                href="mailto:kontakt@silke-rudolph.de?subject=Anfrage%20%C3%BCber%20Gelbe%20Seiten"
                data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Email-Button&quot;, &quot;id&quot;:&quot;1057046049&quot;}" data-isNeededPromise="false"
            >E-Mail</a>
            
    
            <span
                class="contains-icon-route_finden gs-btn"
                data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Navigation-Button&quot;, &quot;id&quot;:&quot;1057046049&quot;}" data-parameters="{&quot;partner&quot;: &quot;googlemaps&quot;, &quot;searchquery&quot;: &quot;Samariterstr%2033%2010247%20Berlin&quot;}" data-target="_blank"
            >Route</span>
    
            <a
                class="contains-icon-details gs-btn"
                rel="noopener"
                href="https://www.gelbeseiten.de/gsbiz/64195b36-063f-4401-8536-0d1c71a76326"
                data-wipe="{&quot;listener&quot;: &quot;mouseup&quot;, &quot;name&quot;: &quot;Trefferliste Actionbutton Mehr Details&quot;, &quot;id&quot;: &quot;1057046049&quot;, &quot;synchron&quot;: false}" data-isNeededPromise="false" data-cookieinfo="64195b36-063f-4401-8536-0d1c71a76326=1057046049"
            >Mehr Details</a>
            
                <div class="mod-gsSlider__spacer"></div>
            </div>
        </div>
    
    </div>
                
        </article> 

My JS code:

const puppeteer = require("puppeteer");

async function getContacts(){
    const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: null
    });

    const page = await browser.newPage();
    const url = "https://www.gelbeseiten.de/Suche/Fotografen/Berlin";

        await page.goto(url);
        await page.waitFor(".mod-Treffer");

        const results = await page.$$eval(".mod-Treffer", rows => {
          return rows.map(row => {
              const properties = {};
              const firma = row.querySelector(".mod-Treffer h2");
              const tel = row.querySelector(".mod-AdresseKompakt__phoneNumber");
              const webSite = row.querySelector(" .contains-icon-homepage");
              properties.firma = firma.innerText;
              properties.tel = tel.innerText;
              properties.webSite = webSite.getAttribute("href");
              return properties;

          })
      })

      console.log(results)
}

getContacts();


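I tested this, and some of the .mod-Treffer elements are empty, i.e. they contain no .contains-icon-homepage element. The query itself does not throw in that case: querySelector simply returns null. The exception comes afterwards, when you try to read an attribute of the result (i.e. .getAttribute()) and the object is null.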
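The failure can be reproduced without a browser. The snippet below is only a sketch: rowWithoutHomepage is a hypothetical stand-in for a result row that lacks the homepage button, mimicking the way Element.querySelector returns null when nothing matches.

```javascript
// Hypothetical stand-in for a .mod-Treffer row without a homepage button.
// In the browser, Element.querySelector returns null when nothing matches.
const rowWithoutHomepage = {
  querySelector: (selector) => null,
};

// The lookup itself does not throw; it just yields null.
const webSite = rowWithoutHomepage.querySelector(".contains-icon-homepage");

let errorName = null;
try {
  // webSite is null here, so reading a method from it throws.
  webSite.getAttribute("href");
} catch (err) {
  errorName = err.name;
}

console.log(webSite === null);      // true
console.log(webSite === undefined); // false
console.log(errorName);             // TypeError
```

Note that the result is null rather than undefined, which matters for how the guard is written.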

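Solution: check that the element is not null before reading its attribute. querySelector returns null, not undefined, when nothing matches, so a comparison against undefined would not catch this case. For example:

if (webSite !== null) {
    properties.webSite = webSite.getAttribute("href");
}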
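Since querySelector returns null for a missing element, the same guard can be applied to every field of a row. The following is a minimal sketch, not the original poster's code: extractRow is a hypothetical helper name, and using null as the fallback value is an assumption about what the caller wants for missing data.

```javascript
// Hypothetical helper: reads one result row and tolerates missing elements.
// Each lookup may return null, so every field is guarded before it is read.
function extractRow(row) {
  const firma = row.querySelector("h2");
  const tel = row.querySelector(".mod-AdresseKompakt__phoneNumber");
  const webSite = row.querySelector(".contains-icon-homepage");

  return {
    // Fall back to null (an assumed placeholder) when the element is absent.
    firma: firma ? firma.innerText.trim() : null,
    tel: tel ? tel.innerText.trim() : null,
    webSite: webSite ? webSite.getAttribute("href") : null,
  };
}
```

When used with page.$$eval, the helper would have to be defined inside the callback itself (for example, rows => rows.map(row => { ... })), because Puppeteer serializes the callback and runs it in the page context, where functions from the Node.js scope are not available.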
