How to get the whole HTML from the Apify Cheerio crawler?

Question

I want to get the whole HTML, not just the text.

const Apify = require('apify');

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({
        url: 'ADDRESS', // placeholder: the page address, elided in the original
        uniqueKey: makeid(100), // makeid is the asker's own helper (not shown)
    });

    const handlePageFunction = async ({ request, $ }) => {
        // content_to is a Cheerio selection object, not a string.
        var content_to = $('.class');
    };

    // Set up the crawler, passing a single options object as an argument.
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageFunction,
    });

    await crawler.run();
});

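When I try this, the crawler returns a complex object. I know I can use .text() to extract the text from the content_to variable, but I need the whole HTML, including the tags. What should I do?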

Answer 1

If I understand correctly, you can simply use .html() instead of .text(). That way you get the element's inner HTML rather than its inner text.
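To make the difference concrete, here is a minimal sketch of a helper that reads the same selection three ways. The name describeSelection is hypothetical, and $ is assumed to be the Cheerio instance that CheerioCrawler passes to handlePageFunction. Note that .html() returns the markup inside the first matched element only; if the element's own tag is needed as well, Cheerio's static $.html(selection) renders the outer HTML.

```javascript
// Hypothetical helper (not part of Apify or Cheerio).
// `$` is assumed to be a Cheerio instance, e.g. the one given to handlePageFunction.
function describeSelection($, selector) {
    const selection = $(selector);
    return {
        // Combined text of the matched elements, with all tags stripped.
        text: selection.text(),
        // Markup inside the first matched element (its own tag excluded).
        innerHtml: selection.html(),
        // Markup of the matched element(s), including their own tags.
        outerHtml: $.html(selection),
    };
}
```

Inside the handler from the question, this could be called as describeSelection($, '.class') and the innerHtml or outerHtml field used in place of the raw Cheerio object.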

One more thing worth mentioning: you can also add body to the handlePageFunction argument object: const handlePageFunction = async ({ request, body, $ }) => {

body will contain the entire raw HTML of the page.
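As a rough sketch of how body could be used, the factory below builds a handler that records the raw page HTML next to the inner HTML of one element. The names makeHandlePageFunction and results are illustrative, the '.class' default simply mirrors the selector in the question, and the Buffer check rests on the assumption that body may arrive either as a string or as a Buffer.

```javascript
// Hypothetical factory: returns a function that could be passed as the
// handlePageFunction option of CheerioCrawler. `results` is a plain array
// owned by the caller.
function makeHandlePageFunction(results, selector = '.class') {
    return async ({ request, body, $ }) => {
        // Assumption: body may be a Buffer or a string; normalise to a string.
        const pageHtml = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
        results.push({
            url: request.url,
            pageHtml, // raw HTML of the whole page
            elementHtml: $(selector).html(), // inner HTML of the first match
        });
    };
}
```

An in-memory array keeps the sketch free of dependencies; in a real actor, Apify.pushData would be a more usual place to store each record.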

Answer 1 (accepted): 2020-12-25 14:40:47
