How do I get whole html from Apify Cheerio crawler?

Question

I want to get the whole HTML, not just the text.

const Apify = require('apify');

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({
        url: 'https://example.com', // placeholder: the real address was omitted
        uniqueKey: makeid(100), // makeid() is a helper defined elsewhere (not shown)
    });

    const handlePageFunction = async ({ request, $ }) => {
        // content_to is a Cheerio selection object, not a string
        const content_to = $('.class');
    };

    // Set up the crawler, passing a single options object as an argument.
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageFunction,
    });

    await crawler.run();
});

When I try this, $('.class') gives me a complex object rather than a string. I know I can extract the text from the content_to variable using .text(), but I need the whole HTML, including the tags. What should I do?

Answer 1

If I understand you correctly, you can just use .html() instead of .text(). This way you get the inner HTML of the element instead of its inner text.

Another thing worth mentioning: you can also take body from the handlePageFunction argument object: const handlePageFunction = async ({ request, body, $ }) => {

body holds the whole raw HTML of the page.

solution1
1 ACCEPTED 2020-12-25 14:40:47