How do I get the whole HTML from the Apify Cheerio crawler?
I want to get the whole HTML, not just the text.
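const Apify = require('apify');

// Hypothetical stand-in for the asker's makeid() helper, which the post does not show:
// returns a random alphanumeric string of the given length.
const makeid = (length) => {
    const chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    let id = '';
    for (let i = 0; i < length; i++) id += chars[Math.floor(Math.random() * chars.length)];
    return id;
};

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({
        url: '<address>', // placeholder: the real address was left out of the post
        uniqueKey: makeid(100), // random key, so the same URL is not deduplicated
    });

    const handlePageFunction = async ({ request, $ }) => {
        // $('.class') returns a Cheerio selection object, not a string
        const content_to = $('.class');
    };

    // Set up the crawler, passing a single options object as an argument.
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageFunction,
    });

    await crawler.run();
});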
When I try this, I get a complex object back. I know I can extract the text from the content_to variable using .text(), but I need the whole HTML, tags included. What should I do?
If I understand you correctly, you could just use .html() instead of .text(). This way you get the inner HTML of the element instead of its inner text.
Another thing to mention: you could also destructure body from the handlePageFunction argument object:
const handlePageFunction = async ({ request, body, $ }) => {
body will contain the whole raw HTML of the page.