Problems with the time it takes to obtain data using puppeteer
Problem
Hello devs,
I have been scraping a particular page with puppeteer, specifically its video section. The problem is that it takes more than 10 s to get the src of the video.
Is there a way to reduce that waiting time?
Code
As you can see in the code, I abort the requests for fonts, stylesheets and images to make the page load faster.
But the waiting time still exceeds 10 s.
const getAnimeVideo = async (id: string, chapter: number) => {
  const BASE_URL = `${url}${id}/${chapter}/`;
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36');
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image') {
      req.abort();
    } else {
      req.continue();
    }
  });
  await page.goto(BASE_URL);
  await page.waitFor(10000);
  const elementHandle = await page.waitForSelector('iframe.player_conte');
  const frame = await elementHandle.contentFrame();
  const video = await frame.$eval('#jkvideo_html5_api', el =>
    Array.from(el.getElementsByTagName('source')).map(e => e.getAttribute("src")));
  await page.close();
  await browser.close();
  return video;
}
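One likely contributor to the delay is the unconditional `page.waitFor(10000)` call: given a number, it pauses for the full 10 seconds before `page.waitForSelector` even starts looking for the iframe. The sketch below is not Puppeteer code. It uses a hypothetical `pollUntil` helper and a simulated page to illustrate, conceptually, why a condition-based wait returns as soon as the element exists while the timeout only acts as an upper bound. Whether this particular site still works without the fixed pause is an assumption that would need testing.

```typescript
// Hypothetical helper, not part of Puppeteer: resolves as soon as `check`
// returns a non-null value, or rejects once `timeoutMs` has elapsed.
async function pollUntil<T>(
  check: () => T | null,
  timeoutMs: number,
  intervalMs: number = 50,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = check();
    if (value !== null) {
      return value;
    }
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated page: the "iframe" becomes available after roughly 200 ms.
async function demo(): Promise<number> {
  const start = Date.now();
  let iframe: string | null = null;
  setTimeout(() => {
    iframe = 'iframe.player_conte';
  }, 200);
  // The 10000 ms value is only a ceiling here, not a fixed delay.
  await pollUntil(() => iframe, 10000);
  return Date.now() - start; // elapsed milliseconds
}
```

Calling `demo()` resolves shortly after the simulated iframe appears rather than after 10 seconds, which is the behaviour one would hope to get from relying on `waitForSelector` alone.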
Solution using cheerio
import axios from 'axios';
import * as cheerio from 'cheerio';

async function getVideoURL(url: string) {
  // This requests the underlying iframe page
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  const video = $('video');
  if (video.length) {
    // Sometimes the video is directly embedded
    const src = $(video).find('source').attr('src');
    return src;
  } else {
    // If the video is not embedded, there is obfuscated code that will create a video element
    // Here we run the code to get the underlying video URL
    const scripts = $('script');
    // The obfuscated code uses a variable called l, which is the window / global object
    const l = global;
    // The obfuscated code uses a variable called ll, which is String
    const ll = String;
    // html() returns null when the script tag is missing, so fall back to an empty string
    const $script2 = $(scripts[1]).html() ?? '';
    // Kind of dangerous, but the code is very obfuscated so it's hard to tell how it decrypts the URL
    eval($script2);
    // The code above sets a variable called ss that is the mp4 URL
    return (l as any).ss;
  }
}
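Since the answer itself calls the `eval` step "kind of dangerous", here is a minimal sketch of one way to keep the script from writing onto the real global object. It passes `l` and `ll` as explicit parameters through the `Function` constructor and reads `ss` back from a throwaway object. The function name `extractVideoUrl` and the stand-in script are hypothetical. It also assumes the obfuscated code only needs `l` as a place to store `ss`; if the real script reads other properties of the window object, this would not work as written. Note that this does not sandbox the script in a security sense: code run this way can still reach the real globals.

```typescript
// Runs `scriptSource` with `l` bound to a throwaway object instead of the
// real global, then reads back the `ss` property the script is assumed to set.
function extractVideoUrl(scriptSource: string): string | undefined {
  const sandbox: Record<string, unknown> = {};
  // `l` and `ll` mirror the names the obfuscated code is said to expect.
  const run = new Function('l', 'll', scriptSource);
  run(sandbox, String);
  const ss = sandbox['ss'];
  return typeof ss === 'string' ? ss : undefined;
}

// Stand-in for the obfuscated script (hypothetical; the real one is far
// more convoluted). It builds "https" from character codes via `ll`.
const fakeScript =
  "l.ss = ll.fromCharCode(104, 116, 116, 112, 115) + '://example.com/video.mp4';";

const extracted = extractVideoUrl(fakeScript);
```

With the stand-in script, `extracted` holds the string assembled by the script, and nothing is left behind on the global object.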