
Timing problems when obtaining data using puppeteer

Problem

Hello devs,

I have been scraping a particular page with puppeteer, specifically its video section. The problem is that it takes more than 10 s to get the src of the video.

Is there a way to reduce that waiting time?

Waiting (TTFB)

Code

As you may notice, I abort the requests for fonts, stylesheets and images to make it faster.

But the waiting time still exceeds 10 s.

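import puppeteer from 'puppeteer';

// `url` (the site's base URL) is assumed to be defined elsewhere in the module.
const getAnimeVideo = async (id: string, chapter: number) => {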
  const BASE_URL = `${url}${id}/${chapter}/`;
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36');
  await page.setRequestInterception(true);


  page.on('request', (req) => {
    if(req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image'){
      req.abort();
    }
    else{
      req.continue();
    }
  });

  await page.goto(BASE_URL);
  await page.waitFor(10000);
  const elementHandle = await page.waitForSelector('iframe.player_conte');
  const frame = await elementHandle.contentFrame();
  const video = await frame.$eval('#jkvideo_html5_api', el =>
    Array.from(el.getElementsByTagName('source')).map(e => e.getAttribute("src")));
  await page.close();
  await browser.close();
  return video;
}
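
The request filter in the code above can be pulled out into a small handler, which keeps the list of skipped resource types in one place and makes it testable without a browser. This is only a sketch: `InterceptedRequestLike`, `BLOCKED_RESOURCE_TYPES` and `filterRequest` are hypothetical names, and the interface lists just the three request methods the question already calls.

```typescript
// Resource types the question chooses to skip.
const BLOCKED_RESOURCE_TYPES: ReadonlySet<string> = new Set([
  'stylesheet',
  'font',
  'image',
]);

// Hypothetical minimal shape of an intercepted request: only the three
// methods used in the question's handler.
interface InterceptedRequestLike {
  resourceType(): string;
  abort(): unknown;
  continue(): unknown;
}

// Abort requests for blocked resource types and let everything else through.
function filterRequest(req: InterceptedRequestLike): void {
  if (BLOCKED_RESOURCE_TYPES.has(req.resourceType())) {
    req.abort();
  } else {
    req.continue();
  }
}

// Intended usage with the question's code (not executed here):
//   await page.setRequestInterception(true);
//   page.on('request', filterRequest);
```

Note that blocking resources only shortens the page load. The `await page.waitFor(10000)` call in the code above pauses for a fixed 10 seconds on its own, while `page.waitForSelector` on the following line already waits until the iframe is present.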

Solution using cheerio

import axios from 'axios';
import * as cheerio from 'cheerio';

async function getVideoURL(url: string) {
  // This requests the underlying iframe page
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  const video = $('video');
  if (video.length) {
    // Sometimes the video is directly embedded
    const src = $(video).find('source').attr('src');
    return src;
  } else {
    // If the video is not embedded, there is obfuscated code that will create a video element
    // Here we run the code to get the underlying video url
    const scripts = $('script');
    // The obfuscated code uses a variable called l which is the window / global object
    const l = global;
    // The obfuscated code uses a variable called ll which is String
    const ll = String;
    const $script2 = $(scripts[1]).html();
    // Kind of dangerous, but the code is very obfuscated so it's hard to tell how it decrypts the URL
    eval($script2);
    // The code above sets a variable called ss that is the mp4 URL
    return (l as any).ss;
  }
}
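
The `eval` call above runs the page's script with access to the surrounding local scope and writes to the real `global` object. A slightly narrower variant is sketched below. It assumes, as the comments above describe, that the script only needs `l` as an object to write `ss` onto and `ll` as `String`, and does not read other globals through `l`. `runObfuscatedScript` is a hypothetical helper name. This is not a sandbox: the script still runs with the full privileges of the Node process, so it should only be used on code you are willing to execute.

```typescript
// Run the obfuscated script with `l` and `ll` passed in as parameters
// instead of relying on eval's access to local variables and `global`.
function runObfuscatedScript(code: string): string | undefined {
  // Stand-in for the window / global object the script writes to.
  const l: Record<string, unknown> = {};
  // `new Function` compiles the code in the global scope; the script only
  // sees `l` and `ll` through these two parameters.
  const run = new Function('l', 'll', code);
  run(l, String);
  // The script is expected to leave the mp4 URL in `l.ss`.
  const ss = l['ss'];
  return typeof ss === 'string' ? ss : undefined;
}
```

In `getVideoURL`, the `else` branch could then end with `return runObfuscatedScript($script2 ?? '');` in place of the `eval` call and the read from `global`, provided the assumption about the script holds for the target page.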
