
How to scrape a video URL using Puppeteer?

I'm trying to scrape the video URL of an Instagram video using Puppeteer, but I can't get it to work. It returns null as a response.

Here is my code:

async function getVideo(){
  const launch = await puppeteer.launch({headless: true});
  const page = await launch.newPage();
  await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
  const video = await page.evaluate(() => {
      return document.querySelector('video').src;
  });

  console.log(video); // returns null

  await launch.close();
}

Example URL: https://instagram.fluh1-1.fna.fbcdn.net/v/t50.16885-16/290072800_730588251588660_5005285215058589375_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5pZ3R2LmJhc2VsaW5lIiwicWVfZ3JvdXBzIjoiW1wiaWdfd2ViX2RlbGl2ZXJ5X3Z0c19vdGZcIl0ifQ&_nc_ht=instagram.fluh1-1.fna.fbcdn.net&_nc_cat=100&_nc_ohc=ROJWkaOqkQcAX_z-_Ls&edm=AP_V10EBAAAA&vs=440468611258459_2442386419&_nc_vs=HBksFQAYJEdPQW9TaEUwaURaVmQ1Z0NBTC0yRkV0aVdIWkZidlZCQUFBRhUAAsgBABUAGCRHTEdvVHhGMWFjUUpsMzhDQUZNT0c1cV8wT3c1YnZWQkFBQUYVAgLIAQAoABgAGwGIB3VzZV9vaWwBMRUAACaa%2BO%2FYnLPeQBUCKAJDMywXQCDdsi0OVgQYEmRhc2hfYmFzZWxpbmVfMV92MREAdewHAA%3D%3D&ccb=7-5&oh=00_AfCBrACQlXOqmbGSWRk_6Urv_fmHJUFDIt-8w6EO0_UcHQ&oe=638D6CBD&_nc_sid=4f375e

You are loading the Instagram page, and it takes a little while to finish rendering, so the video element may not exist yet when your query runs. I used setTimeout to wait before reading it. Puppeteer also has many built-in functions you can use to obtain the src, such as page.$eval in the following.

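const puppeteer = require('puppeteer');

async function getVideo(){
  const launch = await puppeteer.launch({headless: false});
  const page = await launch.newPage();
  await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
  // Give the page a moment to render the <video> element before querying it.
  setTimeout(async () => {
    // $eval runs the callback in the page against the first element matching "video".
    let src = await page.$eval("video", n => n.getAttribute("src"));
    console.log(src);
    await launch.close();
  }, 1000);
}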
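
A fixed one-second delay can be too short on a slow connection. As a rough alternative sketch, page.waitForSelector can wait until the video element actually appears instead of guessing a delay. The helper name getVideoSrc and the 15-second default timeout are my own assumptions, not something from the question, and I have not verified this against Instagram's current markup; the page may also require a login or expose a blob: URL instead of a direct .mp4 link.

```javascript
// Hypothetical helper (name and default timeout are assumptions).
// `page` is expected to be a Puppeteer Page object.
async function getVideoSrc(page, url, timeoutMs = 15000) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  // Resolves once a <video> element is in the DOM; throws a TimeoutError otherwise.
  const videoHandle = await page.waitForSelector('video', { timeout: timeoutMs });
  // Prefer the resolved currentSrc, fall back to the raw src attribute.
  return videoHandle.evaluate((el) => el.currentSrc || el.getAttribute('src'));
}

// Example wiring; call main() yourself to run it against a real browser.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const src = await getVideoSrc(page, 'https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
    console.log(src);
  } finally {
    // Close the browser even if the selector never shows up.
    await browser.close();
  }
}
```

Keeping the wait inside the awaited function also means the caller only gets control back after the src has been read, which the setTimeout version does not guarantee.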
