
How to scrape a video URL using Puppeteer?

I'm trying to scrape the video URL of Instagram videos using Puppeteer, but I can't get it to work. It returns null as the result.

Here is my code:

const puppeteer = require('puppeteer');

async function getVideo(){
  const launch = await puppeteer.launch({headless: true});
  const page = await launch.newPage();
  await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
  const video = await page.evaluate(() => {
      return document.querySelector('video').src;
  });

  console.log(video); // returns null

  await launch.close();
}

Example of the URL I expect: https://instagram.fluh1-1.fna.fbcdn.net/v/t50.16885-16/290072800_730588251588660_5005285215058589375_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5pZ3R2LmJhc2VsaW5lIiwicWVfZ3JvdXBzIjoiW1wiaWdfd2ViX2RlbGl2ZXJ5X3Z0c19vdGZcIl0ifQ&_nc_ht=instagram.fluh1-1.fna.fbcdn.net&_nc_cat=100&_nc_ohc=ROJWkaOqkQcAX_z-_Ls&edm=AP_V10EBAAAA&vs=440468611258459_2442386419&_nc_vs=HBksFQAYJEdPQW9TaEUwaURaVmQ1Z0NBTC0yRkV0aVdIWkZidlZCQUFBRhUAAsgBABUAGCRHTEdvVHhGMWFjUUpsMzhDQUZNT0c1cV8wT3c1YnZWQkFBQUYVAgLIAQAoABgAGwGIB3VzZV9vaWwBMRUAACaa%2BO%2FYnLPeQBUCKAJDMywXQCDdsi0OVgQYEmRhc2hfYmFzZWxpbmVfMV92MREAdewHAA%3D%3D&ccb=7-5&oh=00_AfCBrACQlXOqmbGSWRk_6Urv_fmHJUFDIt-8w6EO0_UcHQ&oe=638D6CBD&_nc_sid=4f375e

The Instagram page takes a little while to load after navigation, so the video is not ready at the moment your query runs. I used the setTimeout function to wait before reading it. Puppeteer also has built-in helpers you can use to obtain the src, such as page.$eval in the following code.

async function getVideo(){
  const launch = await puppeteer.launch({headless: false});
  const page = await launch.newPage();
  await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
  // Give the page about a second to render the video before querying it.
  setTimeout(async () => {
    // $eval runs the callback on the first element matching the selector.
    const src = await page.$eval("video", n => n.getAttribute("src"));
    console.log(src);
    await launch.close();
  }, 1000);
}
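
A fixed one-second delay may be too short on a slow connection and needlessly long on a fast one. As a rough sketch of an alternative, you could wait for the video element itself with page.waitForSelector and then read its src. The helper name getVideoSrc and the timeoutMs parameter are my own illustrative choices, not part of Puppeteer, and I'm assuming the page eventually renders a video element without requiring a login.

```javascript
// Sketch: wait for the <video> element instead of sleeping for a fixed time.
// `getVideoSrc` and `timeoutMs` are hypothetical names used for illustration.
async function getVideoSrc(page, timeoutMs = 15000) {
  try {
    // Resolves once a matching element is in the DOM; rejects if none
    // appears within the timeout.
    await page.waitForSelector('video', { timeout: timeoutMs });
  } catch (err) {
    // For simplicity, any failure while waiting (including a timeout)
    // is reported as "no video found".
    return null;
  }
  // Read the src attribute from the first <video> in the page context.
  return page.$eval('video', (el) => el.getAttribute('src'));
}

// Usage with a real browser (assumes `npm install puppeteer`):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await page.goto('https://www.instagram.com/p/CfW5u5UJmny/?hl=en');
//   console.log(await getVideoSrc(page));
//   await browser.close();
```

Passing the page in as an argument keeps the waiting logic separate from launching and closing the browser, so the caller decides when to clean up.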

