Extracting a variable from a JavaScript script block

I'm new to JavaScript and am trying to parse a page with Python. I've been using BeautifulSoup along with Requests to extract the 'file' line from the 'RT.currentVideo' section of this script, but I can't get it to work. I'm completely lost as to how I'd even store this section of the webpage, since the script tag doesn't have an identifier, unlike in most of the related questions I've found online.

Any help would really be appreciated, thanks for taking the time to check in!

This is what I've been using to read the page:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "http://roosterteeth.com/episode/rt-docs-connected-connected-official-trailer"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Accept-Encoding': 'utf-8'})
response = urlopen(req)
webpage = BeautifulSoup(response.read().decode('utf-8', 'ignore'), "html.parser")

And this is the JavaScript block on the page that I want to extract info from. Again, what I'm looking to get is the string in the 'file' variable.

<script>
    RT.currentVideo = {
      authUser: 0,
      autoPlay: 1,
      csrfToken: 'H240Yw8x9oYasUw2Tzt3qpwzA14Z1ajRjuXo6RV1',
      endPoint: 89,
      desktopAgent: 1,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',

You can use a regex to extract that from the page HTML.

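import re

# Capture the text between the single quotes that follow "file:"
regex = r"file:\s*'([^']+)'"

matches = re.findall(regex, webpageHtmlString)

print(matches[0])  # first match; raises IndexError if nothing matched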

webpageHtmlString should be the HTML of the page as a string. In the question's code, that is the result of response.read().decode('utf-8', 'ignore').
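A self-contained sketch of the regex approach might look like the following. The helper name extract_file_url and the sample_html string are made up for illustration: the sample is trimmed from the script block in the question, and its closing "};" is an assumption, since the original block is cut off. The pattern here uses [^']+ so the match stops at the first closing quote, and the function returns None instead of raising when nothing matches.

```python
import re

# Hypothetical stand-in for the downloaded page, trimmed from the question's
# script block. The closing "};" is assumed; the original snippet is cut off.
sample_html = """
<script>
    RT.currentVideo = {
      authUser: 0,
      autoPlay: 1,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',
    };
</script>
"""

# Capture whatever sits between the single quotes after "file:".
FILE_PATTERN = re.compile(r"file:\s*'([^']+)'")


def extract_file_url(html):
    """Return the first quoted 'file' value found in html, or None if absent."""
    match = FILE_PATTERN.search(html)
    return match.group(1) if match else None


print(extract_file_url(sample_html))
```

With the real page, you would pass the decoded response body in place of sample_html.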

Use PyQuery to get jQuery-like querying on HTML content in Python.

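from pyquery import PyQuery as pq

doc = pq(webpageHtmlString)  # webpageHtmlString: the page HTML as a string
scriptTags = doc('script')  # selects every <script> element on the page

print(scriptTags[0].text)  # text of the first inline script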

Depending on your content, you can then use jQuery-like selectors to narrow this down to the script you need.
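Since the script tag in the question has no id, one way to narrow things down is to filter the script tags by their contents rather than by a selector. The sketch below deliberately swaps PyQuery for the standard library's html.parser so that it runs without extra installs; ScriptCollector and find_video_file are hypothetical names, and the 'RT.currentVideo' marker is taken from the question.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # start a new buffer for this script

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the buffer.
        if self._in_script:
            self.scripts[-1] += data


def find_video_file(html, marker="RT.currentVideo"):
    """Return the 'file' value from the first script mentioning marker, else None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        if marker in script:
            match = re.search(r"file:\s*'([^']+)'", script)
            if match:
                return match.group(1)
    return None
```

Calling find_video_file(webpageHtmlString) would then skip any unrelated scripts that also happen to contain a 'file' key.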
