Extracting variable from Javascript script block

Question

I'm new to JavaScript and am trying to parse it using Python. I've been using BeautifulSoup along with Requests to extract the 'file' line from the 'RT.currentVideo' section of this script, but I can't get it to work. I'm completely lost as to how I could even select this section of the webpage, since it doesn't have an identifier like the ones in most of the related questions I've found online.

Any help would be really appreciated. Thanks for taking the time to look!

This is what I've been using to read the page:

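from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "http://roosterteeth.com/episode/rt-docs-connected-connected-official-trailer"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Accept-Encoding': 'utf-8'})
response = urlopen(req)
# Decode the raw bytes and parse them with Python's built-in HTML parser
webpage = BeautifulSoup(response.read().decode('utf-8', 'ignore'), "html.parser")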

And this is the JavaScript block on the page I want to extract information from. Again, what I'm looking for is the string assigned to 'file'.

<script>
    RT.currentVideo = {
      authUser: 0,
      autoPlay: 1,
      csrfToken: 'H240Yw8x9oYasUw2Tzt3qpwzA14Z1ajRjuXo6RV1',
      endPoint: 89,
      desktopAgent: 1,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',

Answer 1

You can use a regular expression to extract that value from the page's HTML.

import re

# Capture everything between the single quotes that follow "file:"
regex = r"file:\s*'([^']+)'"

matches = re.findall(regex, webpageHtmlString)

if matches:
    print(matches[0])

Here, webpageHtmlString is the page's HTML as a string; with the code from the question, str(webpage) gives you that string.
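Because the RT.currentVideo block has no id or class to select on, another option is to identify the script by its contents. The sketch below uses only the standard library's html.parser to collect every inline script, keeps the one that mentions RT.currentVideo, and then applies the same regex. ScriptCollector and find_video_file are hypothetical helper names, and the sample markup is modeled on the question's excerpt with a closing "};" and </script> added; it assumes the 'file' value is single-quoted, as it is there.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Hypothetical helper: gathers the text of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text can arrive in several chunks, so accumulate it
        if self._in_script:
            self.scripts[-1] += data


def find_video_file(html):
    """Hypothetical helper: return the 'file' URL inside RT.currentVideo, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        if "RT.currentVideo" in script:
            # Assumes the value is single-quoted, as in the question's excerpt
            match = re.search(r"file:\s*'([^']+)'", script)
            if match:
                return match.group(1)
    return None


sample = """<html><body><script>
    RT.currentVideo = {
      authUser: 0,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',
    };
</script></body></html>"""

print(find_video_file(sample))
```

Selecting by content rather than position means other scripts on the page, before or after this one, do not affect the result.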

Answer 2

Use PyQuery to get jQuery-like querying of HTML content in Python.

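from pyquery import PyQuery as pq

doc = pq(webpageHtmlString)  # parse the page's HTML (a string)
scriptTags = doc('script')  # select every <script> element

# Iterating yields lxml elements; .text holds the inline JavaScript
for script in scriptTags:
    if script.text and 'RT.currentVideo' in script.text:
        print(script.text)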

From there, you can refine the selection with jQuery-style selectors to suit your page's content.
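Whichever library selects the script element, the value itself still sits inside JavaScript source, which HTML parsers and CSS selectors do not interpret. As a rough sketch, assuming RT.currentVideo keeps one "key: value," pair per line as in the question's excerpt, its simple properties can be read into a Python dict with a line-based regex. parse_simple_properties is a hypothetical helper; nested objects, double-quoted strings, escaped quotes and several pairs on one line are not handled, and a real JavaScript parser would be needed for those.

```python
import re

# Matches one "key: value," line; the value is either a single-quoted string
# (captured without its quotes) or a bare token such as 0 or 89
PAIR_RE = re.compile(r"^[ \t]*(\w+):[ \t]*('([^']*)'|[^,\s]+),?[ \t\r]*$", re.MULTILINE)


def parse_simple_properties(js_text):
    """Hypothetical helper: map each simple 'key: value' line to a Python value."""
    props = {}
    for match in PAIR_RE.finditer(js_text):
        key, raw, quoted = match.group(1), match.group(2), match.group(3)
        if quoted is not None:
            props[key] = quoted      # single-quoted JS string, quotes removed
        elif raw.isdigit():
            props[key] = int(raw)    # non-negative integer such as 0 or 89
        else:
            props[key] = raw         # anything else is kept as raw text
    return props


excerpt = """
    RT.currentVideo = {
      authUser: 0,
      endPoint: 89,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',
    };
"""

props = parse_simple_properties(excerpt)
print(props["endPoint"], props["file"])
```

The "RT.currentVideo = {" and "};" lines do not match the pattern, so only the property lines end up in the dict.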

2 answers

Answer 1 (solution1): score 2, accepted, 2018-01-16 08:25:40

Answer 2 (solution2): score 0, 2018-01-16 08:45:36
