Extracting a variable from a JavaScript script block

I'm new to JavaScript and am trying to parse it with Python. I've been using BeautifulSoup along with Requests to extract the 'file' line from the 'RT.currentVideo' section of this script, but I can't get it to work. I'm completely lost as to how I could even store this section of the webpage, since it doesn't have an identifier like in most of the related questions I've found online.

Any help would really be appreciated. Thanks for taking the time to check in!

This is what I've been using to read the page:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "http://roosterteeth.com/episode/rt-docs-connected-connected-official-trailer"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Accept-Encoding': 'utf-8'})
response = urlopen(req)
webpage = BeautifulSoup(response.read().decode('utf-8', 'ignore'), "html.parser")

And this is the JavaScript block on the page that I want to extract info from. Again, what I'm looking to get is the string in the 'file' variable.

<script>
    RT.currentVideo = {
      authUser: 0,
      autoPlay: 1,
      csrfToken: 'H240Yw8x9oYasUw2Tzt3qpwzA14Z1ajRjuXo6RV1',
      endPoint: 89,
      desktopAgent: 1,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',

You can use a regex to extract that from the page HTML.

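import re

# Capture the quoted value after "file:"; [^']+ stops at the closing quote
regex = r"file:\s*'([^']+)'"

matches = re.findall(regex, webpageHtmlString)

# Print the first match, if there is one
if matches:
    print(matches[0])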

webpageHtmlString should be the HTML of the page as a string.
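Since the script block has no identifier, one option is to anchor the search on the 'RT.currentVideo' text itself and only look for 'file' after that point. The following is a minimal sketch of that idea, not a definitive solution: the function name extract_current_video_file and the SAMPLE_HTML string are made up for illustration, the first script in the sample is a hypothetical decoy, and the sample is an abbreviated copy of the block from the question.

```python
import re

# Abbreviated sample based on the question's script block.
# The first <script> is a hypothetical decoy showing why the search is anchored.
SAMPLE_HTML = """
<script>
    var other = { file: 'decoy.m3u8' };
</script>
<script>
    RT.currentVideo = {
      authUser: 0,
      autoPlay: 1,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',
"""


def extract_current_video_file(html):
    """Return the first 'file' value that follows 'RT.currentVideo', or None."""
    start = html.find("RT.currentVideo")
    if start == -1:
        return None
    # Search only the text after the marker so earlier 'file:' keys are skipped
    match = re.search(r"file:\s*'([^']+)'", html[start:])
    return match.group(1) if match else None


print(extract_current_video_file(SAMPLE_HTML))
```

With the question's code, the string to pass in would be the decoded response body, i.e. response.read().decode('utf-8', 'ignore').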

Use PyQuery to get jQuery-like querying on HTML content from Python.

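from pyquery import PyQuery as pq

# Parse the page HTML, then select every <script> element
doc = pq(webpageHtmlString, parser='html')
script_tags = doc('script')

# Inline scripts have no src attribute, so read the element's text instead
print(script_tags[0].text)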

Based on your content, you can use jQuery-like queries.
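If installing PyQuery is not an option, the same "select the script tags, then pick the right one" idea can be sketched with Python's standard-library html.parser instead. This is a swapped-in technique, not PyQuery's API: the ScriptCollector class, the find_script_containing helper, the /js/app.js path, and the closing of the sample block are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Abbreviated sample; the external script path and the closing "};" are assumed.
SAMPLE_HTML = """
<script src="/js/app.js"></script>
<script>
    RT.currentVideo = {
      autoPlay: 1,
      file: 'https://rtv2-video.roosterteeth.com/uploads/videos/0e840b4f-a188-440d-adc0-b78093c1009f/index.m3u8',
    };
</script>
"""


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # one entry per script, even if empty

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def find_script_containing(html, marker):
    """Return the text of the first script that contains marker, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        if marker in text:
            return text
    return None


script_text = find_script_containing(SAMPLE_HTML, "RT.currentVideo")
print(script_text)
```

Matching on the script's contents sidesteps the missing id or class on the tag; the returned text can then be searched for the 'file' value.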
