
Parse data from JavaScript of retrieved page

I'm retrieving a web page with OpenURI:

require 'open-uri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
page = URI.open('http://www.example.com').read.scrub

Now I'd like to parse the values of the attributes playerurl, playerdata and pageurl of the retrieved page. They appear in a <script> tag:

<script>
..
..
  PlayerWatchdog.init({
      'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
      'playerdata': 'http://www.example.com/player',
      'pageurl': 'http://www.example.com?test=2',
      });
..
..
</script>

What's the smartest way to accomplish this?

You can use an HTML parser, such as Nokogiri, to take apart the HTML document and quickly find the <script> tag you're after. The content inside a <script> tag is text, so Nokogiri's text method will return it. Then it's a matter of selectively retrieving the values you want, which can be done with a simple regular expression:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <head>
    <script>
      PlayerWatchdog.init({
          'playerurl': 'http://cdn.static.de/now/player.swf?ts=2011354353',
          'playerdata': 'http://www.example.com/player',
          'pageurl': 'http://www.example.com?test=2',
          });
    </script>
  </head>
</html>
EOT

# As far as Nokogiri is concerned, the <script> contents are plain text
script_text = doc.at('script').text
playerurl, playerdata, pageurl = %w[
  playerurl
  playerdata
  pageurl
].map { |i| script_text[/'#{i}': '([^']+)'/, 1] }

playerurl # => "http://cdn.static.de/now/player.swf?ts=2011354353"
playerdata # => "http://www.example.com/player"
pageurl # => "http://www.example.com?test=2"

at returns the first matching <script> Node instance. Depending on the HTML, you might not want the first matching <script>. You can use search instead, which returns a NodeSet, similar to an array of Nodes, and then grab a particular element from the NodeSet. Alternatively, instead of a CSS selector, you can use XPath, which lets you easily specify a particular occurrence of the desired tag.

Once the tag is found, text returns its contents, and the task moves from Nokogiri to using a pattern to find what is desired. /'#{i}': '([^']+)'/ is a simple pattern that looks for a word, passed in as i, followed by : ' and then captures everything up to, but not including, the next '. That pattern is passed to String's [] method, and the second argument, 1, asks for the first capture group.

Ruby has no built-in JavaScript parsing capabilities. You can use a regexp, though this will be rather sensitive to the formatting of the page (for example, it will break if the page starts using double quotes for strings):

playerurl = page[/'playerurl':\s*'([^']*)'/, 1]
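The double-quote sensitivity can be softened a little by letting the pattern accept either quote character. The following is a minimal sketch, not part of the original answers: extract_js_strings is a hypothetical helper name, the sample strings are made up, and Hash-building with to_h and a block assumes Ruby 2.6 or later. It still does not handle unquoted keys or escaped quotes inside values.

```ruby
# Hypothetical helper: pulls string values for the given keys out of
# JavaScript source, accepting either ' or " as the quote character.
def extract_js_strings(source, keys)
  keys.to_h do |key|
    # \1 and \2 are backreferences, so each opening quote must be
    # closed by the same quote character.
    pattern = /(['"])#{Regexp.escape(key)}\1\s*:\s*(['"])(.*?)\2/
    [key, source[pattern, 3]] # group 3 is the value; nil when nothing matches
  end
end

# Made-up inputs showing both quoting styles.
single = "PlayerWatchdog.init({ 'playerurl': 'http://www.example.com/a.swf' });"
double = 'PlayerWatchdog.init({ "playerurl": "http://www.example.com/a.swf" });'

extract_js_strings(single, %w[playerurl])['playerurl'] # => "http://www.example.com/a.swf"
extract_js_strings(double, %w[playerurl])['playerurl'] # => "http://www.example.com/a.swf"
```

A key that is absent from the source maps to nil, so callers can tell a missing attribute from an empty one.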

