Parse XML with Nokogiri

I'm having some trouble getting Nokogiri set up properly, and its documentation is a little rough to get started with.

I am trying to parse this XML file: http://www.kongregate.com/games_for_your_site.xml

It returns multiple games inside the gameset element, and each game has a title, description, etc.:

<gameset>
  <game>
    <id>160342</id>
    <title>Tricky Rick</title>
    <thumbnail>
      http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op
    </thumbnail>
    <launch_date>2012-12-12</launch_date>
    <category>Puzzle</category>
    <flash_file>
      http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf
    </flash_file>
    <width>640</width>
    <height>480</height>
    <url>
      http://www.kongregate.com/games/tAMAS_Games/tricky-rick
    </url>
    <description>
      Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!
    </description>
    <instructions>
      WASD \ Arrow Keys &#8211; move; S \ Down Arrow &#8211; take\release an object; CNTRL &#8211; interaction with objects: throw, hammer strike, invisibility mode; SPACE &#8211; interaction with elevators and fuel stations; Esc \ P &#8211; pause;
    </instructions>
    <developer_name>tAMAS_Games</developer_name>
    <gameplays>24999</gameplays>
    <rating>3.43</rating>
  </game>
  <game>
    <id>160758</id>
    <title>Flying Cookie Quest</title>
    <thumbnail>
      http://cdn2.kongregate.com/game_icons/0042/8428/icon_cookiequest_kong_250x200_site.png?16578-op
    </thumbnail>
    <launch_date>2012-12-07</launch_date>
    <category>Action</category>
    <flash_file>
      http://external.kongregate-games.com/gamez/0016/0758/live/embeddable_160758.swf
    </flash_file>
    <width>640</width>
    <height>480</height>
    <url>
      http://www.kongregate.com/games/LongAnimals/flying-cookie-quest
    </url>
    <description>
      Launch Rocket Panda into the land of Cookies. With the help of low-flying sharks, hang-gliding sheep and Rocket Badger, can you defeat the all powerful Biscuit Head? Defeat All enemies of cookies in this launcher game.
    </description>
    <instructions>Use the mouse button!</instructions>
    <developer_name>LongAnimals</developer_name>
    <gameplays>168672</gameplays>
    <rating>3.67</rating>
  </game>
</gameset>

Based on the documentation, I am using something like this:

require 'nokogiri'
require 'open-uri'

url = "http://www.kongregate.com/games_for_your_site.xml"
xml = Nokogiri::XML(URI.open(url))  # Kernel#open no longer fetches URLs in Ruby 3.0+; use URI.open
xml.xpath("//game").each do |node|
    puts node.xpath("//id")
    puts node.xpath("//title")
    puts node.xpath("//thumbnail")
    puts node.xpath("//category")
    puts node.xpath("//flash_file")
    puts node.xpath("//width")
    puts node.xpath("//height")
    puts node.xpath("//description")
    puts node.xpath("//instructions")
end

But it just returns endless data, not grouped per game. Any help would be appreciated.

Here's how I'd rewrite your code:

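require 'nokogiri'
require 'open-uri'

# URI.open (from open-uri) fetches the URL; plain Kernel#open no longer does in Ruby 3.0+
xml = Nokogiri::XML(URI.open("http://www.kongregate.com/games_for_your_site.xml"))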
xml.xpath("//game").each do |game|
  %w[id title thumbnail category flash_file width height description instructions].each do |n|
    puts game.at(n)
  end
end

The problem in your code is that all the sub-tags are prefixed with //, which, in XPath-ese, means "start at the root node and search downward for all elements with that name." So, instead of searching only inside each game node, it searched the entire document for every listed tag, once per game node. That is why the output repeated endlessly instead of being grouped per game.

I recommend using CSS selectors over XPath, because they are (usually) simpler and therefore easier to read. So, instead of xpath('//game') I use search('game'). (search accepts either a CSS selector or an XPath expression, and so does at, which returns only the first match.)

If you want only the text contained in the tags, change puts game.at(n) to:

puts game.at(n).text

To make the output more useful, I'd do this:

require 'nokogiri'
require 'open-uri'

xml = Nokogiri::XML(URI.open('http://www.kongregate.com/games_for_your_site.xml'))
games = xml.search('game').map do |game|
  %w[
    id title thumbnail category flash_file width height description instructions
  ].each_with_object({}) do |n, o|
    o[n] = game.at(n).text
  end
end

require 'awesome_print'
puts games.size
ap games.first
ap games.last

Which results in:

395
{
              "id" => "160342",
          "title"  => "Tricky Rick",
      "thumbnail"  => "http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op",
        "category" => "Puzzle",
      "flash_file" => "http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf",
          "width"  => "640",
          "height" => "480",
    "description"  => "Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!\n",
    "instructions" => "WASD \\ Arrow Keys &#8211; move;\nS \\ Down Arrow &#8211; take\\release an object;\nCNTRL &#8211; interaction with objects: throw, hammer strike, invisibility mode;\nSPACE &#8211; interaction with elevators and fuel stations;\nEsc \\ P &#8211; pause;\n"
}
{
              "id" => "78",
          "title"  => "rotaZion",
      "thumbnail"  => "http://cdn2.kongregate.com/game_icons/0000/0115/pixtiz.rotazion_icon.jpg?8217-op",
        "category" => "Action",
      "flash_file" => "http://external.kongregate-games.com/gamez/0000/0078/live/embeddable_78.swf",
          "width"  => "350",
          "height" => "350",
    "description"  => "In rotaZion, you play with a bubble bar that you can&#8217;t stop rotating !\nCollect the bubbles and try to avoid the mines !\nCollect the different bonus to protect your bubble bar, makes the mines go slower or destroy all the mines !\nTry to beat 100.000 points ;)\n",
    "instructions" => "Move the bubble bar with the arrow keys !\nBubble = 500 Points !\nPixtiz sign = 5000 Points !\n"
}

You can try something like this. I would suggest creating an array of the elements inside game that you want and then iterating over them. I'm sure there's a way in Nokogiri to get all of the elements inside a specified one, but this works:

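require 'nokogiri'
require 'open-uri'

# result holds the raw XML string
result = URI.open("http://www.kongregate.com/games_for_your_site.xml").read
xml = Nokogiri::XML(result)
xml.css("game").each do |inv|
  inv.css("title").each do |f|  # title or whatever else you want
    puts f.inner_html
  end
end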
