XPath within R using XML package

I am new to XPath, but I can see how powerful it is. I am looking at the source code of this link and simply want to extract the content and username from the following two pieces of the page, which for simplicity's sake are located near the top of the source code.

content="[Archive] Simburgur's Live Stream [Offline] Gears of War 3"

<div class="username">Simburgur</div>

Here is my code within R:

library(XML)
doc <- htmlParse("http://forums.epicgames.com/archive/index.php/t-672775.html")
xpathSApply(doc, "//head/meta[@name=\"description\"]")

which returns

[[1]]
<meta name="description" content="[Archive]  Simburgur's Live Stream [Offline] Gears of War 3" /> 

Obviously, in this example, all I want is what is inside the quotes of content=, but I am stuck and cannot seem to get my expression to return the string I want.

I repeat. I am new to XPath. :)

Use:

/*/head/meta[@name='description']/@content

This still selects an attribute node, but there is probably an easy way in your programming language to get the string value of the attribute.

To get just the string value, use:

string(/*/head/meta[@name='description']/@content)
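In R, both expressions can be tried with the XML package. The following is a minimal sketch, assuming a small inline HTML string (the hypothetical `html` variable) that stands in for the forum page so that it runs without network access; it is not the asker's actual document.

```r
library(XML)

# Assumed stand-in for the forum page, reduced to the two pieces of interest.
html <- '<html><head>
<meta name="description" content="[Archive] Simburgur\'s Live Stream [Offline] Gears of War 3" />
</head><body><div class="username">Simburgur</div></body></html>'
doc <- htmlParse(html, asText = TRUE)

# Selecting the attribute node: the value comes back with extra
# attributes (such as a name), so as.character() strips it to the bare string.
attr_node <- xpathSApply(doc, "/*/head/meta[@name='description']/@content")
content_a <- as.character(attr_node)

# Wrapping the path in string() makes XPath itself return the text.
content_b <- xpathSApply(doc, "string(/*/head/meta[@name='description']/@content)")

print(content_a)
print(content_b)
```

Either form should yield the description text; the `string()` variant avoids the extra conversion step on the R side.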

Do note: Using the // abbreviation may result in very slow evaluation of the XPath expression, because it may cause a linear traversal of a whole (sub)tree.

Always avoid using // if the structure of the XML document is statically known.
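As a rough illustration in R, the abbreviated and the fully spelled-out path select the same node when the structure is known. This sketch assumes a tiny inline document (the hypothetical `html` string), on which any speed difference would be negligible; it only shows that the explicit path is a drop-in replacement.

```r
library(XML)

# Assumed stand-in document with a known html/head/meta structure.
html <- '<html><head>
<meta name="description" content="Example description" />
</head><body><p>Body text</p></body></html>'
doc <- htmlParse(html, asText = TRUE)

# Abbreviated form: may search the whole tree for matching meta elements.
abbreviated <- xpathSApply(doc, "//meta[@name='description']",
                           xmlGetAttr, "content")

# Explicit form: walks directly from the root through head to meta.
explicit <- xpathSApply(doc, "/html/head/meta[@name='description']",
                        xmlGetAttr, "content")

print(identical(abbreviated, explicit))
```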

You're close. This should do it.

//head/meta[@name=\"description\"]/@content

The brackets constrain the choice of meta tags, but you still have to specify the attribute you want.
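An alternative on the R side is to keep the XPath at the element level and let a function pull out the piece you want. The sketch below assumes a small inline HTML string (the hypothetical `html` variable) in place of the live forum page, and uses `xmlGetAttr` for the attribute and `xmlValue` for the text of the username div.

```r
library(XML)

# Assumed stand-in for the forum page so the example runs offline.
html <- '<html><head>
<meta name="description" content="[Archive] Simburgur\'s Live Stream [Offline] Gears of War 3" />
</head><body><div class="username">Simburgur</div></body></html>'
doc <- htmlParse(html, asText = TRUE)

# Select the meta element, then read its content attribute in R.
description <- xpathSApply(doc, "//head/meta[@name='description']",
                           xmlGetAttr, "content")

# Element text (rather than an attribute) is read with xmlValue().
username <- xpathSApply(doc, "//div[@class='username']", xmlValue)

print(description)
print(username)
```

If the page contains several username divs, `username` would be a character vector with one entry per match.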
