Regular Expressions (HTML Parsing on the iPhone)

Question

I am trying to pull data from a website using Objective-C. This is all very new to me, so I've done some research. What I know now is that I need to use XPath, and I have a wrapper for that on the iPhone called hpple. I've got it up and running in my project.

I am confused about how to retrieve information from the site. Apparently I am to use regular expressions in this line of code:

NSArray * a = [doc search:@"//a[@class='sponsor']"];

This is just an example. Is the string in search:@"...." a regular expression? If so, I guess I can develop the hundreds of patterns that my program will need to parse the site (I need a lot of data), but is there a better way? I'm very lost here. Any help is appreciated.

Answer 1

The parameter is an XPath, not a regular expression. Here's a breakdown:

All XPaths are interpreted relative to a context node. In this case, it's the root node.
// is an abbreviation meaning "all descendants" (strictly, it is short for /descendant-or-self::node()/)
a means "all child elements named 'a'" (in HTML, those are anchors)
[...] contains a predicate, which narrows down which a elements match
- @ is an abbreviation for attribute nodes
- @class means an attribute named "class"
- @class='sponsor' means a class attribute equal to "sponsor". Note that this will not match nodes whose class merely contains "sponsor", such as <a class="big sponsor" ...>; the class attribute must be exactly equal to "sponsor".

All together, we have "'a' elements descended from the root whose class equals 'sponsor'".
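To see the breakdown in action outside of hpple, here is a minimal sketch using Python's standard-library ElementTree, which supports a subset of XPath. The sample markup and its href values are made up for illustration, and ElementTree is only a stand-in: it needs a leading "." on the path and it insists on well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site).
SAMPLE = """
<html>
  <body>
    <a class="sponsor" href="/one">One</a>
    <div>
      <a class="sponsor" href="/two">Two</a>
      <a class="big sponsor" href="/three">Three</a>
    </div>
    <a href="/four">Four</a>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# ".//a" searches all descendants of the context node (here, the root);
# the predicate keeps only anchors whose class attribute equals "sponsor".
matches = root.findall(".//a[@class='sponsor']")
hrefs = [a.get("href") for a in matches]

print(hrefs)  # ['/one', '/two']
```

The nested anchor is found because // reaches every depth, while the "big sponsor" anchor and the anchor without a class are both skipped because the predicate tests for equality.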

Answer 2

That is an XPath expression, not a regular expression. The W3C has an XPath reference here: http://www.w3.org/TR/xpath/ . Basically, you are searching for <a> elements with the class "sponsor".

Note that this is a good thing! Regular expressions are bad for parsing HTML.
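As a rough illustration of why, the sketch below compares a naive regular expression with a parser-based query on the same markup. Everything in it is an assumption made for the example: the sample markup, the pattern, and the use of Python's standard-library ElementTree in place of hpple.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: the same kind of link written in two equally valid ways.
SAMPLE = """<div>
  <a class="sponsor" href="/one">One</a>
  <a href='/two' class='sponsor'>Two</a>
</div>"""

# A naive pattern that hard-codes attribute order and quote style.
pattern = re.compile(r'<a class="sponsor" href="([^"]*)"')
regex_hrefs = pattern.findall(SAMPLE)

# The parser-based query works on the document structure, not the raw text.
root = ET.fromstring(SAMPLE)
xpath_hrefs = [a.get("href") for a in root.findall(".//a[@class='sponsor']")]

print(regex_hrefs)  # ['/one']
print(xpath_hrefs)  # ['/one', '/two']
```

The regular expression misses the second link as soon as the attribute order or quoting changes, whereas the XPath query finds both without any extra patterns.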


Answer 1: score 1, accepted, 2010-10-24 16:04:18

Answer 2: score 0, 2010-10-24 15:54:44
