HTML parsing: how to get a link tag from a remote site

I have a site (for example, apple.com) which contains a link tag, for example:

<link rel="alternate" type="application/rss+xml" title="RSS" href="http://images.apple.com/main/rss/hotnews/hotnews.rss" />

So how can I get the title "RSS" and the href from it?

Update 1: I've tried to convert the site into a string using:

NSData *data = [NSURLConnection sendSynchronousRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"http://apple.com/"]] returningResponse:NULL error:NULL];
NSString *HTMLWithFeeds = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

But I don't know what to do now...

Update 2:

It is not clear from my post, but in addition I need to find the link on this site that has type="application/rss+xml".

You might try using regular expressions:

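NSError *error = nil;
// The inner double quotes must be escaped inside an Objective-C string literal.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<link.*?href=\"(.*?)\".*?>"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];

// HTMLWithFeeds is the string built in the question's Update 1.
NSArray *matches = [regex matchesInString:HTMLWithFeeds
                                  options:0
                                    range:NSMakeRange(0, [HTMLWithFeeds length])];
for (NSTextCheckingResult *match in matches) {
     NSRange matchRange = [match range];          // the whole <link ...> tag
     NSRange hrefRange = [match rangeAtIndex:1];  // the only capture group: the href value
     NSString *href = [HTMLWithFeeds substringWithRange:hrefRange];
     NSLog(@"%@", href);
}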

Apple's documentation has some examples of how to further use and access the matches:

https://developer.apple.com/library/ios/#documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html

E.g., something like the following regex should do for the hrefs:

<link.*?href="(.*?)".*?>
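The pattern above captures only the href and does not check the type attribute that Update 2 asks about. A minimal sketch of one way to cover both, assuming double-quoted attribute values and a well-formed tag: match each <link> tag first, then read its attributes with a second expression and keep only tags whose type is application/rss+xml. The function name RSSLinksInHTML is hypothetical and not part of any Apple API, and a regular expression is not a full HTML parser, so unusual markup may be missed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not an Apple API): returns one dictionary of attributes
// for each <link> tag whose type attribute is "application/rss+xml".
static NSArray *RSSLinksInHTML(NSString *html) {
    NSMutableArray *feeds = [NSMutableArray array];
    if (html.length == 0) {
        return feeds;
    }

    // Step 1: find every <link ...> tag.
    NSRegularExpression *tagRegex =
        [NSRegularExpression regularExpressionWithPattern:@"<link\\b[^>]*>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    // Step 2: find name="value" pairs inside a tag.
    // Assumption: values are double-quoted; single-quoted or unquoted values are skipped.
    NSRegularExpression *attrRegex =
        [NSRegularExpression regularExpressionWithPattern:@"([a-z-]+)\\s*=\\s*\"([^\"]*)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];

    NSArray *tagMatches = [tagRegex matchesInString:html
                                            options:0
                                              range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *tagMatch in tagMatches) {
        NSString *tag = [html substringWithRange:tagMatch.range];
        NSMutableDictionary *attributes = [NSMutableDictionary dictionary];

        NSArray *attrMatches = [attrRegex matchesInString:tag
                                                  options:0
                                                    range:NSMakeRange(0, tag.length)];
        for (NSTextCheckingResult *attrMatch in attrMatches) {
            NSString *name = [[tag substringWithRange:[attrMatch rangeAtIndex:1]] lowercaseString];
            NSString *value = [tag substringWithRange:[attrMatch rangeAtIndex:2]];
            attributes[name] = value;
        }

        // isEqualToString: returns NO when the type attribute is missing (nil receiver).
        NSString *type = [attributes[@"type"] lowercaseString];
        if ([type isEqualToString:@"application/rss+xml"]) {
            [feeds addObject:attributes];
        }
    }
    return feeds;
}
```

Called with the HTMLWithFeeds string from Update 1, each returned dictionary should expose the values the question asks for under the keys @"title" and @"href".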

If you are using jQuery, $("link").attr("title") gives "RSS" and $("link").attr("href") gives http:// *

If you want to get the href content, use jQuery ajax: $.get("http:// * ", function(result){});

Create an NSXMLDocument using -initWithContentsOfURL:options:error: with the NSXMLDocumentTidyHTML option. Then, you can navigate the hierarchy of nodes starting with -rootElement. Or, you can use XPath, like [doc nodesForXPath:@"//link/@title" error:&error].
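The NSXMLDocument approach can be sketched as follows. Note that NSXMLDocument is available on macOS only, not on iOS, so this applies to desktop targets. The function name RSSLinksAtURL and the shape of the returned dictionaries are assumptions made for illustration, and the XPath predicate assumes the tidied document exposes link elements without a namespace prefix.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (macOS only): loads the page at `url`, lets NSXMLDocument
// tidy the HTML into XML, and returns the title and href of every <link> element
// whose type attribute is "application/rss+xml". Returns nil on failure.
static NSArray *RSSLinksAtURL(NSURL *url, NSError **error) {
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithContentsOfURL:url
                                                              options:NSXMLDocumentTidyHTML
                                                                error:error];
    if (doc == nil) {
        return nil;
    }

    // Select the <link> elements themselves, filtered by their type attribute.
    NSArray *nodes = [doc nodesForXPath:@"//link[@type='application/rss+xml']"
                                  error:error];
    if (nodes == nil) {
        return nil;
    }

    NSMutableArray *feeds = [NSMutableArray array];
    for (NSXMLNode *node in nodes) {
        if (node.kind != NSXMLElementKind) {
            continue;
        }
        NSXMLElement *link = (NSXMLElement *)node;
        // A missing attribute is reported as an empty string.
        NSString *title = [[link attributeForName:@"title"] stringValue] ?: @"";
        NSString *href = [[link attributeForName:@"href"] stringValue] ?: @"";
        [feeds addObject:@{ @"title" : title, @"href" : href }];
    }
    return feeds;
}
```

For the page in the question, this could be called as RSSLinksAtURL([NSURL URLWithString:@"http://apple.com/"], &error). Compared with a regular expression, this delegates the handling of attribute order and quoting to the XML parser.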

