
PHP RSS Feed Crawler


I want to build an RSS feed crawler for my website, but I'm not quite sure how to begin. How can my crawler identify an RSS feed? Is there anything I can crawl for that every RSS feed has? I don't need any code, just some help understanding what I have to create.

Thanks in advance!

Greetings

Xatenev

I think this is possible if your crawler follows every link and opens each page at least once, looking for the text <rss version="2.0">. From what I understand, every RSS 2.0 feed should contain this line (RSS 1.0 feeds use an <rdf:RDF> root element and Atom feeds use <feed>, so those would need their own check), as in this example:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
 <title>RSS Title</title>
 <description>This is an example of an RSS feed</description>
 <link>http://www.someexamplerssdomain.com/main.html</link>
 <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000</lastBuildDate>
 <pubDate>Sun, 06 Sep 2009 16:20:00 +0000</pubDate>
 <ttl>1800</ttl>

 <item>
  <title>Example entry</title>
  <description>Here is some text containing an interesting description.</description>
  <link>http://www.wikipedia.org/</link>
  <guid>unique string per item</guid>
  <pubDate>Sun, 06 Sep 2009 16:20:00 +0000</pubDate>
 </item>

</channel>
</rss>
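The crawl-and-check approach described above can be sketched in PHP roughly as follows. This is a minimal sketch, not a finished crawler: the function names `looksLikeRss2`, `extractLinks` and `findFeeds` are hypothetical, the caller supplies the fetch function (in practice it might wrap `file_get_contents()` or cURL), and `href` values are assumed to be absolute URLs. Instead of a plain substring search for `<rss version="2.0">`, it uses a regular expression so that extra attributes, whitespace around `=` or single quotes still match.

```php
<?php

// Heuristic from the answer: treat a document as an RSS 2.0 feed if it contains
// an <rss ...> tag whose version attribute is "2.0". The regex also accepts
// extra attributes, whitespace around "=", and single quotes.
function looksLikeRss2(string $body): bool
{
    return preg_match('/<rss\b[^>]*\bversion\s*=\s*["\']2\.0["\']/i', $body) === 1;
}

// Collect the href of every <a> element in an HTML page.
// Relative URLs are returned unchanged (this sketch does not resolve them).
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Breadth-first crawl from $startUrl, opening each URL at most once and
// stopping after $maxPages fetches. $fetch(string $url): ?string returns the
// response body, or null on failure. Returns the URLs that look like RSS 2.0 feeds.
function findFeeds(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $feeds = [];
    $fetched = 0;

    while ($queue !== [] && $fetched < $maxPages) {
        $url = array_shift($queue);
        $fetched++;
        $body = $fetch($url);
        if ($body === null) {
            continue;
        }
        if (looksLikeRss2($body)) {
            $feeds[] = $url; // a feed is a leaf: do not look for links inside it
            continue;
        }
        foreach (extractLinks($body) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $feeds;
}
```

Because the fetcher is passed in, the crawl can be tried offline against an in-memory array of pages; for real use, something like `fn(string $url) => @file_get_contents($url) ?: null` would do. A real crawler would also need to resolve relative links and restrict which hosts it visits.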

If you're going to use PHP, I've had very good experiences with SimpleXML, which is built into PHP.
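As a rough illustration of reading a feed with SimpleXML, here is a minimal sketch that assumes the feed body has already been downloaded and that only the RSS 2.0 `<channel>` and `<item>` elements matter. The function name `parseRss2` and the shape of the returned array are hypothetical. `simplexml_load_string()` returns `false` for malformed XML, which this sketch reports as `null`.

```php
<?php

// Parse an RSS 2.0 document with SimpleXML. Returns the feed version, the
// channel title and, for each <item>, its title, link, guid and pubDate.
// Returns null if the body is not well-formed XML or is not an <rss> document.
function parseRss2(string $body): ?array
{
    if (trim($body) === '') {
        return null;
    }
    // Collect libxml errors instead of emitting warnings for malformed input.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || $xml->getName() !== 'rss' || !isset($xml->channel)) {
        return null;
    }

    $items = [];
    foreach ($xml->channel->item as $item) {
        // trim() removes stray whitespace, such as a trailing space
        // that some feeds leave inside <pubDate>.
        $items[] = [
            'title'   => trim((string) $item->title),
            'link'    => trim((string) $item->link),
            'guid'    => trim((string) $item->guid),
            'pubDate' => trim((string) $item->pubDate),
        ];
    }

    return [
        'version' => (string) $xml['version'],
        'title'   => trim((string) $xml->channel->title),
        'items'   => $items,
    ];
}
```

Run on the example feed above, this should return one item whose `link` is `http://www.wikipedia.org/`, while an HTML page (root element `<html>`) yields `null`.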

PS: Xatenev, you're welcome ;)

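Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.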

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM