How can I create a website summary with Perl?

Original post: 2009-08-14 19:52:12, tagged perl

When you share something on Facebook or Digg, it generates a summary of the page. How would I do this in Perl? What algorithms are there?

For example:

If I go to Facebook and try to share this question as a link ("How can I create a website summary with Perl?"):

It retrieves "Facebook/Digg get website summary? - Stack Overflow" as the title (which is just the title of the page) and [... incomplete question?]

4 answers

CPAN is your friend.

Some promising-looking modules:

HTML::Summary
HTML::SummaryBasic
Lingua::EN::Summarize

Assuming you mean sharing a link...

Usually the summary is written by the user submitting the URL. If you have to write a summary automagically, this can be achieved by:

Using the first 100 or so characters of the document body (in itself not easy)
Using metadata like the description or keywords (often empty or spammed)
Context-relevant summaries, like recreating Google snippets (sorry, it's PHP, but simple)
Tags/keywords from the document, using something like the Yahoo Keyword Extractor API or your own keyword density function
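To make the first, second and fourth options above a bit more concrete, here is a minimal sketch in core Perl. It is only an illustration, not how Facebook or Digg actually do it: the function names (squeeze, visible_text, summarize_html, top_keywords), the stop-word list and the 100-character default are all my own assumptions. It picks HTML apart with regexes, which is fragile (it expects name before content in the meta tag and does not decode entities), so for real pages a proper parser such as HTML::TreeBuilder would likely be a better base.

```perl
use strict;
use warnings;

# Collapse runs of whitespace and trim both ends.
sub squeeze {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Crude tag stripper: drops script/style blocks, comments, then all tags.
sub visible_text {
    my ($html) = @_;
    $html =~ s{<(script|style)\b.*?</\1\s*>}{ }gis;
    $html =~ s{<!--.*?-->}{ }gs;
    $html =~ s{<[^>]*>}{ }g;
    return squeeze($html);
}

# Returns (title, summary). The summary is the meta description if one is
# present, otherwise the first $max_len or so characters of the body text.
sub summarize_html {
    my ($html, $max_len) = @_;
    $max_len ||= 100;    # hypothetical default, matching "100 or so characters"

    my ($title) = $html =~ m{<title[^>]*>(.*?)</title>}is;
    $title = defined $title ? squeeze($title) : '';

    my $summary = '';
    if ($html =~ m{<meta\s+name=(["'])description\1\s+content=(["'])(.*?)\2}is) {
        $summary = squeeze($3);
    }
    if ($summary eq '') {
        my ($body) = $html =~ m{<body[^>]*>(.*)</body>}is;
        $body = $html unless defined $body;
        $summary = visible_text($body);
    }
    if (length($summary) > $max_len) {
        $summary = substr($summary, 0, $max_len);
        $summary =~ s/\s+\S*$//;    # drop the trailing word, which may be cut in half
        $summary .= '...';
    }
    return ($title, $summary);
}

# Simple keyword-frequency count: the $n most frequent words of three or
# more letters, ignoring a tiny (assumed) stop-word list.
sub top_keywords {
    my ($text, $n) = @_;
    my %stop = map { $_ => 1 } qw(the and for with this that you are was);
    my %count;
    for my $word (map { lc $_ } $text =~ /\b[a-zA-Z]{3,}\b/g) {
        next if $stop{$word};
        $count{$word}++;
    }
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

# Usage example on an inline page.
my $html = <<'HTML';
<html><head><title>How can I create a website summary with Perl? - Stack Overflow</title>
</head><body><h1>Question</h1>
<p>When you share something on Facebook or Digg, it generates some summary of the page.</p>
</body></html>
HTML

my ($title, $summary) = summarize_html($html, 60);
print "Title:    $title\n";
print "Summary:  $summary\n";
print "Keywords: ", join(', ', top_keywords(visible_text($html), 3)), "\n";
```

Trying the meta description first and only then falling back to body text mirrors the caveat above: the description is the cheapest source when it exists, but it is often empty, so you need a fallback anyway.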