C# extract content from HTML document
I was wondering how I can do something similar to what Facebook does when a link is posted, or what link-shortening services do: fetch the title of the page and its content.
Example:
My idea is to get only the plain text from a web page. For example, if the URL points to a newspaper article, how can I get only the text of the news story, as shown in the image? So far I have been trying to use HtmlAgilityPack, but I can never get the text clean.
Note: this app is for Windows Phone 7.
You're on the right track with HtmlAgilityPack.
If you want all the text of the page, use the InnerText property of the document node. But I suggest you go with the meta description tag (if available).
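A common reason InnerText does not come out clean is that it also includes the contents of script and style elements, comments, and undecoded HTML entities. The following is a minimal sketch of one way to strip those before reading the text, assuming HtmlAgilityPack is referenced; the class name PlainText and the method name Extract are hypothetical, chosen only for illustration. It uses the LINQ-style Descendants() method, so it does not depend on XPath support.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PlainText
{
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect the nodes first (ToList) so the tree is not modified
        // while it is being enumerated.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();

        foreach (var node in unwanted)
        {
            node.Remove();
        }

        // Decode entities such as &amp; and collapse runs of whitespace.
        var text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

This still returns everything visible on the page (menus, footers, and so on), not just the article body; isolating the article itself would need site-specific selection or a readability-style heuristic, which this sketch does not attempt.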
EDIT: Go for the meta description. I believe that's what Facebook is doing:
Facebook link sample
Site source
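As a rough illustration of the meta description approach, the sketch below reads the page title and the content attribute of the meta tag whose name is "description". It assumes HtmlAgilityPack is referenced and that the HTML has already been downloaded into a string; the class name PagePreview and its method names are hypothetical. Both methods return an empty string when the element is missing, since not every page provides a description.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PagePreview
{
    // Text of the first <title> element, or "" if the page has none.
    public static string GetTitle(HtmlDocument doc)
    {
        var title = doc.DocumentNode.Descendants("title").FirstOrDefault();
        return title == null
            ? string.Empty
            : HtmlEntity.DeEntitize(title.InnerText).Trim();
    }

    // content attribute of <meta name="description" ...>, or "" if absent.
    public static string GetDescription(HtmlDocument doc)
    {
        var meta = doc.DocumentNode.Descendants("meta")
            .FirstOrDefault(m => string.Equals(
                m.GetAttributeValue("name", ""),
                "description",
                StringComparison.OrdinalIgnoreCase));

        return meta == null
            ? string.Empty
            : meta.GetAttributeValue("content", "").Trim();
    }
}
```

A caller would create an HtmlDocument, call LoadHtml(html) on it, and pass it to both methods. When GetDescription returns an empty string, falling back to the first part of the cleaned InnerText is one possible option.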