
C# extract content from HTML document

Original post: 2012-06-24 14:56:18 · Tags: c#, html, windows-phone-7

I was wondering how I can do something similar to what Facebook does when a link is posted, or what link-shortening services do: fetch the title of the page and its content.

Example:

My idea is to get only the plain text from a web page. For example, if the URL points to a newspaper article, how can I get only the text of the news story, as shown in the image? So far I have been trying to use HtmlAgilityPack, but I can never get the text clean.

Note that this app is for Windows Phone 7.

1 Answer

You're on the right track with HtmlAgilityPack.

If you want all the text of the page, use the InnerText property of the document node. But I suggest you go with the meta description tag (if available).
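As a rough illustration of the InnerText route, here is a minimal sketch that assumes the page's HTML has already been downloaded into a string. The class and method names (`PlainTextExtractor`, `GetPlainText`) are made up for this example. It uses LINQ over `Descendants()` rather than XPath, on the assumption that XPath may not be available in the HtmlAgilityPack build you use on Windows Phone 7. Stripping `script`, `style`, and comment nodes first is one plausible reason the text "never comes out clean" otherwise; it will still include menus, footers, and other page chrome.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class PlainTextExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack):
    // returns the readable text of an HTML string.
    public static string GetPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect nodes whose content is never shown to the reader.
        var hidden = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList(); // materialize before mutating the tree

        foreach (var node in hidden)
        {
            node.Remove();
        }

        // InnerText concatenates the remaining text nodes.
        // Decode entities such as &amp; and collapse runs of whitespace.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

To narrow the result to the article body, you could call `InnerText` on a specific container node instead of `DocumentNode`, but which node that is depends on the site's markup.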

EDIT - Go for the meta description. I believe that's what Facebook is doing:

[Image: Facebook link sample]

[Image: Site source]
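The meta description suggestion could be sketched roughly as follows, again assuming the HTML is already in a string. `LinkPreview`, `GetTitle`, and `GetMetaDescription` are hypothetical names chosen for this example, and the code only looks at `<title>` and `<meta name="description">`; whether Facebook reads exactly these tags is not confirmed here.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LinkPreview
{
    // Hypothetical helper: text of the first <title> element, or null if absent.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var title = doc.DocumentNode.Descendants("title").FirstOrDefault();
        return title == null
            ? null
            : HtmlEntity.DeEntitize(title.InnerText).Trim();
    }

    // Hypothetical helper: content of <meta name="description">, or null if absent.
    public static string GetMetaDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match the name attribute case-insensitively ("Description" also occurs).
        var meta = doc.DocumentNode.Descendants("meta")
            .FirstOrDefault(n => string.Equals(
                n.GetAttributeValue("name", ""),
                "description",
                StringComparison.OrdinalIgnoreCase));

        return meta == null
            ? null
            : HtmlEntity.DeEntitize(meta.GetAttributeValue("content", "")).Trim();
    }
}
```

Both methods return null when the tag is missing, so a caller can fall back to the InnerText approach for pages that have no description.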


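Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.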
