How to convert HTML to plain text in C#?

Question

I am trying to get plain text from an HTML website, but I am getting the HTML code instead of plain text. For example, I get hello its me wrapped in HTML tags; how can I convert that to just hello its me? Any help is very much appreciated! Here is my code:

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Windows.Forms;

namespace WindowsFormsApplication2
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("https://www.dailyfx.com/real-time-news");
            myRequest.Method = "GET";

            // 'using' disposes the response and reader even if an exception is thrown.
            using (WebResponse myResponse = myRequest.GetResponse())
            using (StreamReader sr = new StreamReader(myResponse.GetResponseStream(), Encoding.UTF8))
            {
                // The response body is the page's raw HTML markup; nothing here strips tags.
                string result = sr.ReadToEnd();
                textBox1.Text = result;
            }
        }
    }
}

Answer 1

You can use a regular expression for this:

    Regex.Replace(htmltext, "<.*?>", string.Empty);

E.g.:

    string htmltext = "<p>Test1 <b>.NET</b> Test2 Test3 <i>HTML</i> Test4.</p>";
    string plain = Regex.Replace(htmltext, "<.*?>", string.Empty);
    // Output: Test1 .NET Test2 Test3 HTML Test4.

This will help you: http://www.codeproject.com/Tips/136704/Remove-all-the-HTML-tags-and-display-a-plain-text
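A slightly fuller version of the regex approach might look like the sketch below. It is still not an HTML parser, and the names HtmlText and StripTags are hypothetical. Besides the answer's `<.*?>` pattern, it assumes you also want to drop `<script>`/`<style>` blocks (whose contents would otherwise show up as "text"), decode entities such as `&amp;` with `WebUtility.HtmlDecode`, and collapse the whitespace left behind. Like the one-liner, it will mangle text that contains a literal `<` ... `>` pair and will join words separated only by a tag.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper extending the answer's Regex.Replace one-liner.
public static class HtmlText
{
    // <script> and <style> elements together with their contents: that text is
    // code, not readable page text, so it is removed before the generic tag pass.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // The answer's pattern; Singleline lets a tag that spans several lines match.
    private static readonly Regex Tag = new Regex("<.*?>", RegexOptions.Singleline);

    // Runs of whitespace (including non-breaking spaces) left behind by removed tags.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string StripTags(string html)
    {
        string text = ScriptOrStyle.Replace(html, string.Empty);
        text = Tag.Replace(text, string.Empty);
        text = WebUtility.HtmlDecode(text); // e.g. "&amp;" -> "&", "&lt;" -> "<"
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

In the question's button1_Click, this would mean assigning `textBox1.Text = HtmlText.StripTags(result);` instead of the raw string (untested against the dailyfx.com page itself). Decoding entities only after the tags are gone is deliberate: escaped text such as `&lt;b&gt;` is then displayed as `<b>` instead of being stripped as if it were a tag.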

Answer 2

Short answer: there is no direct conversion. You're "screen-scraping" a website, so you need to parse the result string to extract what you need (or, better yet, check whether the website in question provides an API).

Websites render in HTML, not plain text. Although you're getting the result back as a string, you'll need to parse it to extract the text you are interested in, and the actual extraction depends heavily on what you are trying to accomplish. If the website is proper XHTML, you can load it into an XDocument as XML and traverse the tree to get the information you need; otherwise, the HtmlAgilityPack suggested in one of the comments may help (it is not as magical as the comment implies; it's a bit more work than GetString...).
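A minimal sketch of the XDocument route, assuming the page really is well-formed XHTML. Most real pages are not, and `XDocument.Parse` then throws a `System.Xml.XmlException`; HTML-only entities such as `&nbsp;` are not predefined in XML, so unless a DTD defines them they also make the parse fail. The names XhtmlText and ExtractText are hypothetical. The sketch walks every text node in document order, skips anything inside `<script>` or `<style>`, and joins the trimmed pieces with spaces.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: assumes well-formed XHTML input.
public static class XhtmlText
{
    public static string ExtractText(string xhtml)
    {
        // Throws System.Xml.XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        // Every text node (XCData included, since it derives from XText),
        // excluding those nested in <script> or <style>. LocalName ignores
        // the XHTML namespace, so <html xmlns="..."> pages work too.
        var pieces = doc.DescendantNodes()
            .OfType<XText>()
            .Where(t => !t.Ancestors().Any(a =>
                a.Name.LocalName == "script" || a.Name.LocalName == "style"))
            .Select(t => t.Value.Trim())
            .Where(s => s.Length > 0);

        return string.Join(" ", pieces);
    }
}
```

Traversing the tree, rather than flattening it, is what lets you narrow the extraction later, for example by starting from one particular element instead of the whole document, which is the "depends on what you are trying to accomplish" part of this answer.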

2 answers

Answer 1: score 1, posted 2016-10-13 07:56:58
Answer 2: score 0, posted 2016-10-13 07:35:30
