
How to convert HTML to plain text in C#?

I am trying to get the plain text of a web page, but I am getting HTML markup instead. For example, I get <b>hello</b> <p>its me</p>. How can I convert that to "hello its me"? Any help is much appreciated! Here is my code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WindowsFormsApplication2
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Download the page with a plain HTTP GET request.
            HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("https://www.dailyfx.com/real-time-news");
            myRequest.Method = "GET";

            // The using blocks close the response and the reader even if reading fails.
            using (WebResponse myResponse = myRequest.GetResponse())
            using (StreamReader sr = new StreamReader(myResponse.GetResponseStream(), Encoding.UTF8))
            {
                // result holds the raw HTML markup of the page, not plain text.
                string result = sr.ReadToEnd();
                textBox1.Text = result;
            }
        }
    }
}

You can use a regular expression for this:

    Regex.Replace(htmltext, "<.*?>", string.Empty);

For example:

    string htmltext = "<p>Test1 <b>.NET</b> Test2 Test3 <i>HTML</i> Test4.</p>";

The output will be: Test1 .NET Test2 Test3 HTML Test4.

Only the tags themselves are removed; the text between them is kept.

This article may also help: http://www.codeproject.com/Tips/136704/Remove-all-the-HTML-tags-and-display-a-plain-text
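
A slightly fuller sketch of the regex approach is shown below. It is not a complete HTML parser: it assumes reasonably simple markup, and the HtmlText class and StripTags method are hypothetical names chosen for illustration. On top of the tag removal above, it drops script and style blocks, decodes entities with WebUtility.HtmlDecode, and collapses whitespace.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class HtmlText
{
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Drop <script> and <style> elements together with their contents,
        // since their text is never meant to be displayed.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace comments and all remaining tags with a space so that
        // words separated only by a tag do not run together.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into the characters they stand for.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

In the question's click handler this could be used as textBox1.Text = HtmlText.StripTags(result);. Replacing each tag with a space rather than an empty string is a trade-off: it keeps "hello" and "its me" apart, but it also splits a word that has inline markup in the middle of it. Regular expressions remain fragile on real-world pages, so a proper HTML parser is the safer choice for anything non-trivial.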

Short answer: there is no direct conversion. You are "screen-scraping" a website, so you have to parse the result string to extract what you need (or, better yet, check whether the website in question provides an API).

Websites serve HTML, not plain text. Although you get the result back as a string, you still need to parse it to extract the text you are interested in, and how you do that depends heavily on what you are trying to accomplish. If the page is well-formed XHTML, you can load it into an XDocument as XML and traverse the tree to get the information you need. Otherwise, the HtmlAgilityPack suggested in one of the comments may help, although it is not as magical as that comment implies: it takes a bit more work than GetString ...
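
As a minimal sketch of the XDocument route, assuming the input really is well-formed XML: the code below walks every text node in the tree and joins the words with single spaces. The XhtmlText class and Extract method are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; the class and method names are illustrative only.
public static class XhtmlText
{
    public static string Extract(string xhtml)
    {
        // XDocument.Parse throws System.Xml.XmlException when the markup
        // is not well-formed XML, which is common for real-world HTML.
        XDocument doc = XDocument.Parse(xhtml);

        var words = doc.DescendantNodes()
            .OfType<XText>()
            // Skip text that belongs to script or style elements.
            .Where(t => !t.Ancestors().Any(a =>
                a.Name.LocalName == "script" || a.Name.LocalName == "style"))
            // Splitting on a null separator array splits on whitespace,
            // which also discards whitespace-only text nodes.
            .SelectMany(t => t.Value.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries));

        return string.Join(" ", words);
    }
}
```

For example, XhtmlText.Extract("<div><b>hello</b> <p>its me</p></div>") yields hello its me. This only works when the page parses as XML; unclosed tags or named entities that are not declared for the document cause a parse error, and in that case an HTML parser such as HtmlAgilityPack is the more realistic option.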


 