Convert HTML text to plain text

Question

I have a text area in which users can enter HTML markup; any HTML code can be entered.

Now I want to convert that HTML code to plain text without using a third-party tool. How can this be done?

Currently I am doing it like this:

var desc = Convert.ToString(Html.Raw(Convert.ToString(drJob["Description"])));

drJob is the DataRow from which I fetch the description (drJob["Description"]), and I want to convert that description to plain text.

Answer 1

There is no direct way to do this in .NET. You either need to resort to a third-party tool such as the Html Agility Pack (HtmlAgilityPack), or do it in JavaScript:

document.getElementById('myTextContainer').innerText = document.getElementById('myMarkupContainer').innerText;

For your own safety, don't use a regex (http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html).
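As an illustration of the risk the linked article describes (this sketch is added here and is not part of the answer): a naive pattern such as "<[^>]*>" assumes every tag ends at the first '>', but a '>' inside a quoted attribute value is valid HTML, so part of the tag leaks into the "plain text". The class name RegexPitfall is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Naive tag stripper: treats everything from '<' up to the FIRST '>' as a tag.
    // A '>' inside a quoted attribute value ends the match too early.
    public static string NaiveStrip(string html) =>
        Regex.Replace(html, @"<[^>]*>", string.Empty);
}
```

For example, NaiveStrip("<img alt=\"a > b\">caption") returns " b\">caption" (with a leading space) instead of "caption", because the match stops at the '>' inside the alt attribute.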

Answer 2

You can use System.Text.RegularExpressions.Regex to replace the HTML tags with an empty string:

String desc = Regex.Replace(drJob["Description"].ToString(), @"<[^>]*>", String.Empty);
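A self-contained sketch of this approach, wrapped in a hypothetical helper class HtmlText. The extra System.Net.WebUtility.HtmlDecode call is an assumption not in the answer: it turns entities such as &amp; and &lt; back into characters after the tags have been removed, which plain text usually needs.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same pattern as the answer: '<', any run of non-'>' characters, then '>'.
    private static readonly Regex TagPattern = new Regex(@"<[^>]*>", RegexOptions.Compiled);

    // Hypothetical helper: removes tags first, then decodes HTML entities.
    // Decoding last means an escaped "&lt;b&gt;" ends up as the literal text "<b>".
    public static string StripTags(string html)
    {
        if (html == null) return string.Empty;
        string withoutTags = TagPattern.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Applied to the question, this would be string desc = HtmlText.StripTags(drJob["Description"].ToString());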

Answer 3

You can simply use the Replace method with the regular expression "<[^>]+>".
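This one-liner written out as a sketch (the class name TagStripper is hypothetical), using the static Regex.Replace(input, pattern, replacement) overload. The only behavioral difference from the "<[^>]*>" pattern in Answer 2 is the + quantifier: at least one character must sit between the brackets, so a literal "<>" is left untouched.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // "<[^>]+>": a '<', ONE or more non-'>' characters, then '>'.
    public static string Strip(string html) =>
        Regex.Replace(html, "<[^>]+>", string.Empty);
}
```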

Answer 4

using System.Text.RegularExpressions;

    private void button1_Click(object sender, EventArgs e)
    {
        string sauce = htm.Text; // htm = your html box
        Regex myRegex = new Regex(@"(?<=^|>)[^><]+?(?=<|$)", RegexOptions.Compiled);
        foreach (Match iMatch in myRegex.Matches(sauce))
        {
            txt.AppendText(Environment.NewLine + iMatch.Value); //txt = your destination box
        }

    }
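The same lookaround pattern, sketched without the WinForms controls so it can be reused or tested on its own. The class TextExtractor and its Extract method are hypothetical; instead of appending to a TextBox, it returns each run of text found between tags.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextExtractor
{
    // Pattern from the answer: a lazy run of characters that are neither '<' nor '>',
    // preceded by the start of the input or a '>', and followed by a '<' or the end.
    private static readonly Regex TextBetweenTags =
        new Regex(@"(?<=^|>)[^><]+?(?=<|$)", RegexOptions.Compiled);

    // Hypothetical helper: collects every text run between tags, in document order.
    public static List<string> Extract(string html)
    {
        var parts = new List<string>();
        foreach (Match m in TextBetweenTags.Matches(html))
        {
            parts.Add(m.Value);
        }
        return parts;
    }
}
```

Note that whitespace between tags (for example a line break between two paragraphs) is also returned as a run, which is the untidiness the edit below refers to.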

Let me know if you need more clarification.

[EDIT:] Be aware that this is not a clean function, so add a line to clean up empty spaces and line breaks. Extracting the text from between the tags, however, should work fine. If you want to save space, use a regex and see whether it works for you. The person who posted that regex is not clean is right, and there may be other ways, but a regex usually works well for separating a single type of tag from HTML. (I use it in Rainmeter to parse things and have never had any issues.)
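One possible version of that cleanup line, as a sketch (the class TextCleanup and its Clean method are hypothetical, and the exact cleanup rules are an assumption): collapse each run of whitespace to a single space, trim the ends, and drop segments that contained only whitespace.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextCleanup
{
    // Matches any run of whitespace: spaces, tabs, \r, \n.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical cleanup step for the extracted text runs.
    public static List<string> Clean(IEnumerable<string> segments)
    {
        var cleaned = new List<string>();
        foreach (string segment in segments)
        {
            // Collapse internal whitespace, then trim leading/trailing spaces.
            string s = Whitespace.Replace(segment, " ").Trim();
            if (s.Length > 0)
            {
                cleaned.Add(s); // keep only segments with real text
            }
        }
        return cleaned;
    }
}
```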

Answers: 4

Answer 1: score 2, posted 2012-03-29 07:51:04

Answer 2: score 1, posted 2012-03-29 07:51:22

Answer 3: score 0, posted 2012-03-29 07:48:17

Answer 4: score 0, posted 2012-03-29 08:56:07