Convert HTML text to plain text

I have a text area that accepts HTML markup, so any HTML code can be entered.

Now I want to convert that HTML code to plain text without using a third-party tool. How can this be done?

Currently I am doing it like this:

var desc = Convert.ToString(Html.Raw(Convert.ToString(drJob["Description"])));

drJob["Description"] is the DataRow column from which I fetch the description, and I want to convert that description to plain text.

There is no built-in way in .NET to do this. You either need to resort to a third-party library such as HtmlAgilityPack or do it in JavaScript.

document.getElementById('myTextContainer').innerText = document.getElementById('myMarkupContainer').innerText;

For your safety, don't use a regex (http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html).

You can use System.Text.RegularExpressions.Regex to replace the HTML tags with an empty string.

String desc = Regex.Replace(drJob["Description"].ToString(), @"<[^>]*>", String.Empty);
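The one-liner above removes the tags but leaves entities such as &amp;amp; in the output. A minimal sketch of how the two steps could be combined, assuming the input is a reasonably well-formed fragment; the class name HtmlText and the method name StripTags are hypothetical and not part of the original answer. As with any regex approach, the contents of script and style elements are kept, and a ">" inside an attribute value will confuse the pattern.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper (not from the original answer):
    // strips tags first, then decodes HTML entities.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Remove anything that looks like a tag: '<', any run of non-'>' characters, '>'.
        string withoutTags = Regex.Replace(html, @"<[^>]*>", string.Empty);

        // Turn entities such as &amp; and &lt; back into the characters they stand for.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

With this in place, the call in the question could become something like HtmlText.StripTags(Convert.ToString(drJob["Description"])). Decoding happens after stripping so that an escaped "&lt;b&gt;" in the source is not mistaken for a real tag.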

You can simply use the Replace method with the regular expression "<[^>]+>".

using System.Text.RegularExpressions;

    private void button1_Click(object sender, EventArgs e)
    {
        string sauce = htm.Text; // htm = your html box
        Regex myRegex = new Regex(@"(?<=^|>)[^><]+?(?=<|$)", RegexOptions.Compiled);
        foreach (Match iMatch in myRegex.Matches(sauce))
        {
            txt.AppendText(Environment.NewLine + iMatch.Value); //txt = your destination box
        }

    }

Let me know if you need more clarification.

[EDIT:] Be aware that this is not a clean function, so add a line to clean up empty spaces or line breaks. Extracting the text between tags should work fine, though. If you want to save space, use a regex and see whether it works for you. The person who posted that regex is not a clean way to do this is right, and there may be other ways; regex usually works better when you are separating a single type of tag from the HTML. (I use it with Rainmeter to parse things and have never had any issues.)
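The cleanup step mentioned in the edit could look roughly like the sketch below. It reuses the same lookaround pattern, collapses whitespace inside each matched run, and skips runs that are empty after trimming. The class name TextNodeExtractor and the method name Extract are hypothetical, and returning a string instead of appending to a text box is a simplification made here so the logic stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextNodeExtractor
{
    // Same pattern as in the answer: a run of non-tag characters that starts at the
    // beginning of the input or right after '>' and ends before '<' or at the end.
    private static readonly Regex TextRun =
        new Regex(@"(?<=^|>)[^><]+?(?=<|$)", RegexOptions.Compiled);

    // Hypothetical helper: returns one cleaned line per non-empty text run.
    public static string Extract(string html)
    {
        var lines = new List<string>();
        foreach (Match match in TextRun.Matches(html ?? string.Empty))
        {
            // Collapse runs of whitespace (including line breaks) to a single space.
            string text = Regex.Replace(match.Value, @"\s+", " ").Trim();
            if (text.Length > 0)
            {
                lines.Add(text);
            }
        }
        return string.Join(Environment.NewLine, lines);
    }
}
```

In the button handler this would reduce to something like txt.Text = TextNodeExtractor.Extract(htm.Text), which also avoids the leading blank line that the AppendText loop produces.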
