Add unicode to a string html tag pattern

I am using the C# script below to remove HTML tags from a description column when running in SSIS. I have tried to add the unicode &#58 to the string htmlTagPattern below, but I cannot get it to work.

Any assistance is appreciated.

using System;
using System.Text.RegularExpressions;

public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Row.Message = RemoveHtml(Row.Message);
    }

    // Removes anything that looks like an HTML tag, including tags that span lines.
    public String RemoveHtml(String message)
    {
        String htmlTagPattern = "<(.|\n)+?>";
        Regex objRegExp = new Regex(htmlTagPattern);
        message = objRegExp.Replace(message, String.Empty);
        return message;
    }
}

There are many methods to convert HTML to plain text:

Using the HtmlAgilityPack library

You can get the code from the samples provided:

You can download HtmlAgilityPack from the following links:
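
As a rough sketch of the HtmlAgilityPack route, assuming the package is referenced by the script project: the HTML is parsed, the text content is read from the document node, and any character references left in that text are decoded. The names HtmlTextExtractor and ToPlainText are made up for this illustration and are not part of the library or of its samples.

```csharp
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
public static class HtmlTextExtractor
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Parse the fragment instead of pattern-matching on angle brackets.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText returns the text content without the tags;
        // DeEntitize decodes character references that remain in that text.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

In the script component this could be called as Row.Message = HtmlTextExtractor.ToPlainText(Row.Message);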

Using System.Net

If you are using .NET Framework 4 or higher, you can benefit from the System.Net namespace, whose WebUtility class contains a method that decodes HTML character references into plain text (for example, &#58; becomes :). It does not remove tags, so it complements the tag pattern rather than replacing it:

System.Net.WebUtility.HtmlDecode(Row.Column)

Reference:
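
A minimal sketch of how decoding could be combined with the tag pattern from the question, assuming the goal is to turn references such as &#58; into their characters rather than delete them. The class name HtmlCleaner is hypothetical; the tags are stripped first so that decoded text such as &lt;b&gt; is not mistaken for a tag afterwards.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; only Regex and WebUtility come from the framework.
public static class HtmlCleaner
{
    // Same tag pattern as in the question.
    private static readonly Regex TagPattern = new Regex("<(.|\n)+?>");

    public static string RemoveHtml(string message)
    {
        if (string.IsNullOrEmpty(message))
        {
            return message;
        }

        // Step 1: remove the tags.
        string withoutTags = TagPattern.Replace(message, string.Empty);

        // Step 2: decode character references, e.g. "&#58;" to ":".
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

In the script component the call would stay the same shape as before: Row.Message = HtmlCleaner.RemoveHtml(Row.Message);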

Using regular expressions

You can follow one of these links for more details:
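
If the aim is to delete &#58 outright, as the question describes, one possible approach is to add it to the existing pattern as a second alternative. The sketch below assumes the reference may appear with or without a trailing semicolon; the class name TagAndReferenceStripper is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating an extended htmlTagPattern.
public static class TagAndReferenceStripper
{
    // First alternative: the original tag pattern from the question.
    // Second alternative: the literal text "&#58", optionally followed by ";".
    private const string HtmlTagPattern = "<(.|\n)+?>|&#58;?";

    private static readonly Regex Pattern = new Regex(HtmlTagPattern);

    public static string RemoveHtml(string message)
    {
        if (string.IsNullOrEmpty(message))
        {
            return message;
        }

        // Every match of either alternative is replaced with nothing.
        return Pattern.Replace(message, string.Empty);
    }
}
```

The | operator lets a single Replace call handle both cases, so the rest of the script component does not need to change.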
