Add Unicode to a string HTML tag pattern
I am using the C# script below in SSIS to remove HTML tags from a description column. I have tried to add the HTML entity &#58; (the numeric character reference for a colon) to the htmlTagPattern string below, but I cannot get it to work.

Any assistance is appreciated.
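using System;
using System.Text.RegularExpressions;

public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Replace the column value with its tag-free version.
        Row.Message = RemoveHtml(Row.Message);
    }

    public String RemoveHtml(String message)
    {
        // Lazily matches anything between '<' and the next '>', including newlines.
        String htmlTagPattern = "<(.|\n)+?>";
        Regex objRegExp = new Regex(htmlTagPattern);
        message = objRegExp.Replace(message, String.Empty);
        return message;
    }
}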
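A minimal sketch of one way to do what the question asks, assuming the goal is simply to delete &#58; together with the tags: add the entity to the same pattern as an alternation (|). The class and method names (TagStripper, RemoveHtmlAndColonEntity) are hypothetical, and the pattern only matches the exact text &#58; including its trailing semicolon.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: the question's tag pattern, extended with an
    // alternation so the literal entity "&#58;" is matched as well.
    // '&' and '#' have no special meaning here, so they need no escaping.
    private static readonly Regex TagOrColonEntity =
        new Regex("<(.|\n)+?>|&#58;", RegexOptions.Compiled);

    public static string RemoveHtmlAndColonEntity(string message)
    {
        // Pass nulls through unchanged instead of throwing.
        if (message == null)
        {
            return null;
        }
        // Every tag and every "&#58;" is replaced with an empty string.
        return TagOrColonEntity.Replace(message, String.Empty);
    }
}
```

Inside Input0_ProcessInputRow you would call it as Row.Message = TagStripper.RemoveHtmlAndColonEntity(Row.Message);. Note that this deletes the colon rather than restoring it; if the colon should be kept, decoding the entity is a better fit than removing it.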
There are many methods to convert HTML to plain text:
You can get the code from the samples provided:
You can download HtmlAgilityPack from the following links:
If you are using .NET Framework 4 or higher, you can benefit from the System.Net namespace, which contains a method that decodes HTML entities such as &#58; into plain text. Note that it only decodes entities; it does not remove tags, so it is meant to be used alongside the tag-stripping regex:
System.Net.WebUtility.HtmlDecode(Row.Column)
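A sketch of how the regex and the decoder might be combined, assuming the goal is readable text in which &#58; becomes ":" instead of disappearing. The class and method names (HtmlToText, ToPlainText) are hypothetical; System.Net.WebUtility.HtmlDecode is the framework method available from .NET Framework 4 onward.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    // Same tag pattern as in the question.
    private static readonly Regex HtmlTag = new Regex("<(.|\n)+?>");

    // Remove tags first, then decode entities such as "&#58;" (":") and
    // "&amp;" ("&"). Decoding after stripping means encoded text like
    // "&lt;b&gt;" survives as a literal "<b>" instead of being removed as a tag.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }
        string withoutTags = HtmlTag.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

In the script component this would be Row.Message = HtmlToText.ToPlainText(Row.Message);, possibly guarded by a check on Row.Message_IsNull for rows where the column is NULL.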
Reference:
You can follow one of these links for more details: