Split string on whitespace but exclude inside HTML tag
I have this decoded HTML string: <div style=\"text-align:right; \">test1 <strong>test2 </strong>test3 test4 test5</div>
I need to split it on whitespace, but spaces inside HTML tags should not cause a split, so the result should be the following 5 pieces. I'm not a regex person, so I need help.
<div style=\"text-align:right;\">test1
<strong>test2
</strong>test3
test4
test5</div>
EDIT: I included </strong> and added another line to make another point.
You can split based on ' <' or '> ':
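// Requires: using System.Text.RegularExpressions;
string value = "<div style=\"text - align:right; \">test1 <strong>test2 </strong>test3</div>";
// Lookarounds keep the angle brackets; capturing groups would also add the delimiters to the result.
string[] listHtml = Regex.Split(value, @" (?=<)|(?<=>) ");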
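Splitting only next to angle brackets leaves plain runs such as "test3 test4 test5" in one piece. As a minimal sketch of an alternative (not a full HTML parser), the pattern below splits on any whitespace that is not inside a tag. It assumes the text never contains a literal '<' or '>' outside of tags. The class and method names (HtmlWhitespaceSplitter, SplitOutsideTags) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlWhitespaceSplitter
{
    // \s+            : one or more whitespace characters
    // (?![^<>]*>)    : not followed by a '>' that is reached before any '<',
    //                  i.e. the whitespace is not inside an open tag.
    // Assumption: '<' and '>' only ever appear as tag delimiters.
    private static readonly Regex OutsideTagWhitespace =
        new Regex(@"\s+(?![^<>]*>)");

    // Hypothetical helper: splits on whitespace that lies outside HTML tags.
    public static string[] SplitOutsideTags(string html)
    {
        // Trim first so leading/trailing whitespace does not yield empty pieces.
        return OutsideTagWhitespace.Split(html.Trim());
    }

    public static void Main()
    {
        string value = "<div style=\"text-align:right; \">test1 <strong>test2 </strong>test3 test4 test5</div>";
        foreach (string piece in SplitOutsideTags(value))
        {
            Console.WriteLine(piece);
        }
        // Prints:
        // <div style="text-align:right; ">test1
        // <strong>test2
        // </strong>test3
        // test4
        // test5</div>
    }
}
```

The negative lookahead is what keeps the spaces inside the style attribute intact. For markup that may contain stray angle brackets, comments, or scripts, a real HTML parser is likely the safer choice.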
This turned out kind of ugly, but it works. There is probably a better way than this (you could just use Html Agility Pack):
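// Requires: using System.Collections.Generic; using System.Text;
List<string> finalList = new List<string>();
bool insideHtml = false; // true while a tag has been opened but not yet closed
StringBuilder sb = new StringBuilder();
string[] test = "<div style=\"text - align:right; \">test1 <strong>test2 </div>".Split(' ');
foreach (string t in test)
{
    if (t.Contains("<"))
    {
        // A tag starts in this token: buffer it until its closing '>' is seen.
        // Only add a separating space when the buffer already holds text.
        sb.Append(sb.Length > 0 ? " " + t : t);
        insideHtml = true;
        if (t.Contains(">"))
        {
            // The tag also closes in this token, so the piece is complete.
            finalList.Add(sb.ToString());
            sb.Clear();
            insideHtml = false;
        }
    }
    else if (t.Contains(">"))
    {
        // The open tag closes in this token: emit the buffered piece.
        sb.Append(sb.Length > 0 ? " " + t : t);
        finalList.Add(sb.ToString());
        sb.Clear();
        insideHtml = false;
    }
    else
    {
        if (insideHtml)
        {
            // Still inside a tag: keep the space-separated attribute text together.
            sb.Append(" " + t);
        }
        else
        {
            // Plain word outside any tag: it is its own piece.
            finalList.Add(t);
        }
    }
}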