
Split string on whitespace but exclude inside HTML tag

I have this decoded HTML string: <div style=\\"text-align:right; \\">test1 <strong>test2 </strong>test3 test4 test5</div>

I need to split it on whitespace, but spaces inside HTML tags must not be split, so the result should be the following 5 pieces. I'm not a regex person, so I need help.

<div style=\"text-align:right;\">test1

<strong>test2

</strong>test3

test4

test5</div>

EDIT: I included </strong> and added another line to make another point.

You can split on ' <' or '> ', that is, on a space immediately before an opening angle bracket or immediately after a closing one:

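// Requires: using System.Text.RegularExpressions;
// Lookarounds keep '<' and '>' in the pieces; capturing groups would add the separators to the result.
string value = "<div style=\"text - align:right; \">test1 <strong>test2 </strong>test3</div>";
string[] listHtml = Regex.Split(value, @" (?=<)|(?<=>) ");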
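
The split above only breaks at spaces next to a tag, so plain words such as "test3 test4" stay together. As a minimal sketch of a variant that matches the five pieces asked for in the question, the pattern below splits on any whitespace that is not followed by a closing '>' before the next '<'. The class name TagAwareSplit and its Split method are hypothetical names chosen for illustration, and the pattern assumes that '<' and '>' appear only as tag delimiters (not inside attribute values or text).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TagAwareSplit
{
    // \s+            one or more whitespace characters
    // (?![^<>]*>)    not followed by a '>' that is reached without
    //                passing another '<' or '>', i.e. not inside a tag
    private static readonly Regex OutsideTagWhitespace =
        new Regex(@"\s+(?![^<>]*>)");

    public static string[] Split(string html)
    {
        return OutsideTagWhitespace.Split(html);
    }
}
```

Calling TagAwareSplit.Split on the string from the question should give five pieces, with the space inside the style attribute left untouched. For real-world markup, where '>' can occur inside attribute values, a proper HTML parser is the safer choice.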

This turned out kind of ugly, but it works. There is probably a better way than this (you could just use the HTML Agility Pack):

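        // Requires: using System.Collections.Generic; using System.Text;
        List<string> finalList = new List<string>();
        bool insideHtml = false; // true while a tag has been opened but not yet closed
        StringBuilder sb = new StringBuilder();
        string[] test = "<div style=\"text - align:right; \">test1 <strong>test2 </div>".Split(' ');

        foreach (string t in test)
        {
            if (t.Contains("<"))
            {
                // A tag starts in this token; prefix a space only if something is already buffered.
                sb.Append(sb.Length > 0 ? " " + t : t);
                insideHtml = true;
                if (t.Contains(">"))
                {
                    // The tag also closes in this token, so the piece is complete.
                    finalList.Add(sb.ToString());
                    sb.Clear();
                    insideHtml = false;
                }
            }
            else if (t.Contains(">"))
            {
                // The pending tag closes here; restore the space that Split(' ') removed.
                sb.Append(sb.Length > 0 ? " " + t : t);
                finalList.Add(sb.ToString());
                sb.Clear();
                insideHtml = false;
            }
            else
            {
                if (insideHtml)
                {
                    // Still inside the tag: keep the token together with its space.
                    sb.Append(" " + t);
                }
                else
                {
                    // Plain word outside any tag: it is a piece on its own.
                    finalList.Add(t);
                }
            }
        }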
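
The same idea can also be sketched as a single pass over the characters, tracking whether the scan is currently between '<' and '>'. This is an illustrative alternative rather than the answer's own code: TagAwareScanner is a hypothetical name, and the sketch assumes that angle brackets occur only as tag delimiters.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class TagAwareScanner
{
    public static List<string> Split(string html)
    {
        var pieces = new List<string>();
        var current = new StringBuilder();
        bool insideTag = false;

        foreach (char c in html)
        {
            // Track whether the scan is between '<' and '>'.
            if (c == '<') insideTag = true;
            else if (c == '>') insideTag = false;

            if (char.IsWhiteSpace(c) && !insideTag)
            {
                // Whitespace outside a tag ends the current piece.
                if (current.Length > 0)
                {
                    pieces.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                // Everything else, including spaces inside a tag, is kept.
                current.Append(c);
            }
        }

        // Flush whatever is left after the last separator.
        if (current.Length > 0) pieces.Add(current.ToString());
        return pieces;
    }
}
```

Unlike the token-based loop, this version never has to re-insert the spaces that Split(' ') removed, and runs of several spaces outside tags do not produce empty pieces.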


 