Split paragraphs from an HTML string and remove empty paragraphs

Question

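I have an HTML string, and I want to split all of its paragraphs into an ArrayList. The resulting paragraphs must not be empty: each one should contain some ordinary text. If a paragraph contains only HTML markup and no ordinary text, for example <htmltag>     </htmltag>, it should be discarded, or not split out at all.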

Here is an example of how I split the paragraphs inside the HTML string:

System.Text.RegularExpressions.Match m = System.Text.RegularExpressions.Regex.Match(htmlString, @"<p>\s*(.+?)\s*</p>");
ArrayList groupCollection = new ArrayList();
while (m.Success)
{
   groupCollection.Add(m.Value);
   m = m.NextMatch();
}
ArrayList paragraphs = new ArrayList();
if (groupCollection.Count > 0)
{
   foreach (object item in groupCollection)
   {
      paragraphs.Add(item);
   }
}

The code above splits out all the paragraphs, but it cannot tell which of them are empty in the sense described above.

Answer 1

I found an answer to my own question. Here is my version of the code:

// Requires: using System; using System.Collections;
System.Text.RegularExpressions.Match m = System.Text.RegularExpressions.Regex.Match(htmlString, @"<p>\s*(.+?)\s*</p>");
ArrayList groupCollection = new ArrayList();
while (m.Success)
{
    groupCollection.Add(m.Value);
    m = m.NextMatch();
}
// Matches any HTML tag; created once and reused for every paragraph.
System.Text.RegularExpressions.Regex rx = new System.Text.RegularExpressions.Regex("<[^>]*>");
ArrayList paragraphs = new ArrayList();
foreach (object item in groupCollection)
{
    // Remove all HTML tags, then the &nbsp; entities.
    string str = rx.Replace(item.ToString(), "");
    string str1 = str.Replace("&nbsp;", "");
    // Keep the paragraph only if something other than white space remains.
    if (!String.IsNullOrWhiteSpace(str1))
    {
        paragraphs.Add(item.ToString());
    }
}

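In the code above, I first remove all the HTML tags inside each paragraph and then remove the &nbsp; entities (the white space in the HTML string). Whatever is left tells me whether the paragraph is empty.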
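The same idea can be sketched a little more defensively. This is only an illustration under a few assumptions, not part of the original answer: the class and method names (ParagraphSplitter, SplitNonEmptyParagraphs) are hypothetical, the pattern is widened to allow attributes on the <p> tag and line breaks inside a paragraph, and WebUtility.HtmlDecode is used so that &nbsp; and other entities are decoded before the white-space check. A regular expression is still not a real HTML parser, so nested or malformed markup may not be handled.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ParagraphSplitter
{
    // Matches <p ...>...</p>. Singleline lets '.' match line breaks,
    // and \b keeps tags such as <pre> from being treated as <p>.
    private static readonly Regex ParagraphRegex = new Regex(
        @"<p\b[^>]*>(.*?)</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any HTML tag.
    private static readonly Regex TagRegex = new Regex("<[^>]*>");

    // Hypothetical helper: returns the paragraphs that contain visible text.
    public static List<string> SplitNonEmptyParagraphs(string html)
    {
        var paragraphs = new List<string>();
        foreach (Match match in ParagraphRegex.Matches(html))
        {
            // Strip the tags from the paragraph body.
            string withoutTags = TagRegex.Replace(match.Groups[1].Value, "");
            // Decode entities; &nbsp; becomes U+00A0, which counts as white space.
            string text = WebUtility.HtmlDecode(withoutTags);
            if (!string.IsNullOrWhiteSpace(text))
            {
                paragraphs.Add(match.Value);
            }
        }
        return paragraphs;
    }
}
```

For example, calling ParagraphSplitter.SplitNonEmptyParagraphs on "<p>Hello</p><p><b>&nbsp;</b></p><p class=\"x\">World</p>" should keep the first and third paragraphs and drop the second, because nothing but white space remains once its tags are stripped.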

Solution 1
0 2013-03-19 04:57:42
