Removing XML and HTML from a string

Question

I have a string from which I need to remove all HTML and XML. I'm not very good with regular expressions. For the HTML, I found this very useful snippet:

snippet = Regex.Replace(snippet, "<.*?>", "");
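
For reference, here is that one-liner wrapped in a small helper (`TagStripper` and `StripTags` are hypothetical names used only for illustration). The lazy `.*?` makes each match stop at the first `>` after a `<`, so every tag is removed on its own. A greedy `<.*>` would instead swallow everything from the first `<` to the last `>` on the line. Note that only the tags go away: the text *inside* `<style>` survives, which is what the extra `<style>` and `<xml>` loops have to deal with.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // "<.*?>" matches a '<', then as few characters as possible (lazy .*?),
    // then the first '>' that follows, i.e. one tag per match.
    // '.' does not match '\n' by default, so a tag split across lines is left in place.
    public static string StripTags(string snippet) =>
        Regex.Replace(snippet, "<.*?>", "");
}
```

With this helper, `TagStripper.StripTags("<p>Hello <b>world</b></p>")` returns `"Hello world"`, while `"<style>p { color: red; }</style>Hi"` comes back as `"p { color: red; }Hi"`.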

For the XML, I'm currently doing this:

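// Remove every <xml>...</xml> block, tags included.
while (snippet.IndexOf("<xml>", StringComparison.Ordinal) != -1)
{
    int startLoc = snippet.IndexOf("<xml>", StringComparison.Ordinal);
    // Search for the end tag only after the start tag, so the count below is never negative.
    int endLoc = snippet.IndexOf("</xml>", startLoc, StringComparison.Ordinal);
    if (endLoc == -1) break; // unclosed block: stop instead of letting Remove throw
    snippet = snippet.Remove(startLoc, endLoc - startLoc + "</xml>".Length);
}
// Remove every <style>...</style> block, tags included.
while (snippet.IndexOf("<style>", StringComparison.Ordinal) != -1)
{
    int startLoc = snippet.IndexOf("<style>", StringComparison.Ordinal);
    int endLoc = snippet.IndexOf("</style>", startLoc, StringComparison.Ordinal);
    if (endLoc == -1) break;
    snippet = snippet.Remove(startLoc, endLoc - startLoc + "</style>".Length);
}
// only required for Chrome and IE
// removes - <object  classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D" id="ieooui">
while (snippet.IndexOf("<object", StringComparison.Ordinal) != -1)
{
    int startLoc = snippet.IndexOf("<object", StringComparison.Ordinal);
    int endLoc = snippet.IndexOf("id=\"ieooui\">", startLoc, StringComparison.Ordinal);
    if (endLoc == -1) break; // other attribute order: left for the next loop
    snippet = snippet.Remove(startLoc, endLoc - startLoc + "id=\"ieooui\">".Length);
}
// removes - <object id="ieooui" classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D">
while (snippet.IndexOf("<object", StringComparison.Ordinal) != -1)
{
    int startLoc = snippet.IndexOf("<object", StringComparison.Ordinal);
    int endLoc = snippet.IndexOf("classid=\"clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D\"", startLoc, StringComparison.Ordinal);
    if (endLoc == -1) break;
    // A fixed "+ 52" stops right after the closing quote and leaves the tag's '>' behind.
    int closeLoc = snippet.IndexOf('>', endLoc);
    if (closeLoc == -1) break;
    snippet = snippet.Remove(startLoc, closeLoc - startLoc + 1);
}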
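
As a sketch, assuming these blocks are never nested inside each other, the `<xml>` and `<style>` loops could be collapsed into one `Regex` that uses a back-reference (`\1`) to require the same tag name in the closing tag (`BlockRemover` and `RemoveBlocks` are hypothetical names). Unlike the loops, it also catches `<style type="text/css">`, and it simply leaves an unclosed block alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockRemover
{
    // <(xml|style)\b[^>]*>  : opening tag, with or without attributes
    // .*?                   : the shortest run of content up to...
    // </\1\s*>              : ...a closing tag with the same name as group 1
    // Singleline lets '.' cross newlines; IgnoreCase also matches <STYLE>...</STYLE>.
    // Assumption: no nesting (<xml> inside <xml>), which a regex cannot track reliably.
    private static readonly Regex Blocks = new Regex(
        @"<(xml|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string RemoveBlocks(string snippet) => Blocks.Replace(snippet, "");
}
```

Running `RemoveBlocks` first and the `<.*?>` replacement afterwards would remove the block contents and then any remaining tags.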

This is rather messy. Could someone please suggest a regular expression for the XML as well, particularly for:

<object id="ieooui" classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D">

and

<object  classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D" id="ieooui">
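
For these two `<object>` tags specifically, a single pattern can cover both attribute orders by requiring `id="ieooui"` anywhere inside the tag. This is a sketch rather than a general solution (`ObjectTagRemover` is a hypothetical name): it assumes no attribute value contains a `>`, and, like the loops, it removes only the opening tag, so a later `</object>` would still need the `<.*?>` pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ObjectTagRemover
{
    // <object\b        : an <object tag (not e.g. <objection)
    // [^>]*            : any attributes before the id, without leaving the tag
    // \bid="ieooui"    : a whole attribute named id (not the tail of "classid")
    // [^>]*>           : any remaining attributes and the closing '>'
    private static readonly Regex IeoouiObject = new Regex(
        @"<object\b[^>]*\bid=""ieooui""[^>]*>",
        RegexOptions.IgnoreCase);

    public static string RemoveIeoouiObjects(string snippet) =>
        IeoouiObject.Replace(snippet, "");
}
```

Other `<object>` tags, such as `<object id="player">`, are left untouched.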

Many thanks.

Answer 1

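In general, you can't parse HTML with a regexp. Well, technically you can, but, as you have noticed, it gets messy. This task is usually done with a SAX parser, or even just with an HTML/XML tokenizer such as this one: http://www.codeproject.com/KB/recipes/HTML_XML_Scanner.aspx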
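
.NET does not ship a SAX parser as such, so the sketch below swaps in `System.Xml.XmlReader`, a forward-only pull parser in the same spirit: it walks the markup node by node, keeps the text nodes and skips whole `<style>` and `<xml>` elements. It only works on well-formed XML or XHTML; ordinary HTML with unclosed tags such as `<br>` makes it throw an `XmlException`, which is where a tolerant tokenizer like the one linked above comes in. `MarkupText.ExtractText` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class MarkupText
{
    // Returns the character data of a well-formed XML/XHTML fragment,
    // dropping every <style> and <xml> element together with its contents.
    // Throws XmlException if the markup is not well-formed.
    public static string ExtractText(string markup)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment // allow several top-level elements
        };
        var text = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(markup), settings))
        {
            reader.Read(); // move to the first node
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    (reader.LocalName == "style" || reader.LocalName == "xml"))
                {
                    reader.Skip(); // jump past the element and everything inside it
                    continue;      // Skip already advanced the reader
                }
                if (reader.NodeType == XmlNodeType.Text ||
                    reader.NodeType == XmlNodeType.CDATA ||
                    reader.NodeType == XmlNodeType.Whitespace ||
                    reader.NodeType == XmlNodeType.SignificantWhitespace)
                {
                    text.Append(reader.Value); // entities such as &amp; are already decoded
                }
                reader.Read();
            }
        }
        return text.ToString();
    }
}
```

The loop uses `while (!reader.EOF)` instead of `while (reader.Read())` because `Skip()` already moves to the next node; calling `Read()` right after it would silently drop that node.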

Answer 1: score 0, 2011-04-20 06:14:47
