
Transform HTML using HtmlAgilityPack

I have googled every way I could find to convert HTML into a different kind of HTML (I guess it's HTML5), but I haven't had any luck. I am trying to transform this markup (the output of an RTF editor converted to HTML)

<DIV STYLE="text-align:Left;font-family:Segoe UI;font-style:normal;font-weight:normal;font-size:12;color:#000000;">
    <UL STYLE="margin:0 0 0 0;padding:0 0 0 0;">
        <LI STYLE="margin:0 0 0 24;">
            <P STYLE="font-family:Microsoft Sans Serif;font-weight:bold;font-size:11.333333333333332;margin:0 0 0 0;">
                <SPAN>
                    <SPAN>open paint</SPAN>
                </SPAN>
            </P>
        </LI>
        <LI STYLE="margin:0 0 0 24;">
            <P STYLE="font-family:Microsoft Sans Serif;font-weight:bold;font-size:11.333333333333332;margin:0 0 0 0;">
                <SPAN>
                    <SPAN>open calc</SPAN>
                </SPAN>
            </P>
        </LI>
    </UL>
</DIV>

to this (nicEditor markup)

<UL>
    <LI>
        <STRONG>open paint</STRONG>

    </LI>
    <LI>
        <STRONG>open calc</STRONG>
    </LI>
</UL>

using HtmlAgilityPack. I am trying to traverse the HTML markup and manually replace it with the second markup shown above. This approach has a number of problems: I am not able to convert the opening and closing tags properly, or to turn the CSS styles into the matching formatting tags. I am moving from an RTF editor to nicEditor.

The following is the C# code I am using to try to convert it manually.

private string transformHTML(string strTransform)
        {
            string final = "";
            if (WebUtility.HtmlDecode(strTransform).StartsWith("<DIV") || WebUtility.HtmlDecode(strTransform).StartsWith("<HTML"))
            {
                HtmlAgilityPack.HtmlDocument resultat = new HtmlAgilityPack.HtmlDocument();
                string source = WebUtility.HtmlDecode(strTransform);
                resultat.LoadHtml(source);
                string o = resultat.DocumentNode.OuterHtml;


                List<string> startStringList = new List<string>();
                List<string> lastStringList = new List<string>();
                List<string> innerTextList = new List<string>();
                List<string> newLine = new List<string>();
                StringBuilder sb = new StringBuilder();
                string innterText = "";
                string child = "";



                foreach (HtmlNode node in resultat.DocumentNode.Descendants())
                {

                    switch (node.Name.ToLower())
                    {
                        case "ul":
                            startStringList.Add("<UL>");
                            lastStringList.Add("</UL>");
                            break;

                        case "li":
                            startStringList.Add("<LI>");
                            lastStringList.Add("</LI>");
                            break;

                        case "span":
                            if (!innerTextList.Contains(node.InnerText.Trim()))
                                innerTextList.Add(node.InnerText.Trim());// = node.InnerText;
                            foreach (var item in node.Attributes)
                            {
                                string values = item.Value;
                                values = values.ToLower();
                                if (values.Contains("FONT-WEIGHT:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "FONT-WEIGHT:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "bold")
                                    {
                                        startStringList.Add("<STRONG>");
                                        lastStringList.Add("</STRONG>");
                                    }
                                }
                                if (values.Contains("FONT-STYLE:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "FONT-STYLE:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "italic")
                                    {
                                        startStringList.Add("<I>");
                                        lastStringList.Add("</I>");
                                    }
                                }

                                if (values.Contains("TEXT-DECORATION:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "TEXT-DECORATION:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "underline")
                                    {
                                        startStringList.Add("<U>");
                                        lastStringList.Add("</U>");
                                    }
                                }
                            }
                            break;
                        case "p":
                            foreach (var item in node.Attributes)
                            {
                                string values = item.Value;
                                values = values.ToLower();
                                if (values.Contains("text-align:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "text-align:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "Center".ToLower())
                                    {
                                        startStringList.Add("<P align=center>");
                                        lastStringList.Add("</P>");
                                    }

                                    if (wt.Trim().Split(';')[0].ToLower() == "Right".ToLower())
                                    {
                                        startStringList.Add("<P align=right>");
                                        lastStringList.Add("</P>");
                                    }

                                    if (wt.Trim().Split(';')[0].ToLower() == "justify".ToLower())
                                    {
                                        startStringList.Add("<P align=justify>");
                                        lastStringList.Add("</P>");
                                    }
                                    if (wt.Trim().Split(';')[0].ToLower() == "left".ToLower())
                                    {
                                        startStringList.Add("<P align=left>");
                                        lastStringList.Add("</P>");
                                    }
                                }
                                if (values.Contains("FONT-WEIGHT:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "FONT-WEIGHT:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "bold")
                                    {
                                        startStringList.Add("<STRONG>");
                                        lastStringList.Add("</STRONG>");
                                    }
                                }
                                if (values.Contains("FONT-STYLE:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "FONT-STYLE:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "italic")
                                    {
                                        startStringList.Add("<I>");
                                        lastStringList.Add("</I>");
                                    }
                                }

                                if (values.Contains("TEXT-DECORATION:".ToLower()))
                                {
                                    string wt = values.Split(new string[] { "TEXT-DECORATION:".ToLower() }, StringSplitOptions.None)[1].ToString();
                                    if (wt.Trim().Split(';')[0].ToLower() == "underline")
                                    {
                                        startStringList.Add("<U>");
                                        lastStringList.Add("</U>");
                                    }
                                }
                            }
                            break;
                    }
                }

                lastStringList.Reverse();
                foreach (var item1 in startStringList)
                {
                    final += item1;

                }
                foreach (var item3 in innerTextList)
                {
                    final += item3 + "<br>";


                }
                final += innterText;
                foreach (var item2 in lastStringList)
                {
                    final += item2;
                }

            }
            return final;
        }

I would consider using XDocument and XElement to do the heavy lifting for this task.

As long as you can control what goes where, you will have a far easier time working with HTML as an XML structure. There is an example here:

http://www.dotnetperls.com/xelement

If you search around for XDocument and XElement, you'll find tons of documentation on the subject.

But for goodness' sake, use lowercase tags :)
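To make the XElement suggestion concrete, here is a minimal sketch, not a definitive implementation. It assumes the editor output is well-formed XML, as the sample in the question happens to be; real-world HTML with unclosed tags or entities such as &nbsp; would make XElement.Parse throw. The names MarkupSimplifier, Simplify, ConvertNode, ParseStyle and HasValue are hypothetical, and the mapping rules (keep lists, unwrap everything else, turn bold/italic/underline styles into wrapper tags) are assumptions inferred from the markup in the question. Converting recursively, children first, is what keeps the opening and closing tags correctly nested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MarkupSimplifier
{
    // Entry point (hypothetical name): returns the simplified markup as a string.
    public static string Simplify(string wellFormedHtml)
    {
        // Assumption: the input parses as XML; otherwise this throws XmlException.
        XElement root = XElement.Parse(wellFormedHtml);
        return string.Concat(
            ConvertNode(root).Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    // Recursively converts one source node into zero or more output nodes.
    private static IEnumerable<XNode> ConvertNode(XNode node)
    {
        if (node is XText text)
        {
            yield return new XText(text.Value);
            yield break;
        }

        if (!(node is XElement element))
            yield break; // comments and other node types are dropped

        // Convert the children first, so the output keeps the source nesting.
        List<XNode> content = element.Nodes()
            .SelectMany(child => ConvertNode(child))
            .ToList();

        string name = element.Name.LocalName.ToLowerInvariant();
        if (name == "ul" || name == "ol" || name == "li")
        {
            // List structure is kept, with lowercase names and no attributes.
            yield return new XElement(name, content);
            yield break;
        }

        // Everything else (div, p, span, ...) is unwrapped; only its inline
        // formatting survives, as wrapper elements around the content.
        Dictionary<string, string> style = ParseStyle(element);
        if (HasValue(style, "font-weight", "bold"))
            content = new List<XNode> { new XElement("strong", content) };
        if (HasValue(style, "font-style", "italic"))
            content = new List<XNode> { new XElement("i", content) };
        if (HasValue(style, "text-decoration", "underline"))
            content = new List<XNode> { new XElement("u", content) };

        foreach (XNode item in content)
            yield return item;
    }

    // Splits a style attribute such as "a:b;c:d" into property/value pairs.
    private static Dictionary<string, string> ParseStyle(XElement element)
    {
        var styles = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // XML attribute names are case-sensitive, so match STYLE/style explicitly.
        XAttribute attr = element.Attributes().FirstOrDefault(a =>
            a.Name.LocalName.Equals("style", StringComparison.OrdinalIgnoreCase));
        if (attr == null)
            return styles;

        foreach (string declaration in
                 attr.Value.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = declaration.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
                styles[parts[0].Trim()] = parts[1].Trim();
        }
        return styles;
    }

    private static bool HasValue(
        Dictionary<string, string> style, string property, string expected)
    {
        return style.TryGetValue(property, out string value)
            && value.Equals(expected, StringComparison.OrdinalIgnoreCase);
    }
}
```

Calling MarkupSimplifier.Simplify(source) on the DIV/UL sample from the question should produce a lowercase ul with two li items, each containing a strong element around its text. If the input cannot be guaranteed to be well-formed, the same recursive shape can be applied to HtmlAgilityPack nodes instead, since the important part is recursing over child nodes rather than iterating a flat list of descendants.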
