
What's the easiest way to change the contents of text in a string with C#?

I have HTML in a string that looks like this:

<div id="control">
    <a href="/xx/x">y</a>
    <ul>
        <li><a href="/C003Q/x" class="dw">x</a></li>
        <li><a href="/C003R/xx" class="dw">xx</a></li>
        <li><a href="/C003S/xxx" class="dw">xxx</a></li>
    </ul>
</div>

I would like to change this to the following:

<div id="control">
    <a data-href="/xx/x" ><span>y</span></a>
    <ul>
        <li><a data-href="/C003Q/x" class="dw"><span>x</span></a></li>
        <li><a data-href="/C003R/xx" class="dw"><span>xx</span></a></li>
        <li><a data-href="/C003S/xxx" class="dw"><span>xxx</span></a></li>
    </ul>
</div>

I have heard about regex, but I am not sure how to use it to change the content inside the anchor tags and rename href at the same time. Would I need to run a regex twice? Can I change the inside of <a ... >...</a> with a regex at all, or is there an easier way in C#?

Regex is, in general, not suitable for parsing HTML. The exception is well-known, well-structured HTML (i.e. you know exactly what you are trying to parse).

There are HTML parsers that you can use: the HTML Agility Pack is a popular option, and there is also CsQuery.


What is exactly the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
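For the markup in the question, the parser approach might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the class name AnchorRewriterHap and the method name Rewrite are made up for illustration and are not part of the library. Because the attribute is removed and re-added, data-href may end up after class in the output.

```csharp
using HtmlAgilityPack; // NuGet package, assumed to be installed

// Hypothetical helper, not part of HtmlAgilityPack.
public static class AnchorRewriterHap
{
    public static string Rewrite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return html;
        }

        foreach (HtmlNode a in anchors)
        {
            // Move the value from href to data-href.
            string href = a.GetAttributeValue("href", "");
            a.Attributes.Remove("href");
            a.SetAttributeValue("data-href", href);

            // Wrap whatever the anchor currently contains in a <span>.
            a.InnerHtml = "<span>" + a.InnerHtml + "</span>";
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Unlike a fixed text pattern, this does not depend on attribute order or on every anchor having class="dw", so the first anchor in the question is handled as well.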


CsQuery - C# jQuery Port for .NET 4

CsQuery is a jQuery port for .NET 4. It implements all CSS2 & CSS3 selectors, all the DOM manipulation methods of jQuery, and some of the utility methods. The majority of the jQuery test suite (as of 1.6.2) has been ported to C#.

You can use a regular expression replace. Use parentheses to capture values in the text that you match, and use $1, $2, etc. to insert those values into the replacement string:

str = Regex.Replace(
  str,
  "<a href=\"(.+?)\" class=\"dw\">(.+?)</a>",
  "<a data-href=\"$1\" class=\"dw\"><span>$2</span></a>"
);

Note: If the HTML doesn't have exactly this form, the replacement won't work. For example, if the anchor tag has another attribute, or if the attribute order is reversed, the pattern won't match. In particular, the first anchor in the question has no class attribute, so this pattern leaves it unchanged.
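A slightly more tolerant pattern can capture whatever follows the href attribute instead of hard-coding class="dw". The sketch below is one way to do that; the names AnchorRewriter and Rewrite are hypothetical. It still assumes that href is the first attribute, that attribute values are double-quoted, that anchors are not nested, and that each anchor's content sits on one line.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class AnchorRewriter
{
    // Group 1: the href value. Group 2: any remaining attributes (may be empty).
    // Group 3: the anchor's content, matched lazily up to the first </a>.
    private static readonly Regex AnchorPattern =
        new Regex(@"<a href=""([^""]*)""([^>]*)>(.*?)</a>");

    public static string Rewrite(string html)
    {
        return AnchorPattern.Replace(
            html,
            "<a data-href=\"$1\"$2><span>$3</span></a>");
    }
}
```

Calling AnchorRewriter.Rewrite(str) on the HTML from the question rewrites all four anchors in one pass, including the first one that has no class attribute.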

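If you don't want to use Regex, you could chain plain string replacements. Note that this assumes every anchor's opening tag ends with class="dw"; an anchor without that class (like the first one in the question) would get a closing </span> with no opening <span>:

string newString = oldString.Replace("<a href=", "<a data-href=")
                            .Replace("dw\">", "dw\"><span>")
                            .Replace("</a", "</span></a");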
