
Replacing a string in C# using something like a regex

I have an HTML file in which I need to replace some things.

The structure of the file that needs to be replaced is:

<td>xxxx!!</td>

and replaced with:

<td align="center">xxxx!!</td>

The text between the td tags is as follows:

xxxx is a letter, number, period or space
!! are two exclamation points

How do you do this replacement in C# (.NET)?

You should not try to parse HTML with regex; use an HTML parser instead. For C# you can use the Html Agility Pack: http://htmlagilitypack.codeplex.com/

First you need to add Html Agility Pack:

Install-Package HtmlAgilityPack

You didn't provide an example, so I built my own.

    using System.Collections.Generic; // List<T>
    using System.Linq;                // Where, ToList
    using HtmlAgilityPack;            // HtmlDocument, HtmlNode

    static void Main(string[] args)
    {
        string html = @"<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

<table>
    <tr>
        <td>A!!</td>
        <td>te2</td>
        <td>2!!</td>
        <td>te43</td>
        <td></td>
        <td> !!</td>
        <td>.!!</td>
        <td>te53</td>
        <td>te2</td>
        <td>texx</td>
    </tr>
</table>

</body>
</html>";

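        // Parse the HTML string into a DOM.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect every <td> element in the document.
        List<HtmlNode> tdNodes = doc.DocumentNode.Descendants().Where(x => x.Name == "td").ToList();

        foreach (HtmlNode node in tdNodes)
        {
            // Skip cells whose text does not contain "!!".
            if (!node.InnerText.Contains("!!"))
                continue;

            node.Attributes.Add("align", "center");
        }

        // Serialize the modified document back to a string.
        string html2 = doc.DocumentNode.InnerHtml;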
    }

If there is a possibility that other td cells contain !! without matching your format, build a regular expression for your case that accepts only letters, numbers, periods and spaces before the !!, and add the attribute only when the cell text matches it.
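
The stricter check described above could look something like the following. This is only a sketch: the names TdAligner, CenterMatchingCells and CellPattern are made up for illustration, and the pattern assumes "xxxx" means one or more allowed characters, which the question does not state precisely. Adjust the quantifier (for example to {4}) if the length is fixed.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class TdAligner
{
    // Assumption: one or more letters, digits, periods or spaces,
    // followed by exactly two exclamation points and nothing else.
    private static readonly Regex CellPattern =
        new Regex(@"^[A-Za-z0-9. ]+!!\z");

    public static string CenterMatchingCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy to a list so the nodes can be modified while looping.
        foreach (HtmlNode td in doc.DocumentNode.Descendants("td").ToList())
        {
            // The regex runs on the cell's text only, never on raw HTML.
            if (CellPattern.IsMatch(td.InnerText))
                td.SetAttributeValue("align", "center");
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

With the sample document above, TdAligner.CenterMatchingCells(html) should add the attribute to the A!!, 2!!, " !!" and .!! cells only. A cell such as a-b!! would be left alone, because the hyphen is not in the allowed character set.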
