
Regex Matching, cascaded tags

Hi, I am trying to extract values from the tags below. I need the first match, then the fifth, then the ninth: that is, the first match and every fourth match after it. I realize regex isn't the best way to parse HTML, but I really only need it for this one case.

The regex I am using is

<td class="stat">(.*?)<\/td>

The code I am using is

private static ObservableCollection<Top> top = new ObservableCollection<Top>();

public void twit_topusers_DownloadStringCompleted(Object sender, DownloadStringCompletedEventArgs e)
    {
        // The downloaded page as a single string.
        string str = (string)e.Result;

        Regex r = new Regex("<td class=\"stat\">(.*?)</td>");
        // Find the first match in the string.
        Match m = r.Match(str);

        while (m.Success)
        {
            testMatch = System.Text.RegularExpressions.Regex.Unescape(m.Groups[0].ToString()).Trim();

            top.Add(new Top(testMatch));
            m = m.NextMatch();
        }

        listBox.ItemsSource = top;
    }



    }

The tags are

<td class="stat">14307149</td>//FIRST
<td class="stat">679761</td>
<td class="stat">3508</td>
<td class="stat">62 months ago</td>
<td class="stat">1430700</td>//FIFTH
<td class="stat">679761</td>
<td class="stat">3508</td>
<td class="stat">72 months ago</td>
<td class="stat">1430600</td>//NINTH
<td class="stat">679761</td>
<td class="stat">3508</td>
<td class="stat">82 months ago</td>

But the results I am getting are

Match 1 14307149

Match 2 679761

Match 3 3508

Match 4 62 months ago

Match 5 1430700

Match 6 679761

Match 7 3508

Match 8 72 months ago

Match 9 1430600

Match 10 679761

Match 11 3508

Match 12 82 months ago

The results I need are

Match 1 14307149

Match 2 1430700

Match 3 1430600

Can you help me with this?

It doesn't look like you're checking the match index at all. If you add a counter and keep only the matches whose zero-based index modulo 4 is zero (indices 0, 4, 8, i.e. the first, fifth and ninth matches), you'd be good.

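int counter = 0;
while (m.Success)
{
    // Keep matches 1, 5, 9, ... (zero-based indices 0, 4, 8, ...).
    if (counter % 4 == 0)
    {
        // Groups[1] is the captured cell text; Groups[0] would include the <td> tags.
        testMatch = System.Text.RegularExpressions.Regex.Unescape(m.Groups[1].ToString()).Trim();
        top.Add(new Top(testMatch));
    }

    // Advance on every iteration; advancing only inside the if would loop forever.
    m = m.NextMatch();
    counter++;
}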

Note: I am not a WP7 developer, so this code might be slightly off depending on the way WP7's coding system works.
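For anyone who wants to try the counter idea outside the phone app, here is a minimal console sketch. The class name `StatExtractor` and the helper `EveryNth` are made up for illustration and are not part of the original code; it also assumes the wanted value is the first capture group rather than the whole match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StatExtractor
{
    // Hypothetical helper (not from the original post): returns the captured
    // text of every `step`-th match, starting with the first one.
    public static List<string> EveryNth(string html, int step)
    {
        var results = new List<string>();
        var regex = new Regex("<td class=\"stat\">(.*?)</td>");
        int index = 0;

        foreach (Match m in regex.Matches(html))
        {
            if (index % step == 0)
            {
                // Groups[1] is the cell content without the surrounding tags.
                results.Add(m.Groups[1].Value.Trim());
            }
            index++;
        }
        return results;
    }

    public static void Main()
    {
        // The twelve cells from the question, without the //FIRST etc. markers.
        string html =
            "<td class=\"stat\">14307149</td>\n<td class=\"stat\">679761</td>\n" +
            "<td class=\"stat\">3508</td>\n<td class=\"stat\">62 months ago</td>\n" +
            "<td class=\"stat\">1430700</td>\n<td class=\"stat\">679761</td>\n" +
            "<td class=\"stat\">3508</td>\n<td class=\"stat\">72 months ago</td>\n" +
            "<td class=\"stat\">1430600</td>\n<td class=\"stat\">679761</td>\n" +
            "<td class=\"stat\">3508</td>\n<td class=\"stat\">82 months ago</td>\n";

        // Prints 14307149, 1430700 and 1430600, one per line.
        foreach (string value in EveryNth(html, 4))
        {
            Console.WriteLine(value);
        }
    }
}
```

This relies on every row having exactly four `stat` cells; if a row can have more or fewer, the fixed step of 4 will drift.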

Change the regex as follows so that it matches only numeric cells:

     <td class="stat">(\d+)<\/td>

If I understand you correctly, you first have to split the string on "months ago" and then apply the regex above to each piece, keeping the first match from each piece.
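A rough sketch of that split-then-match idea, as a console program. The names `SplitExtractor` and `FirstNumberPerRow` are hypothetical, and it assumes each row ends with a "months ago" cell as in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitExtractor
{
    // Hypothetical helper (not from the original post): splits the page on
    // "months ago" and keeps the first purely numeric cell of each piece.
    public static List<string> FirstNumberPerRow(string html)
    {
        var results = new List<string>();
        var numeric = new Regex(@"<td class=""stat"">(\d+)</td>");

        string[] pieces = html.Split(new[] { "months ago" }, StringSplitOptions.None);
        foreach (string piece in pieces)
        {
            // Match returns only the first hit; the trailing piece after the
            // last "months ago" has no numeric cell and is skipped.
            Match m = numeric.Match(piece);
            if (m.Success)
            {
                results.Add(m.Groups[1].Value);
            }
        }
        return results;
    }

    public static void Main()
    {
        string html =
            "<td class=\"stat\">14307149</td>\n<td class=\"stat\">679761</td>\n" +
            "<td class=\"stat\">3508</td>\n<td class=\"stat\">62 months ago</td>\n" +
            "<td class=\"stat\">1430700</td>\n<td class=\"stat\">679761</td>\n" +
            "<td class=\"stat\">3508</td>\n<td class=\"stat\">72 months ago</td>\n" +
            "<td class=\"stat\">1430600</td>\n<td class=\"stat\">679761</td>\n" +
            "<td class=\"stat\">3508</td>\n<td class=\"stat\">82 months ago</td>\n";

        // Prints 14307149, 1430700 and 1430600, one per line.
        foreach (string value in FirstNumberPerRow(html))
        {
            Console.WriteLine(value);
        }
    }
}
```

Note that the `(\d+)` pattern alone is not enough: it still matches the second and third cell of each row, so the split (or a counter) is what picks out the first cell.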
