
Parsing a string and grouping the results with Regex?

I have no idea how to use regex, but my friend told me it would be the most efficient way for what I'm trying to achieve. I've asked multiple people to help, but all the code they gave me was undocumented, which isn't helpful at all. This project is for me to learn - and I thought this would be the best place for this. Anyways - I'm trying to group everything inside tags.

Here's an example of the HTML:

<tr>
<td width=0%>One:</td><td width=23% class='colour'>Text</a></td>
<td width=0%>Two:</td><td width=23% class='colour'><div class='full' Style='width:140px'><div class='active' style='width:70px'></div></div></td>
<td width=0%>Three:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:70px'></div></div></td>
</tr>
<tr>
<td width=0%>Seven:</td><td class="colour">Text</a></td>
<td width=0%>Eight:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:84px'></div></div></td>
<td width=0%><strong>Twenty</strong>:</td><td width=23% class='colour'><div class='ful' style='width:140px'><div class='active' style='width:80.3345222473px'></div></div> (5.74)</td>
</tr>

How would I parse all of that so it gets grouped like this? (I'm using string[] just as example)

string[] tr1 = new string[]{
One: Text
Two: 140/70
Three: 140/70
}

string[] tr2 = new string[]{
Seven: Text
Eight: 140/84
Twenty: 140/80.3345222473
}

Each fraction is basically the width from the "full" div's style / the width from the "active" div's style.

Is this possible using Regex or any other way at all?

I'm sorry I can't show what I've already done since I really haven't done anything at all that's relevant.. I've tried to learn Regex patterns since my friend told me Regex was the most efficient, but I failed so miserably at it... Sigh.

This would mean a whole bunch if someone can guide me through this!

Thanks!

The following regex pattern extracts "One:" and "Text"; you can then concatenate them however you like.

<td width=0%>(.+)</td><td[^>]+>([^<]+).*</td>

How it works:

  1. First comes a precondition that must match but that we do not want to capture. In the pattern above, that is <td width=0%> .
  2. Next we capture "One:" with (.+) , which means "any characters, at least one". The quantifier is greedy, but the regex engine backtracks until the rest of the pattern, starting with </td><td , can match, so the capture ends just before that </td> .
  3. Then comes another condition that must match but is not captured: </td><td[^>]+> , i.e. the closing tag followed by the opening tag of the next cell with whatever attributes it has. See point 1 for the idea.
  4. The next thing you wanted is "Text", which ([^<]+) captures: one or more characters up to, but not including, the next left angle bracket < .
  5. Finally, .*</td> matches zero or more characters up to the closing </td> .

You can capture "Two:" and "Three:" in a similar way.
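As a rough illustration of the five steps, here is a minimal sketch in Python. The thread does not name a language, so Python is an assumption made only for demonstration; the pattern is the one given above, and `extract_pair` is a hypothetical helper name.

```python
import re

# The pattern from the answer. Group 1 is the label cell ("One:"),
# group 2 is the leading text of the value cell ("Text").
# IGNORECASE follows the answer's advice to ignore case.
CELL_PATTERN = re.compile(
    r"<td width=0%>(.+)</td><td[^>]+>([^<]+).*</td>",
    re.IGNORECASE,
)


def extract_pair(line):
    """Return (label, text) for one line of HTML, or None if it does not match."""
    match = CELL_PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# First data line of the sample HTML from the question.
sample = "<td width=0%>One:</td><td width=23% class='colour'>Text</a></td>"
label, text = extract_pair(sample)
print(f"{label} {text}")  # prints "One: Text"
```

Note that a line whose value cell starts with a tag (such as the `<div>` rows) returns `None` here, because `([^<]+)` needs at least one character before the first `<`. Those rows need the width pattern instead.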

To retrieve 140, start by looking for the text that must surround it. In your HTML the widths appear in style='width:...' attributes, and the one you want sits on the <div class='full' ...> element. To extract the number, whether it looks like 140, 70, or 123.45, you can use:

<div class='full' style='width:([0-9.]+)px[^>]+>

Explanation:
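As before, the pattern starts with a precondition that must match. The capture is ([0-9.]+) , which means one or more digits or dots. It must be followed immediately by px , and [^>]+> then consumes the rest of the tag. Swapping 'full' for 'active' in the precondition gives you the other div's width in the same way.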
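A minimal Python sketch of the width extraction might look like the following (Python is an assumption, since the thread names no language). `FULL_WIDTH` is the pattern above. `ACTIVE_WIDTH` and `extract_ratio` are hypothetical additions for illustration: the same pattern with the class name swapped, and a helper that joins the two numbers.

```python
import re

# Pattern from the answer: captures the width of the "full" div.
FULL_WIDTH = re.compile(
    r"<div class='full' style='width:([0-9.]+)px[^>]+>", re.IGNORECASE
)
# Assumption: the same pattern with the class name swapped to "active".
ACTIVE_WIDTH = re.compile(
    r"<div class='active' style='width:([0-9.]+)px[^>]+>", re.IGNORECASE
)


def extract_ratio(line):
    """Return "full/active" (e.g. "140/70"), or None if either div is missing."""
    full = FULL_WIDTH.search(line)
    active = ACTIVE_WIDTH.search(line)
    if full is None or active is None:
        return None
    return f"{full.group(1)}/{active.group(1)}"


# The "Two:" line from the question; note the capitalised Style attribute.
sample = (
    "<td width=0%>Two:</td><td width=23% class='colour'>"
    "<div class='full' Style='width:140px'>"
    "<div class='active' style='width:70px'></div></div></td>"
)
print(extract_ratio(sample))  # prints "140/70"
```

The "Twenty" row in the sample spells the class as `'ful'`, so the literal `class='full'` precondition does not match it and this helper returns `None` for that line.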

There are many ways to achieve your requirement. These are not the best regex patterns for your needs but they will suffice.

UPDATE: Please enable the "Ignore Case" regex option, since the HTML mixes upper and lower case (for example Style= and style= ).
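Putting the pieces together, one possible way to get the per-`<tr>` grouping asked for in the question is sketched below in Python (an assumption, since the thread names no language). It deliberately deviates from the two patterns above in a few places, all of which are my own choices rather than part of the answer: the label capture is lazy (`.+?`), the width pattern keys on the `style` attribute only so the `'ful'` row is still picked up, and tags inside a label such as `<strong>` are stripped. `group_rows` and the constant names are hypothetical.

```python
import re

# One <tr>...</tr> block; DOTALL lets "." span the line breaks inside a row.
ROW = re.compile(r"<tr>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
# Label cell, then the opening tag of the value cell and any leading plain text.
LABEL = re.compile(r"<td width=0%>(.+?)</td><td[^>]+>([^<]*)", re.IGNORECASE)
# Any width given in a style attribute, regardless of the div's class name.
WIDTH = re.compile(r"style='width:([0-9.]+)px'", re.IGNORECASE)
# Any HTML tag, used to strip markup such as <strong> from labels.
TAG = re.compile(r"<[^>]+>")


def group_rows(html):
    """Return one list of "Label: value" strings per <tr> block."""
    groups = []
    for row in ROW.findall(html):
        entries = []
        for line in row.splitlines():
            cell = LABEL.search(line)
            if cell is None:
                continue  # blank line or a line without a label cell
            label = TAG.sub("", cell.group(1))
            widths = WIDTH.findall(line)
            # Rows with width divs become "full/active"; others keep their text.
            value = "/".join(widths) if widths else cell.group(2).strip()
            entries.append(f"{label} {value}")
        groups.append(entries)
    return groups
```

Calling `group_rows` on the sample HTML from the question yields two lists, one per `<tr>`, in the shape the question describes. The sketch assumes each label/value pair sits on its own line, as it does in the sample.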
