Parsing string and grouping them with Regex?

I have no idea how to use regex, but my friend told me it would be the most efficient way to achieve what I'm trying to do. I've asked multiple people for help, but all the code they gave me was undocumented, which isn't helpful at all. This project is for me to learn, and I thought this would be the best place to ask. Anyway, I'm trying to group everything inside the tags.

Here's some example code:

<tr>
<td width=0%>One:</td><td width=23% class='colour'>Text</a></td>
<td width=0%>Two:</td><td width=23% class='colour'><div class='full' Style='width:140px'><div class='active' style='width:70px'></div></div></td>
<td width=0%>Three:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:70px'></div></div></td>
</tr>
<tr>
<td width=0%>Seven:</td><td class="colour">Text</a></td>
<td width=0%>Eight:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:84px'></div></div></td>
<td width=0%><strong>Twenty</strong>:</td><td width=23% class='colour'><div class='ful' style='width:140px'><div class='active' style='width:80.3345222473px'></div></div> (5.74)</td>
</tr>

How would I parse all of that so it gets grouped like this? (I'm using string[] just as an example.)

string[] tr1 = new string[]{
    "One: Text",
    "Two: 140/70",
    "Three: 140/70"
};

string[] tr2 = new string[]{
    "Seven: Text",
    "Eight: 140/84",
    "Twenty: 140/80.3345222473"
};

The values with a slash are basically the width in the "full" div's style / the width in the "active" div's style.

Is this possible using Regex, or any other way at all?

I'm sorry I can't show what I've already done, since I really haven't done anything relevant. I've tried to learn Regex patterns because my friend told me Regex was the most efficient, but I failed miserably at it... Sigh.

It would mean a lot if someone could guide me through this!

Thanks!

If you use the following Regex pattern, it will help you extract "One:" and "Text", and then of course you can concatenate them however you like.

<td width=0%>(.+?)</td><td[^>]+>([^<]+).*</td>

How it works:

  1. First we need to find the pre-condition, which must match but which we do not want to capture. In the above, that is <td width=0%>.
  2. Secondly, we want to capture "One:", which is done by (.+?). The .+ means any characters, at least one, and the trailing ? makes it lazy, so it takes as few characters as possible and stops at the first </td> that follows. A greedy (.+) would instead run to the end of the line and back off only to the last </td> that still lets the rest of the pattern match, which goes wrong as soon as one line holds more than one label/value pair.
  3. Then follows a new condition, </td><td[^>]+>, that must be matched but is not captured. Refer to point 1 to get the idea.
  4. The next thing you wanted was to extract "Text", which is done by ([^<]+): it means give me one or more characters until it hits a left angle bracket <.
  5. Finally, .*</td> matches zero or more characters followed by </td>, which consumes the rest of the line (the stray </a> included).
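To see the pattern in action, here is a minimal sketch using Python's standard re module. The question's string[] hints at C#, but the pattern only uses a lazy quantifier and negated character classes, which should carry over to .NET's Regex unchanged. The variable names and the trimmed HTML are illustrative assumptions.

```python
import re

# Three rows trimmed from the question's HTML.
html = """<tr>
<td width=0%>One:</td><td width=23% class='colour'>Text</a></td>
<td width=0%>Two:</td><td width=23% class='colour'><div class='full' Style='width:140px'><div class='active' style='width:70px'></div></div></td>
<td width=0%>Seven:</td><td class="colour">Text</a></td>
</tr>"""

# Group 1 = label cell, group 2 = leading plain text of the value cell.
# Without re.DOTALL, '.' stops at line breaks, so each match stays on one line.
pattern = re.compile(r"<td width=0%>(.+?)</td><td[^>]+>([^<]+).*</td>")

pairs = pattern.findall(html)
for label, value in pairs:
    print(label, value)  # prints "One: Text", then "Seven: Text"
```

The "Two:" line produces no match: its value cell starts with <div, and ([^<]+) needs at least one character that is not <. That is why the widths need the separate pattern below.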

Using a similar approach, you can also capture "Two:" and "Three:".

To help you retrieve the 140, you need to start hunting for conditions to match. Based on your HTML, I see style=... inside each <div class='full' ...> tag. Therefore, to extract the full width (140 here, or a decimal such as 123.45), you may use:

<div class='full' style='width:([0-9.]+)px[^>]+>

Explanation:
As before, you need the pre-condition. The capture is then ([0-9.]+), which means at least one digit or dot, immediately followed by px, which must match, and so on. Swapping 'full' for 'active' in the same pattern captures the inner width (70, 84, ...).
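A minimal sketch of the width patterns in Python's re, assuming two rows trimmed from the question's HTML. The active pattern is the full pattern with 'full' swapped for 'active', and re.IGNORECASE is needed because one div is written Style=. Pairing the two lists by position with zip is an assumption that only holds when both patterns match every row; the class='ful' typo in the question's "Twenty" row, for example, would break it.

```python
import re

# Two rows trimmed from the question's HTML (note the capital S in Style= on the first).
html = """<td width=0%>Two:</td><td width=23% class='colour'><div class='full' Style='width:140px'><div class='active' style='width:70px'></div></div></td>
<td width=0%>Eight:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:84px'></div></div></td>"""

# The answer's pattern for the outer div, plus the same idea for the inner div.
full = re.compile(r"<div class='full' style='width:([0-9.]+)px[^>]+>", re.IGNORECASE)
active = re.compile(r"<div class='active' style='width:([0-9.]+)px[^>]+>", re.IGNORECASE)

full_widths = full.findall(html)      # ['140', '140']
active_widths = active.findall(html)  # ['70', '84']

# Pair them in document order to get "full/active".
fractions = [f"{f}/{a}" for f, a in zip(full_widths, active_widths)]
print(fractions)  # ['140/70', '140/84']
```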

There are many ways to achieve what you need. These are not the best regex patterns for the job, but they will suffice.
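The question also asks about "any other way at all". One such way, swapped in here and not part of this answer's regex approach, is an HTML parser. The sketch below uses Python's standard-library html.parser.HTMLParser; the RowGrouper class and its fields are hypothetical names. It assumes every row alternates a label cell and a value cell, and that a value cell holding two div widths should be shown as full/active.

```python
import re
from html.parser import HTMLParser

# The HTML from the question, unchanged.
html = """<tr>
<td width=0%>One:</td><td width=23% class='colour'>Text</a></td>
<td width=0%>Two:</td><td width=23% class='colour'><div class='full' Style='width:140px'><div class='active' style='width:70px'></div></div></td>
<td width=0%>Three:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:70px'></div></div></td>
</tr>
<tr>
<td width=0%>Seven:</td><td class="colour">Text</a></td>
<td width=0%>Eight:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:84px'></div></div></td>
<td width=0%><strong>Twenty</strong>:</td><td width=23% class='colour'><div class='ful' style='width:140px'><div class='active' style='width:80.3345222473px'></div></div> (5.74)</td>
</tr>"""


class RowGrouper(HTMLParser):
    """Hypothetical helper: turns each <tr> into a list of "Label: value" strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of strings per finished <tr>
        self.cells = []     # cells of the current row: {"text": str, "widths": [str]}
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.cells = []
        elif tag == "td":
            self.cells.append({"text": "", "widths": []})
            self.in_td = True
        elif tag == "div" and self.in_td:
            # HTMLParser lowercases attribute names, so Style= and style= look the same.
            style = dict(attrs).get("style") or ""
            match = re.search(r"width:\s*([0-9.]+)px", style)
            if match:
                self.cells[-1]["widths"].append(match.group(1))

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
        elif tag == "tr":
            self.rows.append(self._pairs())

    def handle_data(self, data):
        # Only text inside a cell matters; newlines between cells are ignored.
        if self.in_td:
            self.cells[-1]["text"] += data

    def _pairs(self):
        out = []
        # Cells alternate: label, value, label, value, ...
        for label, value in zip(self.cells[0::2], self.cells[1::2]):
            if len(value["widths"]) >= 2:
                shown = f"{value['widths'][0]}/{value['widths'][1]}"  # full/active
            else:
                shown = value["text"].strip()
            out.append(f"{label['text'].strip()} {shown}")
        return out


parser = RowGrouper()
parser.feed(html)
parser.close()
tr1, tr2 = parser.rows
print(tr1)  # ['One: Text', 'Two: 140/70', 'Three: 140/70']
print(tr2)  # ['Seven: Text', 'Eight: 140/84', 'Twenty: 140/80.3345222473']
```

A parser copes with the stray </a> tags, the mixed-case attribute names and the <strong> inside a label without any extra pattern work, which is the main reason to prefer it once the markup gets less regular.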

UPDATE: Please use "Ignore Case" in the regex options, since I saw a mix of lower and upper case (for example Style= versus style=).
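Putting the pieces together, here is a minimal regex-only sketch in Python that builds the grouping the question asks for, with ignore case switched on as the update suggests. It assumes one group per <tr> and one label/value pair per line; group_rows and the pattern names are hypothetical. Instead of requiring class='full', it takes the first two width:...px values in each value cell, so the class='ful' typo in the "Twenty" row does not matter.

```python
import re

# The HTML from the question, unchanged.
html = """<tr>
<td width=0%>One:</td><td width=23% class='colour'>Text</a></td>
<td width=0%>Two:</td><td width=23% class='colour'><div class='full' Style='width:140px'><div class='active' style='width:70px'></div></div></td>
<td width=0%>Three:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:70px'></div></div></td>
</tr>
<tr>
<td width=0%>Seven:</td><td class="colour">Text</a></td>
<td width=0%>Eight:</td><td class="colour"><div class='full' style='width:140px'><div class='active' style='width:84px'></div></div></td>
<td width=0%><strong>Twenty</strong>:</td><td width=23% class='colour'><div class='ful' style='width:140px'><div class='active' style='width:80.3345222473px'></div></div> (5.74)</td>
</tr>"""

ROW = re.compile(r"<tr>(.*?)</tr>", re.IGNORECASE | re.DOTALL)  # DOTALL: a row spans lines
# Label cell, then the whole value cell; both lazy groups stop at the first </td>.
PAIR = re.compile(r"<td width=0%>(.+?)</td><td[^>]*>(.*?)</td>", re.IGNORECASE)
WIDTH = re.compile(r"width:\s*([0-9.]+)px", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def group_rows(source):
    """Hypothetical helper: one list of "Label: value" strings per <tr>."""
    groups = []
    for row in ROW.findall(source):
        items = []
        for label, cell in PAIR.findall(row):
            widths = WIDTH.findall(cell)
            if len(widths) >= 2:
                value = f"{widths[0]}/{widths[1]}"  # full/active
            else:
                value = TAG.sub("", cell).strip()  # plain text such as "Text"
            items.append(f"{TAG.sub('', label).strip()} {value}")
        groups.append(items)
    return groups


tr1, tr2 = group_rows(html)
print(tr1)  # ['One: Text', 'Two: 140/70', 'Three: 140/70']
print(tr2)  # ['Seven: Text', 'Eight: 140/84', 'Twenty: 140/80.3345222473']
```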

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.