Regular expression to match text between tags

I need help with a regular expression, as I don't know them well.

I have this regular expression:

Regex myregex = new Regex("testValue=\"(.+?)\"");

What does (.+?) indicate?

The string it matches is testValue="123e4567", and it returns 123e4567 as output.

Now I need a regular expression to match the string "<helpMe>123e4567</helpMe>", where I need 123e4567 as output. How do I write a regular expression for it?

This means:

(   Begin captured group
.   Match any character
+   One or more times
?   Non-greedy quantifier
)   End captured group

In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote and end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote on that line (i.e., "greedily" consuming as much of the line as possible).
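The difference described above can be seen in a minimal sketch. The input line, with a second quoted value named other, is a made-up example and not part of the question. The program assumes C# 9 or later so that top-level statements work.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: two quoted values on the same line.
string line = "testValue=\"123e4567\" other=\"xyz\"";

// Lazy (.+?): the group ends before the next double-quote.
string lazy = Regex.Match(line, "testValue=\"(.+?)\"").Groups[1].Value;

// Greedy (.+): the group runs up to the last double-quote on the line.
string greedy = Regex.Match(line, "testValue=\"(.+)\"").Groups[1].Value;

Console.WriteLine(lazy);   // 123e4567
Console.WriteLine(greedy); // 123e4567" other="xyz
```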

For your "helpMe" example, you'd want this regex:

<helpMe>(.+?)</helpMe>

Given this string:

<div>Something<helpMe>ABCDE</helpMe></div>

The captured group would be:

ABCDE

The value of the non-greedy quantifier is evident in this variation:

Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>

The greedy capture would look like this:

ABCDE</helpMe><helpMe>FGHIJ
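As a rough illustration, the same two-element string can be run through both patterns in C#. This is only a sketch, assuming C# 9 or later for top-level statements. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>";

// Greedy: a single match whose group spans both elements.
string greedy = Regex.Match(html, "<helpMe>(.+)</helpMe>").Groups[1].Value;

// Lazy: Regex.Matches finds one match per element.
MatchCollection lazyMatches = Regex.Matches(html, "<helpMe>(.+?)</helpMe>");

Console.WriteLine(greedy); // ABCDE</helpMe><helpMe>FGHIJ
foreach (Match m in lazyMatches)
{
    Console.WriteLine(m.Groups[1].Value); // ABCDE, then FGHIJ
}
```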

Interactive regex testers are useful for playing with these variations.

Ken Redler has a great answer regarding your first question. For the second question try:

<(helpMe)>(.*?)</\1>

Using the back reference \1 you can find values between a matching pair of tags. The first group finds the tag name, the second group matches the content itself, and the \1 back reference reuses the first group's match (in this case the tag name).

Also, in C# you can use named groups, like <(helpMe)>(?<value>.*?)</\1>, where match.Groups["value"].Value now contains your value.
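A short sketch of the named-group version follows, assuming C# 9 or later for top-level statements. A verbatim string (@"...") is used so that \1 reaches the regex engine as written. In a regular string literal it would have to be written as \\1.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<helpMe>123e4567</helpMe>";

// Group 1 captures the tag name, the named group "value" captures the content,
// and \1 requires the closing tag to repeat whatever group 1 matched.
Match match = Regex.Match(input, @"<(helpMe)>(?<value>.*?)</\1>");

if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);       // helpMe
    Console.WriteLine(match.Groups["value"].Value); // 123e4567
}
```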

What does (.+?) indicate? (。+?)表示什么?

It means match any character (.) one or more times, as few times as possible (+?).

A simple regex to match your second string would be

<helpMe>([a-z0-9]+)<\/helpMe>

This will match any lowercase letter a-z and any digit between <helpMe> and </helpMe>.

The parentheses are used to capture a group. This is useful if you need to reference the value inside this group later.
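As a small sketch of how the character class behaves, the snippet below runs the pattern against the string from the question and against a hypothetical variant with an uppercase E. The second input is an assumption added to show that [a-z0-9] rejects uppercase letters. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, in a verbatim string so the backslash is passed through.
var tagRegex = new Regex(@"<helpMe>([a-z0-9]+)<\/helpMe>");

Match ok = tagRegex.Match("<helpMe>123e4567</helpMe>");

// Hypothetical input: the uppercase E falls outside [a-z0-9], so nothing matches.
Match rejected = tagRegex.Match("<helpMe>123E4567</helpMe>");

Console.WriteLine(ok.Groups[1].Value); // 123e4567
Console.WriteLine(rejected.Success);   // False
```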
