
Regex between two nth position characters

I'm trying to fetch some data from a text string: the piece that lies between two delimiter characters (_), where the piece I want is the word in the nth position.

Currently I have the following:

!((?:.*?(_)){2})_(.+?)$

working on the following data:

D20_Mbps_U10_Mbps_TC4_P

where I would expect to get

U10

but I get nothing, as the first part captures

D20_Mbps_

and thus leaves nothing for the second part to capture.

I've tried

_\s*(.*?)(?=\s*_)

But this only gives me the first occurrence, whereas I need the one in the nth position, with n supplied at runtime.

Any ideas?

Thanks

Let me try answering this in detail.

When you want to match the Nth occurrence of a substring within a delimited string, you should really consider a String.Split-style function first. In your case, splitting on _ and picking the value you need is a trivial task.
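As a minimal sketch of the split approach, assuming Python (the question does not name a language) and a hypothetical helper called nth_field that takes a 1-based position:

```python
def nth_field(text: str, n: int, delimiter: str = "_") -> str:
    """Return the n-th (1-based) delimiter-separated field of text.

    nth_field is a hypothetical helper name used for illustration only.
    """
    fields = text.split(delimiter)
    if not 1 <= n <= len(fields):
        raise IndexError(f"text has {len(fields)} fields, no field {n}")
    return fields[n - 1]


print(nth_field("D20_Mbps_U10_Mbps_TC4_P", 3))  # U10
```

Here n is an ordinary argument, so it can be supplied at runtime without rebuilding any pattern.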

Now, when you cannot use programmatic means to extract that value, you can only do this with a regex that uses a limiting quantifier, grouping, and capturing (in Java and .NET, it is possible to achieve the same even without capturing).

So, the main idea is to match zero or more characters other than your delimiter, followed by the delimiter itself, and to repeat that group N-1 times. The Nth element then starts right after the last matched delimiter, so you just capture the following run of non-delimiter characters. For the 3rd element (N = 3):

^(?:[^_]*_){2}([^_]*)

Group 1 will contain U10.
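Since n has to be supplied at runtime, the quantifier can be filled in when the pattern is built. The following is a sketch only, assuming Python's re module, a single-character delimiter, and a hypothetical function name nth_field_regex:

```python
import re


def nth_field_regex(text: str, n: int, delimiter: str = "_"):
    """Return the n-th (1-based) field via the skip-then-capture pattern, or None.

    Assumes a single-character delimiter, because it is placed inside a
    negated character class. The function name is hypothetical.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    d = re.escape(delimiter)
    # Skip n-1 "field + delimiter" pairs, then capture the next field.
    # For n = 3 and "_" this builds ^(?:[^_]*_){2}([^_]*)
    pattern = rf"^(?:[^{d}]*{d}){{{n - 1}}}([^{d}]*)"
    match = re.match(pattern, text)
    return match.group(1) if match else None


print(nth_field_regex("D20_Mbps_U10_Mbps_TC4_P", 3))  # U10
```

When the string has fewer than n fields, the required n-1 delimiters cannot all be matched and the function returns None.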

Or another variation:

^(?:[^_]*_){2}([^_]*)_(.+)$

This will capture the 3rd _-delimited element into Group 1. Group 2 in this case holds the 4th and later elements, that is, the rest of the string up to the end.
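To see both groups of this variation side by side, here is a small sketch, again assuming Python's re module; third_and_rest is a hypothetical name:

```python
import re

# The two-group variation: Group 1 is the 3rd element, Group 2 is the rest.
THIRD_AND_REST = re.compile(r"^(?:[^_]*_){2}([^_]*)_(.+)$")


def third_and_rest(text: str):
    """Return (third element, remainder) or None if the pattern does not match."""
    match = THIRD_AND_REST.match(text)
    return match.groups() if match else None


print(third_and_rest("D20_Mbps_U10_Mbps_TC4_P"))  # ('U10', 'Mbps_TC4_P')
```

Note that this variation requires a delimiter and at least one character after the 3rd element, so a string with exactly three elements does not match.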

Note that in some regex flavors, { and ( must be escaped to act as quantifier and grouping operators (vim, sed without extended-regex mode, etc.).
