
Regular expression to get text between brackets that itself contains bracketed text

I have a short piece of text that contains text between brackets. I want to extract that text, so I wrote this expression:

/(\([^\)]+\))/i

But this only matches from the first ( to the first ), ignoring the rest of the text. Is there any way to extract the full text, like:

i want(to) extract this text

from:

this is the text that (i want(to) extract this text) from

There might be more than one bracket-enclosed sub-text.

Thanks

EDIT: Found this:

preg_match_all("/\((([^()]*|(?R))*)\)/", $rejoin, $matches);

Very useful; it comes from the link provided in the accepted answer.

Yes, you can use this pattern:

   v                   v
 (\([^\)\(]*)+([^\)\(]*\))+
 ------------ -------------
      |            |
      |            |->match all (right)brackets to the right..
      |
      |->match all (left)brackets to the left



The above pattern won't work if the brackets are nested recursively, like this:

(i want(to) (extract and also (this)) this text)
                              ------
            -------------------------
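As a quick check of where the flat pattern stops working, here is a small PHP sketch that runs it against the question's sample and against the nested string above. The variable names are only illustrative and are not part of the original answer.

```php
<?php
// The non-recursive pattern from above: runs of "(...", then runs of "...)".
$pattern = '/(\([^\)\(]*)+([^\)\(]*\))+/';

$simple = 'this is the text that (i want(to) extract this text) from';
$nested = '(i want(to) (extract and also (this)) this text)';

// One inner group only: the whole outer group is matched.
preg_match($pattern, $simple, $m);
$simpleMatch = $m[0];   // "(i want(to) extract this text)"

// A second inner group follows: the match stops once the first inner group closes.
preg_match($pattern, $nested, $m);
$nestedMatch = $m[0];   // "(i want(to)"

echo $simpleMatch, "\n", $nestedMatch, "\n";
```

The second result is cut short because, once the pattern has started consuming closing brackets, it has no way to accept another opening bracket.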

In this case you can use the recursive pattern, as recommended by elclanrs.
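A minimal sketch of such a recursive pattern in PHP (PCRE) might look like the following. It is a variant of the expression quoted in the question's edit; the possessive `[^()]++` and the non-capturing group are assumptions added here to limit backtracking, not something the original answer specifies.

```php
<?php
// (?R) recurses into the whole pattern for every nested bracket pair.
// Group 1 captures the text inside the outermost brackets of each match.
$pattern = '/\(((?:[^()]++|(?R))*)\)/';

$nested = '(i want(to) (extract and also (this)) this text)';
preg_match($pattern, $nested, $m);
$inner = $m[1];

// preg_match_all returns every top-level group found in the input.
preg_match_all($pattern, 'a (b (c) d) e (f)', $all);
$groups = $all[1];

echo $inner, "\n";
print_r($groups);
```

Because each match consumes a complete balanced group, inner groups such as (c) are not reported separately; they appear only as part of the enclosing match.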


You can also do it without using regex, by maintaining a count of the number of ( and ).

So, assume noOfLB is the count of ( and noOfRB is the count of ).

  • keep iterating over each character in the string, remembering the position of the first (
  • increment noOfLB if you find (
  • increment noOfRB if you find )
  • if noOfLB==noOfRB, you have found the position of the ) that closes the first (

I don't know PHP, so I would implement the above algorithm in C#:

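public static string getFirstRecursivePattern(string input)
{
    // Position of the first '(' (-1 if there is none), plus the two bracket counters.
    int firstB = input.IndexOf("("), noOfLB = 0, noOfRB = 0;
    for (int i = firstB; i < input.Length && i >= 0; i++)
    {
        if (input[i] == '(') noOfLB++;
        if (input[i] == ')') noOfRB++;
        // Counts are equal again: the group that opened at firstB has just closed.
        if (noOfLB == noOfRB) return input.Substring(firstB, i - firstB + 1);
    }
    return ""; // no '(' found, or the brackets never balance
}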
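Since the question is about PHP, the same counting idea could be sketched there as well. The function name firstBalancedGroup is hypothetical, and a single depth counter stands in for the two counters (it reaches zero exactly when the two counts are equal).

```php
<?php
// Hypothetical helper: returns the first balanced bracket group, brackets included,
// or an empty string if there is no "(" or the brackets never balance.
function firstBalancedGroup(string $input): string
{
    $start = strpos($input, '(');
    if ($start === false) {
        return '';
    }
    $depth = 0;                       // opened minus closed brackets so far
    $length = strlen($input);
    for ($i = $start; $i < $length; $i++) {
        if ($input[$i] === '(') {
            $depth++;
        } elseif ($input[$i] === ')') {
            $depth--;
        }
        if ($depth === 0) {
            // Balanced again: the group that opened at $start has just closed.
            return substr($input, $start, $i - $start + 1);
        }
    }
    return '';
}

echo firstBalancedGroup('this is the text that (i want(to) extract this text) from'), "\n";
```

To get the text without the bounding brackets, the result can be trimmed with substr($result, 1, -1).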

You will need recursive subpatterns to solve this. Here is a regex that should work for you:

$str = 'this is the text that (i want(to) extract this text) from';
if (preg_match('/\s* \( ( (?: [^()]* | (?0) )+ ) \) /x', $str, $arr))
   var_dump($arr[1]); // $arr[1] holds the text inside the outermost brackets

OUTPUT:

string(28) "i want(to) extract this text"

You can also use substrings:

$yourString = "this is the text that (i want(to) extract this text) from";

$stringAfterFirstParen = substr( strstr( $yourString, "(" ), 1 );

$indexOfLastParen = strrpos( $stringAfterFirstParen, ")" );

$stringBetweenParens = substr( $stringAfterFirstParen, 0, $indexOfLastParen );

I think I understand the question: you would like to extract "i want(to) extract this text", or something similar, from text that looks like this: this is the text that (i want(to) extract this text) from

If so, you might find success with the following regular expression (here $text is the variable being examined, $t[] is the array that receives the match, and $txt is the variable created from it):

if (preg_match('/\(\w+.+\)/', $text, $t)) {
    $txt = $t[0];
} else {
    $txt = "";
}
echo $desired=substr($txt,1,-1);

The regex at the root of this is \(\w+.+\) and here is the explanation of the code:

  1. Match the character “(” literally «\(»
  2. Match a single “word character” (letter, digit, or underscore) «\w+», between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  3. Match any single character that is not a line break character «.+», between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  4. Match the character “)” literally «\)»
  5. Put the text that is within the parentheses into a new variable $desired, and display it by selecting a substring that drops one character from either end, thereby eliminating the bounding parentheses «echo $desired=substr($txt,1,-1)»

Using the above, with $text = this is the text that (i want(to) extract this text) from, I was able to display: i want(to) extract this text. If you also want to pull the "to" out of (to), I would suggest running the variable through the regex in a loop until no more ( ) pairs are found in the expression and it returns a null value, then concatenating the returned values to form the variable of interest.
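One way to read that loop suggestion is sketched below. The helper name peelBrackets is hypothetical, and collecting the levels in an array (rather than concatenating them) is an assumption made here for clarity.

```php
<?php
// Hypothetical helper: applies the regex above repeatedly,
// peeling off one nesting level per pass until nothing matches.
function peelBrackets(string $text): array
{
    $levels = [];
    while (preg_match('/\(\w+.+\)/', $text, $m)) {
        $text = substr($m[0], 1, -1);   // drop the bounding parentheses
        $levels[] = $text;
    }
    return $levels;
}

print_r(peelBrackets('this is the text that (i want(to) extract this text) from'));
```

Note that this regex is greedy and runs from the first ( to the last ), so two sibling groups on the same level would be returned as one piece; it also requires a word character directly after the opening bracket.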

Best of luck, Steve
