PHP: substring of a regular expression match, and the regular expression does not always work

I am trying to create an HTML parser like BBCode. For example, I want to parse items from HTML text in the following format: .....html..... [I]Item1[/I].....html....[I]Item2[/I]......
So I am using a regular expression to get the [I]XXXXX[/I] parts. I also want the regex to return only Item1, so that I can avoid str_replace. At the moment I use str_replace to replace [I] with "" and [/I] with "" to get Item1. The problem is that the regular expression does not always work.
I am using the code below:

$pattern="/\[I]([^\[].)+\[\/I]/m";
preg_match_all($pattern,$string,$out,PREG_SET_ORDER);
foreach($out as $i)
{
    $temp=$i[0];
    echo "Found!";
    $i[0]=str_replace("[I]","",$i[0]);
    $i[0]=str_replace("[/I]","",$i[0]);
    // ......
}

My regular expression is meant to say: start with [I], continue with any characters except [ (to avoid [I] [I] [/I] [/I]), and end with [/I]. Some strings fail, such as aaaaa, while others, such as aaa aa, are found! Is there perhaps a better way to create such an HTML parser?
Thank you!

Edit: OK, I found the solution, but I can't understand why my original pattern doesn't work! The solution was $pattern='#\[i\](.*?)\[/i\]#is', but what's the difference?
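One visible difference in the working pattern is the lazy quantifier *?, which stops at the first [/I] instead of the last one. A minimal sketch comparing a greedy (.*) with the lazy (.*?) on a string that holds two items; the sample string is made up for illustration:

```php
<?php
// Sample input with two items (made up for illustration).
$html = '[I]Item1[/I] html [I]Item2[/I]';

// Greedy (.*): grabs as much as possible, then backtracks only to the LAST [/I].
preg_match_all('#\[i\](.*)\[/i\]#is', $html, $greedy);

// Lazy (.*?): grows one character at a time and stops at the FIRST [/I].
preg_match_all('#\[i\](.*?)\[/i\]#is', $html, $lazy);

print_r($greedy[1]); // one item: "Item1[/I] html [I]Item2"
print_r($lazy[1]);   // two items: "Item1" and "Item2"
```

The i modifier makes [i] in the pattern also match [I] in the text, and s lets . match newlines, so items that span several lines are still found.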

Edit 2: Raider was correct; the main problem was in ([^\[].)+. Each repetition of the group consumes exactly two characters, so the pattern describes the language [I](a)^2n[/I]: it matches [I]aa[/I], but not [I]aaaaa[/I]!
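The even-length effect is easy to check directly. A minimal sketch that runs the question's pattern and the working pattern against the example strings from the question (the loop and output format are only for illustration):

```php
<?php
// The question's pattern: each repetition of ([^\[].) consumes exactly two characters.
$broken = '/\[I]([^\[].)+\[\/I]/';
// The working pattern from the first edit: lazily match any characters up to the first [/I].
$fixed = '#\[i\](.*?)\[/i\]#is';

foreach (['[I]aa[/I]', '[I]aaa aa[/I]', '[I]aaaaa[/I]'] as $s) {
    // preg_match() returns 1 on a match and 0 when there is none.
    printf("%-14s broken=%d fixed=%d\n", $s, preg_match($broken, $s), preg_match($fixed, $s));
}
```

The broken pattern reports 0 only for the five-character item; aa (2 characters) and aaa aa (6 characters, counting the space) have even length and match.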

Try something like this:

$parsed_str = '[I]Item1[/I].....html....[I]Item2[/I].....';
// Group 1 captures everything between [I] and [/I] that is not a "[".
preg_match_all('~\[I\]([^\[]+?)\[\/I\]~i', $parsed_str, $result);
print_r($result[1]); // lists Item1 and Item2

The same result is given by:

preg_match_all('~\[I\]([^\[].+?)\[\/I\]~i', $parsed_str, $result);

I think your subpattern ([^\[].)+ is the problem. Try ([^\[]+).
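With ([^\[]+) the character class itself is repeated, one character at a time, so any non-empty run of characters other than [ is accepted regardless of its length. A minimal sketch; extractItems is a hypothetical helper name used only for illustration:

```php
<?php
// Hypothetical helper: the question's pattern with ([^\[].)+ replaced by ([^\[]+).
function extractItems(string $html): array
{
    // [^\[]+ consumes one or more characters that are not "[", one at a time,
    // so the item length no longer has to be even.
    preg_match_all('/\[I\]([^\[]+)\[\/I\]/', $html, $m);
    return $m[1]; // group 1 only: the text between the tags
}

print_r(extractItems('[I]aaaaa[/I] html [I]aaa aa[/I]'));
```

Because [ is excluded, an item that itself contains [ (for example a nested [I] tag) is never matched as one piece; for [I][I]x[/I][/I] only the inner pair matches, yielding x.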

Your problem is in this line:

$temp=$i[0];

Index 0 contains the entire matched pattern. Instead, you need to use index 1, the first parenthesised part of the regexp:

$temp = $i[1];
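A minimal sketch of the question's loop rewritten to read group 1, so the str_replace calls are no longer needed. The sample string comes from the question; the lazy pattern is the one from the asker's edit, written here without the i and with only the s modifier for brevity:

```php
<?php
$string = '[I]Item1[/I].....html....[I]Item2[/I]';
preg_match_all('#\[I\](.*?)\[/I\]#s', $string, $out, PREG_SET_ORDER);

// With PREG_SET_ORDER, $out holds one array per match:
// $i[0] is the whole match, $i[1] is the first parenthesised group.
$items = [];
foreach ($out as $i) {
    $items[] = $i[1]; // "Item1", then "Item2"; no str_replace needed
}
print_r($items);
```

Here $out[0][0] is the full "[I]Item1[/I]", while $out[0][1] is just "Item1".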

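Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.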

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM