Trim substrings within PHP regular expressions

I have a string which may contain a pattern like:

LINK([anchor text],[link])

What I would like to do is transform this expression into an HTML link:

<a href="link">anchor text</a>

At the moment, I'm performing the replacement with the following PHP snippet:

$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
$replace = '<a href="$2">$1</a>';
preg_replace($search, $replace, $string);

The problem I'm facing is the spaces after the anchor text. Fortunately, HTML collapses multiple spaces into a single one, but in this example the link would still be rendered with an annoying (underlined) trailing space. Is there any way to trim this anchor text? I can't match it the same way as the "link" substring, since the anchor text may contain spaces.

Assuming that the anchor text cannot contain commas or more than one space in a row, you could perhaps use:

LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)

regex101 demo

Instead of .+, I'm using [^\s,]+(?:\s[^\s,]+)* which matches one word followed by zero or more further words, each preceded by a single whitespace character (where a word is a run of one or more characters that are neither whitespace nor a comma).

I also changed the negated class [^\s], which appears later in your pattern, to the equivalent \S.
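As a rough sketch of how this pattern might be dropped into the snippet from the question, the example below wraps it in / delimiters and reuses the question's replacement string. The variable names $result, $doubleSpaced and $unchanged are only illustrative, and the second input is a made-up case that shows the stated assumption about consecutive spaces.

```php
<?php
// Pattern from the answer above, wrapped in / delimiters for preg_replace.
$search  = '/LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)/';
$replace = '<a href="$2">$1</a>';

// Input from the question: extra spaces around both parts.
$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$result = preg_replace($search, $replace, $string);
echo $result, PHP_EOL;

// Hypothetical input that breaks the assumption: two spaces inside the
// anchor text. The pattern does not match, so the string comes back as-is.
$doubleSpaced = 'LINK(some  text, http://mydomain.com)';
$unchanged    = preg_replace($search, $replace, $doubleSpaced);
echo $unchanged, PHP_EOL;
```

With the question's input, the first group stops at the last word of the anchor text, so the trailing spaces end up outside the link. Inputs that violate the assumption are simply left untouched rather than being converted incorrectly.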

You could make the relevant quantifiers lazy, so that they don't eat up the whitespace before the comma or the closing parenthesis:

'/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/'

by adding a ? after the +.

Test
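A minimal sketch of this lazy variant applied to the input from the question might look like the following. The pattern is copied from this answer unchanged; the replacement string comes from the question, and $result and $spaced are illustrative names.

```php
<?php
// Lazy pattern from the answer above: +? stops as early as possible,
// leaving the surrounding whitespace to the \s* that follows each group.
$search  = '/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/';
$replace = '<a href="$2">$1</a>';

$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$result = preg_replace($search, $replace, $string);
echo $result, PHP_EOL;

// Hypothetical input with two spaces inside the anchor text: the lazy
// group still captures it, inner spaces included.
$spaced = preg_replace($search, $replace, 'LINK( a  b , http://mydomain.com )');
echo $spaced, PHP_EOL;
```

Unlike a pattern that matches word by word, the lazy group tolerates repeated spaces inside the anchor text. It does stop at the first comma, though, so anchor text containing a comma would be split there.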

What you can do in this case is make the first group match lazily.

$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';

It can be changed to:

$search = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

Notice the question mark after the plus. This tells the regex engine to match using as few characters as possible.

In this case, the shortest match for the first group is the anchor text itself, which must be followed by any number of spaces and then a comma.

In the original pattern, the group matched greedily. This means that it tries to match as many characters as possible, so the .+ consumes everything up to the comma, including the trailing spaces.

Here is a regex101 demo of the code.
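To make the difference between the two patterns visible, the sketch below runs both against the input from the question with preg_match and prints the first capture group in brackets. The names $greedy, $lazy, $greedyMatch and $lazyMatch are illustrative only.

```php
<?php
$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';

// Original (greedy) and modified (lazy) patterns from the answer above.
$greedy = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
$lazy   = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

preg_match($greedy, $string, $greedyMatch);
preg_match($lazy, $string, $lazyMatch);

// Brackets make any trailing whitespace in the capture visible.
printf("greedy: [%s]\n", $greedyMatch[1]);
printf("lazy:   [%s]\n", $lazyMatch[1]);
```

The greedy capture keeps the spaces before the comma, while the lazy capture ends right after the last word, because the added \s* absorbs those spaces outside the group.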
