
Trim substrings within PHP regular expressions

I have a string which may contain a pattern like:

LINK([anchor text],[link])

What I would like to do is transform this expression into an HTML link:

<a href="link">anchor text</a>

At the moment, I'm performing the replacement with the following PHP snippet:

$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';
$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
$replace = '<a href="$2">$1</a>';
echo preg_replace($search, $replace, $string);

The problem I'm facing is the spaces after the anchor text. Fortunately, HTML collapses multiple spaces into a single space, but in this example the link would still end with an annoying (underlined) space. Is there any way to trim the anchor text? I can't match it the same way as the "link" substring, since the anchor text may contain spaces.

Assuming that the anchor text cannot contain commas or more than 1 space in a row, you could perhaps use:

LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)

regex101 demo

Instead of .+, I'm using [^\s,]+(?:\s[^\s,]+)*, which matches one word followed by any number of further words, each preceded by a single whitespace character (a word here is a run of one or more characters that are neither whitespace nor commas).

I also changed your negated class [^\s], which appears later on, to the equivalent \S.
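As a minimal sketch, the pattern above can be dropped into the snippet from the question like this. The sample string and the replacement template are the ones from the question; the variable name $result is only illustrative.

```php
<?php
// Sample input from the question, with extra spaces around both parts.
$string  = 'LINK(  some anchor text    ,   http://mydomain.com  )';

// Pattern from this answer: group 1 is a run of words separated by single
// whitespace characters (no commas), group 2 is the link without whitespace.
$search  = '/LINK\s*\(\s*([^\s,]+(?:\s[^\s,]+)*)\s*,\s*(\S+)\s*\)/';
$replace = '<a href="$2">$1</a>';

$result = preg_replace($search, $replace, $string);
echo $result, PHP_EOL;
// Prints: <a href="http://mydomain.com">some anchor text</a>
```

Because group 1 can only end on a non-whitespace character, the spaces before the comma are left to the \s* that follows the group and never reach the anchor text.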

You could make the relevant quantifiers lazy, so that they don't consume the whitespace before the , or the ):

'/LINK\(\s*(.+?)\s*,\s*([^\s]+?)\s*\)/'

by adding a ? after each +.

Test

What you can do in this case is make the first group match lazily.

$search = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';

Can be changed to:

$search = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

Notice the question mark after the plus, and the \s* added between the group and the comma. The question mark tells the regex engine to match with as few characters as possible.

Here, the shortest text the group can match is the anchor text itself; the \s* that follows then absorbs any spaces before the comma.

The original pattern matches greedily. This means that .+ tries to match as many characters as possible, and because it is followed directly by the comma, it captures everything up to the comma, trailing spaces included.

Here is a regex101 of the code.
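To see the difference between the two patterns, here is a small sketch that captures the anchor text with each of them. It uses the sample string from the question; the variable names $greedy, $lazy, $g and $l are made up for this illustration.

```php
<?php
$string = 'LINK(  some anchor text    ,   http://mydomain.com  )';

// Original pattern: greedy group followed directly by the comma.
$greedy = '/LINK\s*\(\s*(.+),\s*([^\s]+)\s*\)/';
// Revised pattern: lazy group, with \s* between the group and the comma.
$lazy   = '/LINK\s*\(\s*(.+?)\s*,\s*([^\s]+)\s*\)/';

preg_match($greedy, $string, $g);
preg_match($lazy, $string, $l);

// var_export() quotes the strings, which makes trailing spaces visible.
var_export($g[1]); echo PHP_EOL;   // anchor text with trailing spaces
var_export($l[1]); echo PHP_EOL;   // anchor text without trailing spaces

echo preg_replace($lazy, '<a href="$2">$1</a>', $string), PHP_EOL;
```

The greedy capture keeps the spaces before the comma, while the lazy capture stops at the last character of the anchor text.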
