
PHP regular expression to match words

Given a string, I want an array of strings containing words, each preceded by any non-word characters.

Example input string:

one "two" (three) -four-

The words in the string may be anything, even gibberish, with any amount of punctuation or symbols.

What I would like to see:

array:
one
 "two
" (three
) -four
-

Essentially, for each match the last thing is a word, preceded by anything left over from the previous match.

I will be using this in PHP. I have tried various combinations of preg_match_all() and preg_split(), with patterns containing many variations of "\\w", "\\b", "[^\\w]" and so on.

The Bigger Picture

How can I place a * after each word in the string for searching purposes?

If you just want to add an asterisk after each "word" you could do this:

<?php
$test = 'one "two" (three) -four-';

// Single quotes keep PHP from ever treating $1 as a variable.
echo preg_replace('/(\w+)/', '$1*', $test); // one* "two*" (three*) -four*-
?>

http://phpfiddle.org/main/code/8nr-bpb
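
As a rough sketch of how that replacement might be used when building a search string, the helper below wraps it in a function. The name appendWildcards is hypothetical and not part of the original answer. It uses $0, which refers to the whole match, so no capture group is needed.

```php
<?php
// Hypothetical helper: appends "*" to every run of word characters.
function appendWildcards(string $input): string
{
    // \w+ matches one or more of [A-Za-z0-9_]; $0 is the full match.
    return preg_replace('/\w+/', '$0*', $input);
}

echo appendWildcards('one "two" (three) -four-'), "\n";
// one* "two*" (three*) -four*-
```

Note that \w only covers ASCII letters, digits and the underscore here, so accented or non-Latin words would need a different character class.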

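You can split at the end of each word by combining a word boundary with a negative lookahead. \b matches at both edges of a word, and (?!\w) rejects the positions where a word is about to start, so only the end-of-word boundaries remain: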

$array = preg_split( '/(?!\w)\b/', 'one "two" (three) -four-');

Calling print_r($array); then gives exactly the desired output:

Array
(
    [0] => one
    [1] =>  "two
    [2] => " (three
    [3] => ) -four
    [4] => -
)
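
One edge case worth noting: when the input ends in a word, the pattern also matches at the very end of the string, which leaves an empty trailing element. A minimal sketch, assuming you want to drop that element, passes PREG_SPLIT_NO_EMPTY. The function name splitAfterWords is made up for illustration.

```php
<?php
// Hypothetical wrapper around the split shown above.
function splitAfterWords(string $input): array
{
    // -1 means "no limit"; PREG_SPLIT_NO_EMPTY discards empty pieces,
    // such as the one produced when the string ends in a word.
    return preg_split('/(?!\w)\b/', $input, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitAfterWords('one "two" (three) -four-'));
print_r(splitAfterWords('one two'));
```

For the example string from the question the flag changes nothing, because that string ends in punctuation rather than a word.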

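Here is an example of how to find a word with a regex in PHP. The pattern is anchored with ^ and applied to the substring that starts at offset 3, so "def" is reported at position 0 of that substring: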

<?php
    $subject = "abcdef";
    $pattern = '/^def/';
    preg_match($pattern, substr($subject, 3), $matches, PREG_OFFSET_CAPTURE);
    print_r($matches);
?>

An alternative

[^\w]*(\b\w*\b)?
----- ----------
 |        |
 |        |-> Matches a word 0 or 1 time
 |-> Matches 0 to many characters except [a-zA-Z0-9_]

Use it with preg_match_all() rather than preg_split(): here you need to match, not split.
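
A sketch of how this pattern could be applied with preg_match_all(). Since every part of the pattern is optional, it can match the empty string (for instance at the end of the input), so this example filters empty matches out. That filtering step is an addition, not part of the original answer.

```php
<?php
$input = 'one "two" (three) -four-';

// $found[0] holds the full matches; $found[1] holds only the captured words.
preg_match_all('/[^\w]*(\b\w*\b)?/', $input, $found);

// Drop empty matches and reindex from 0.
$pieces = array_values(array_filter($found[0], function ($piece) {
    return $piece !== '';
}));

print_r($pieces);
```

A pattern such as /\W*\w+|\W+/ would avoid empty matches altogether, though that is a different expression from the one given in this answer.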
