PHP regular expression to match words

Given a string, I want an array of strings, each containing a word preceded by any non-word characters.

Example input string:

one "two" (three) -four-

The words in the string may be anything, even gibberish, with any amount of punctuation or symbols.

What I would like to see:

array:
one
 "two
" (three
) -four
-

Essentially, each match ends with a word, preceded by anything left over from the previous match.

I will be using this in PHP. I have tried various combinations of preg_match_all() and preg_split(), with patterns containing many variations of "\\w", "\\b", "[^\\w]" and so on.

The Bigger Picture

How can I place a * after each word in the string for searching purposes?

If you just want to add an asterisk after each "word" you could do this:

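<?php
$test = 'one "two" (three) -four-';

// Append an asterisk to every run of word characters;
// $1 in the replacement refers to the first capture group.
echo preg_replace('/(\w+)/', '$1*', $test);
// Output: one* "two*" (three*) -four*-
?>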

http://phpfiddle.org/main/code/8nr-bpb

You can combine a negative lookahead with \b to split only at the word boundaries that end a word, like this:

$array = preg_split( '/(?!\w)\b/', 'one "two" (three) -four-');

A print_r($array); gives you the exact output desired:

Array
(
    [0] => one
    [1] =>  "two
    [2] => " (three
    [3] => ) -four
    [4] => -
)
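To see why this works, note that (?!\w)\b keeps only the boundaries where the next character is not a word character, which means the previous one must be. The sketch below is an illustration, not part of the original answer: it compares the pattern with an explicit lookbehind/lookahead spelling and uses PREG_SPLIT_OFFSET_CAPTURE to show where each piece starts. The variable names are made up for the example.

```php
<?php
$input = 'one "two" (three) -four-';

// Pattern from the answer above: a word boundary not followed by a word character.
$viaBoundary = preg_split('/(?!\w)\b/', $input);

// The same positions spelled out: preceded by a word character, not followed by one.
$viaLookaround = preg_split('/(?<=\w)(?!\w)/', $input);

// PREG_SPLIT_OFFSET_CAPTURE pairs each piece with its starting offset in $input.
$withOffsets = preg_split('/(?!\w)\b/', $input, -1, PREG_SPLIT_OFFSET_CAPTURE);

print_r($viaBoundary);
var_dump($viaBoundary === $viaLookaround);
print_r(array_column($withOffsets, 1));
```

Both patterns split at the end of each word, so every piece after the first begins with the punctuation left over from the previous word.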

Here is an example of how to find a word with regex in PHP.

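<?php
    $subject = "abcdef";
    $pattern = '/^def/';
    // Match against the substring starting at offset 3 ("def"), so ^ anchors there.
    preg_match($pattern, substr($subject, 3), $matches, PREG_OFFSET_CAPTURE);
    // $matches[0] holds the matched text and its offset within the substring: ["def", 0]
    print_r($matches);
?>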

An alternative

[^\w]*(\b\w*\b)?
----- ----------
 |        |
 |        |-> Matches a word 0 or 1 time
 |-> Matches 0 to many characters except [a-zA-Z0-9_]

Note that this pattern is meant for matching (for example with preg_match_all()), not for splitting.
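As a rough sketch of how the alternative pattern above could be applied, the snippet below feeds it to preg_match_all(). Because both halves of the pattern are optional, it can also match an empty string, so the empty-match filter is an addition made here for illustration and is not part of the original answer. The variable names are likewise assumptions.

```php
<?php
$input = 'one "two" (three) -four-';

// Pattern from the answer above: an optional run of non-word characters,
// followed by an optional word.
preg_match_all('/[^\w]*(\b\w*\b)?/', $input, $matches);

// Drop any empty matches and reindex the result.
$pieces = array_values(array_filter($matches[0], function ($m) {
    return $m !== '';
}));

print_r($pieces);
```

$matches[0] holds the full matches; $matches[1] would hold only the captured words, which may be handy if the leading punctuation is not needed.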
