PHP regex negative lookahead

Question

I have a dictionary of 4-letter words. I want to write a regex that goes through the dictionary and matches all words made from a given set of letters.

Suppose I pass in a,b,l,l. I want to find all words with exactly those letters.

I know I could do /[abl]{4}/ but that will also match words with 2 a's or 2 b's.

I feel like I need a negative lookahead. Something like:

[l|(ab)(?!\1)]{4}

The attempt here is that I want a word that starts with l or a or b and is not followed by a or b.

Answer 1

The first thing you need to do is anchor your pattern to describe where the string begins and ends:

for a whole string (^ start of the string, $ end of the string):

^[abl]{4}$

or, to find words in a larger text, use word boundaries (the boundary between a character from [A-Za-z0-9_] and anything else):

\b[abl]{4}\b
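
To make the difference between the two anchoring styles concrete, here is a minimal sketch. The sample words and the sample sentence are illustrative assumptions, not data from the question.

```php
<?php
// Sample inputs are illustrative only; they are not from the original post.
$words = ['ball', 'balls', 'lab', 'abba'];

// Anchored: the entire string must be four letters drawn from a, b, l.
$whole = [];
foreach ($words as $word) {
    if (preg_match('/^[abl]{4}$/', $word) === 1) {
        $whole[] = $word;
    }
}
print_r($whole);

// Word boundaries: pick four-letter candidates out of running text.
$text = 'a ball, some balls and a label';
preg_match_all('/\b[abl]{4}\b/', $text, $found);
print_r($found[0]);
```

Note that abba still passes the anchored test, because nothing counts the letters yet.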

Then you need to say that l must occur twice (or that a and b must each occur only once, but that is more complicated):

for a whole string:

^(?=.*l.*l)[abl]{4}$

in a larger text:

\b(?=\w*l\w*l)[abl]{4}\b

To avoid two a's or two b's, you can use another lookahead:

for a whole string:

^(?=.*l.*l)(?=l*al*b|l*bl*a)[abl]{4}$

in a larger text:

\b(?=\w*l\w*l)(?=l*al*b|l*bl*a)[abl]{4}\b
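
As a quick check of the whole-string pattern with both lookaheads, the sketch below runs it over a handful of candidates. The candidate list is made up for illustration.

```php
<?php
// Whole-string pattern from the answer: two l's, plus one a and one b in either order.
$pattern = '/^(?=.*l.*l)(?=l*al*b|l*bl*a)[abl]{4}$/';

// Candidate words are illustrative only; they are not from the original post.
$candidates = ['ball', 'abll', 'labl', 'aabl', 'llll', 'abba', 'balm'];

$matched = array_values(array_filter($candidates, function ($word) use ($pattern) {
    return preg_match($pattern, $word) === 1;
}));

print_r($matched);
```

Only the words built from exactly a, b, l, l survive: aabl and abba fail the first lookahead (fewer than two l's), and llll fails the second (no a or b).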

About [l|(ab)(?!\1)]: in a character class, special regex characters and sequences of characters lose their special meaning and are seen as literals. So [l|(ab)(?!\1)] is just a set of literal characters, much like [)(!|?abl] for example. (\1 is not a backreference inside a character class either.)

Note that with several constraints the pattern quickly becomes ugly. You should consider another approach: catch all candidate words with \b[abl]{4}\b and filter them in a second pass (using count_chars, for example).

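$str = 'abll labl ball aabl lblabla 1234';

// Letters to look for; count_chars() returns an array of per-byte frequencies.
$dict = 'abll';
$count = count_chars($dict);

$result = [];
// First pass: collect every 4-letter word made only of a, b and l.
if (preg_match_all('~\b[abl]{4}\b~', $str, $matches)) {
    // Second pass: keep the words whose letter frequencies equal those of $dict.
    $result = array_filter($matches[0], function ($i) use ($count) {
        return $count == count_chars($i);
    });
}

print_r($result);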

Answer 2

If you want to specify the letters dynamically and then generate a regexp that does all the work, that will be very expensive.

Simple approach: generate a simple regexp like /^[abl]{4}$/, get all words from the dictionary that match it, and then validate each word separately by checking the letter counts.

More efficient approach: index the words in your dictionary by their sorted list of letters, like this:

word: apple | index: aelpp

word: pale | index: aelp

And so on. To get all words for a list of letters, simply sort those letters and look for an exact match on the "index" value.
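
The sorted-letter index could be sketched as follows. The helper names letterKey and buildIndex and the small word list are hypothetical, chosen only for illustration, and the sketch assumes single-byte (ASCII) letters.

```php
<?php
// Hypothetical helper: the sorted letters of a word form its lookup key,
// e.g. "apple" becomes "aelpp". Assumes single-byte (ASCII) letters.
function letterKey(string $word): string
{
    $letters = str_split($word);
    sort($letters);
    return implode('', $letters);
}

// Hypothetical helper: group dictionary words by their sorted-letter key.
function buildIndex(array $words): array
{
    $index = [];
    foreach ($words as $word) {
        $index[letterKey($word)][] = $word;
    }
    return $index;
}

// Illustrative dictionary; a real one would be loaded from a file or database.
$index = buildIndex(['ball', 'pale', 'leap', 'plea', 'apple', 'bull']);

// Look up the words for the letters a, b, l, l.
$found = $index[letterKey('abll')] ?? [];
print_r($found);
```

The index is built once, after which each query is a single array lookup rather than a scan of the whole dictionary.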

Answer 3

Edit: So for 47 letters it would be

\b(?:((?(1)(?!))l1)|((?(2)(?!))l2)|...|((?(47)(?!))l47)){47}\b

Letters can be duplicated, say 4 a's and 15 r's (but no more), etc.

To match out-of-order items only once, use a conditional to allow each item to match once, but no more. The conditional (?(n)(?!)) forces a failure if group n has already matched, so each alternative can be used at most once.

It's not complicated, and it is immune to permutations.

Works every time!

\b(?:((?(1)(?!))a)|((?(2)(?!))b)|((?(3)(?!))l)|((?(4)(?!))l)){4}\b

Expanded

 \b 
 (?:
      (                             # (1)
           (?(1)(?!))
           a 
      )

   |  
      (                             # (2)
           (?(2)(?!))
           b 
      )
   |  
      (                             # (3)
           (?(3)(?!))
           l 
      )
   |  
      (                             # (4)
           (?(4)(?!))
           l 
      )
 ){4}
 \b
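
Since the pattern follows a fixed template, it can be generated from the list of letters. The following is a minimal sketch; the helper name buildOncePattern is hypothetical, and the test string is borrowed from Answer 1.

```php
<?php
// Hypothetical helper: build the "each alternative at most once" pattern
// for any list of letters, one capture group per letter.
function buildOncePattern(array $letters): string
{
    $alternatives = [];
    foreach (array_values($letters) as $i => $letter) {
        $group = $i + 1; // capture groups are numbered from 1
        // (?(n)(?!)) fails if group n has already taken part in the match.
        $alternatives[] = sprintf('((?(%d)(?!))%s)', $group, preg_quote($letter, '~'));
    }
    return '~\b(?:' . implode('|', $alternatives) . '){' . count($letters) . '}\b~';
}

$pattern = buildOncePattern(['a', 'b', 'l', 'l']);
echo $pattern, "\n";

// Test string taken from Answer 1.
preg_match_all($pattern, 'abll labl ball aabl lblabla 1234', $matches);
print_r($matches[0]);
```

For the letters a, b, l, l this produces the same pattern as the one written out above, wrapped in ~ delimiters.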

3 answers

Answer 1: score 2 (accepted), 2016-04-05 21:07:12
Answer 2: score 0, 2016-04-05 21:30:55
Answer 3: score 0