Regex: find words made up of certain characters

Question

I have a list of dictionary words, and I would like to find every word that consists of some or all of the characters of a source word, in any order.

For example:

Characters (source word) to look for: stainless

Found words: stainless, stain, net, ten, less, sail, sale, tale, tales, ants, etc.

Also, if a letter occurs only once in the source word, it can't be repeated in a found word.

Unacceptable words: tent (t is repeated), tall (l is repeated), etc.

Acceptable words: less (s is already repeated in the source word), etc.
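The rule above amounts to a letter-count (multiset) comparison: a word qualifies if no letter appears in it more often than in the source word. As a plain, non-regex reference for checking results, it could be sketched in JavaScript like this; `isSubMultiset` is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: true if `word` uses no character more often than `source` does.
function isSubMultiset(word, source) {
  // Count how many times each character is available in the source word.
  const available = new Map();
  for (const ch of source) {
    available.set(ch, (available.get(ch) ?? 0) + 1);
  }
  // Consume one available occurrence for each character of the candidate word.
  for (const ch of word) {
    const left = available.get(ch) ?? 0;
    if (left === 0) return false; // character absent from the source, or already used up
    available.set(ch, left - 1);
  }
  return true;
}

for (const w of ["stainless", "less", "ants", "tent", "tall"]) {
  console.log(w, isSubMultiset(w, "stainless"));
}
// prints, one per line: stainless true, less true, ants true, tent false, tall false
```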

Answer 1

You could take this approach:

Match any sequence of characters that occur in the search word, and require the match to be a whole word (using word boundaries).
Use a negative look-ahead to prohibit any character from occurring more often than it does in the search word. Do this for every character in the search word.

For the given example, the regular expression would be:

(?!(\S*s){4}|(\S*t){2}|(\S*a){2}|(\S*i){2}|(\S*n){2}|(\S*l){2}|(\S*e){2})\b[stainless]+\b
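A quick way to try this pattern outside Notepad++ is JavaScript, whose regex syntax supports the same look-ahead, `\S` and `\b` constructs. The sketch below assumes the dictionary is a single space-separated string (the `dictionary` value is made up for illustration) and adds the `g` flag so that `match` returns every accepted word.

```javascript
// The pattern from above as a JavaScript regex literal, with the global flag added.
const pattern = /(?!(\S*s){4}|(\S*t){2}|(\S*a){2}|(\S*i){2}|(\S*n){2}|(\S*l){2}|(\S*e){2})\b[stainless]+\b/g;

// Illustrative word list: the accepted examples from the question plus two rejected ones.
const dictionary = "stainless stain net ten less sail sale tale tales ants tent tall";

const found = dictionary.match(pattern);
console.log(found.join(", "));
// prints: stainless, stain, net, ten, less, sail, sale, tale, tales, ants
```

"tent" and "tall" are skipped entirely: at their first letter the look-ahead fails, and at every later position inside them `\b` cannot match, so no partial word is returned either.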

Most of the pattern is the negative look-ahead. For example:

(\S*s){4} matches a single word (a run of non-white-space characters) that contains 's' four times.
(?! | ) places these patterns as alternatives inside a negative look-ahead, so the overall match fails if any one of them matches.
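The two pieces described above can be checked in isolation. In this JavaScript sketch, the first pattern is only the repetition test for 't', and the second wraps that test in a negative look-ahead in front of a reduced character class `[ten]`, chosen here purely for illustration.

```javascript
// (\S*t){2}: two 't' characters within one run of non-white-space characters.
const twoTs = /(\S*t){2}/;
console.log(twoTs.test("tent")); // true: "tent" contains two t's
console.log(twoTs.test("ten"));  // false: only one t

// The same test inside a negative look-ahead, followed by the whole-word match.
const noDoubleT = /(?!(\S*t){2})\b[ten]+\b/;
console.log(noDoubleT.test("ten"));  // true: accepted
console.log(noDoubleT.test("tent")); // false: rejected as a whole, no partial match
```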

Automation

Building such a regular expression by hand for each search word takes some work, so this is where automation helps. Notepad++ cannot generate it, but any programming environment can. Here is a small JavaScript snippet that produces the regular expression for a given search word:

    // Escape the characters that are special inside a character class: "]", "\", "^" and "-"
    function regClassEscape(s) {
        return s.replace(/[\]\\^-]/g, "\\$&");
    }

    function buildRegex(searchWord) {
        // Get the frequency of each letter:
        const freq = {};
        for (let ch of searchWord) {
            ch = regClassEscape(ch);
            freq[ch] = (freq[ch] ?? 0) + 1;
        }
        // Produce the negative options (too many occurrences):
        const forbidden = Object.entries(freq).map(([ch, count]) =>
            "(\\S*[" + ch + "]){" + (count + 1) + "}"
        ).join("|");
        // Produce the character set:
        const allowed = Object.keys(freq).join("");
        return "(?!" + forbidden + ")\\b[" + allowed + "]+\\b";
    }

    // I/O management
    const [input, output] = document.querySelectorAll("input,div");
    input.addEventListener("input", refresh);

    function refresh() {
        if (/\s/.test(input.value)) {
            output.textContent = "Input should have no white space!";
        } else {
            output.textContent = buildRegex(input.value);
        }
    }

    refresh();

    input { width: 100% }

    Search word:<br>
    <input value="stainless">
    Regular expression:
    <div></div>
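For batch use outside a browser, the generator can be run without the DOM part, for example under Node.js. The sketch below keeps the `buildRegex` construction (escaping `]`, `\`, `^` and `-` inside the character class) and adds a hypothetical `findWords` helper that applies the generated pattern to a word list. Joining the list with newlines is an assumption that matters: `\S*` in the look-ahead must not be able to run on into the next word.

```javascript
// Escape the characters that are special inside a character class: "]", "\", "^", "-".
function regClassEscape(s) {
  return s.replace(/[\]\\^-]/g, "\\$&");
}

// One "too many occurrences" alternative per distinct character,
// then a whole-word character class of all allowed characters.
function buildRegex(searchWord) {
  const freq = {};
  for (let ch of searchWord) {
    ch = regClassEscape(ch);
    freq[ch] = (freq[ch] ?? 0) + 1;
  }
  const forbidden = Object.entries(freq)
    .map(([ch, count]) => "(\\S*[" + ch + "]){" + (count + 1) + "}")
    .join("|");
  const allowed = Object.keys(freq).join("");
  return "(?!" + forbidden + ")\\b[" + allowed + "]+\\b";
}

// Hypothetical helper: returns the entries of `words` that can be built from `searchWord`.
function findWords(searchWord, words) {
  const re = new RegExp(buildRegex(searchWord), "g");
  // One word per line, so neither the match nor the look-ahead spans two words.
  return words.join("\n").match(re) ?? [];
}

console.log(buildRegex("stainless"));
// prints: (?!(\S*[s]){4}|(\S*[t]){2}|(\S*[a]){2}|(\S*[i]){2}|(\S*[n]){2}|(\S*[l]){2}|(\S*[e]){2})\b[stainle]+\b

const words = ["stainless", "stain", "net", "ten", "less", "tent", "tall"];
console.log(findWords("stainless", words).join(", "));
// prints: stainless, stain, net, ten, less
```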

Answer 1: 0 votes, 2022-07-23 07:14:20
