Regex which matches a string containing at least the specified characters
I have a huge dictionary which I'm trying to search using a regex. What I would like to do is find all the words in the dictionary which contain at least one occurrence of each character I provide, in no particular order.
Right now I can find words which contain only the specified characters but, like I said, that is not exactly what I want.
Example:
I want at least one occurrence of each of the following characters: {b, a, d}
astring.matches(regex)
I would expect words like:
badder, baddest, baffled
Notice that they all contain at least one occurrence of each character, in no particular order, and that other characters are present in the strings.
Does anyone know how to do this? Other suggestions are also welcome!
You can use lookaheads to do this, if your regex engine supports them:
(?=.*b)(?=.*a)(?=.*d)
However, this is quite inefficient. Is there any reason you can't use multiple String.indexOf checks?
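The String.indexOf suggestion above could look roughly like the following. This is a minimal sketch, not code from the answer: the class name ContainsAllChars, the helper containsAll, and the sample word list are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainsAllChars {
    // Returns true if word contains every character of required at least once.
    // (Hypothetical helper; the name is not from the original answer.)
    static boolean containsAll(String word, String required) {
        for (int i = 0; i < required.length(); i++) {
            // indexOf returns -1 when the character does not occur in word
            if (word.indexOf(required.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample words, assumed for illustration only
        List<String> dictionary = Arrays.asList("badder", "baddest", "baffled", "bed", "add");
        List<String> hits = new ArrayList<>();
        for (String word : dictionary) {
            if (containsAll(word, "bad")) {
                hits.add(word);
            }
        }
        System.out.println(hits); // prints [badder, baddest, baffled]
    }
}
```

No regex is compiled or backtracked here; each required character costs one linear scan of the word.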
You need a series of lookaheads:
^(?=.*b)(?=.*a)(?=.*d).*
which is a pain to construct by hand. However, you can ease the pain by using a regex to build it:
String regex = "^" + "bad".replaceAll(".", "(?=.*$0)") + ".*";
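One caveat with the replaceAll trick: each required character is inserted into the pattern as-is, so a regex metacharacter such as `.` or `*` would be interpreted rather than matched literally. A possible variant, assuming that literal matching is what is wanted, quotes each character with Pattern.quote. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

public class LookaheadBuilder {
    // Builds ^(?=.*c1)(?=.*c2)....* with every character quoted as a literal.
    // (Hypothetical helper, not part of the original answer.)
    static String buildRegex(String required) {
        StringBuilder sb = new StringBuilder("^");
        for (int i = 0; i < required.length(); i++) {
            String literal = Pattern.quote(String.valueOf(required.charAt(i)));
            sb.append("(?=.*").append(literal).append(")");
        }
        // The trailing .* lets String.matches() consume the whole input
        return sb.append(".*").toString();
    }

    public static void main(String[] args) {
        System.out.println("badder".matches(buildRegex("bad")));
        System.out.println("ab".matches(buildRegex(".")));
    }
}
```

For plain letters such as b, a, d this behaves the same as the one-liner; the quoting only matters when the required characters can include punctuation.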
If you use this repeatedly with String.matches(), you are better off with the following code, because every call to String.matches() compiles the regex again (there is no caching):
import java.util.regex.Pattern;

// do this once
Pattern pattern = Pattern.compile(regex);
// reuse the pattern many times
if (pattern.matcher(input).matches()) {
    // input contains every required character at least once
}
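Putting the pieces together, filtering a whole word list with a single precompiled pattern might look like this. It is only a sketch: the class DictionaryFilter, the method filter, and the sample words are assumptions, and the regex is built with the replaceAll one-liner shown earlier, so it is suitable for plain letters only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DictionaryFilter {
    // Returns the words that contain every character of required at least once.
    // (Hypothetical helper for illustration.)
    static List<String> filter(List<String> words, String required) {
        // Build ^(?=.*b)(?=.*a)(?=.*d).* from "bad"
        String regex = "^" + required.replaceAll(".", "(?=.*$0)") + ".*";
        // Compile once, then reuse the same Pattern for every word
        Pattern pattern = Pattern.compile(regex);
        return words.stream()
                .filter(word -> pattern.matcher(word).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample words, assumed for illustration only
        List<String> words = Arrays.asList("badder", "baddest", "baffled", "bed", "cab");
        System.out.println(filter(words, "bad")); // prints [badder, baddest, baffled]
    }
}
```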