
Create String[] containing only certain characters

I am trying to create a String[] which contains only words that consist of certain characters. For example, I have a dictionary containing a number of words like so:

arm army art as at attack attempt attention attraction authority automatic awake baby back bad bag balance

I want to narrow the list down so that it only contains words with the characters a, b and g. Therefore the list should only contain the word 'bag' in this example. Currently I am trying to do this using regexes, but having never used them before, I can't seem to get it to work. Here is my code:

import java.util.ArrayList;
import java.util.List;

public class LetterJugglingMain {
    public static void main(String[] args) {
        String dictFile = "/Users/simonrhillary/Desktop/Dictionary(3).txt";
        // fileReader is the asker's own helper class (not shown in the post)
        fileReader fr = new fileReader();
        fr.openFile(dictFile);
        String[] dictionary = fr.fileToArray();
        String regx = "able";
        String[] newDict = createListOfValidWords(dictionary, regx);
        printArray(newDict); // printArray is not shown in the post either
    }

    // Keeps only the words that match the given regex in full
    public static String[] createListOfValidWords(String[] d, String regex) {
        List<String> narrowed = new ArrayList<String>();
        for (int i = 0; i < d.length; i++) {
            if (d[i].matches(regex)) {
                narrowed.add(d[i]);
                System.out.println("added " + d[i]);
            }
        }
        String[] narrowArray = narrowed.toArray(new String[0]);
        return narrowArray;
    }
}

However, the array returned is always empty unless the String regex is the exact word! Any ideas? I can post more code if needed. I think I must be initialising the regex wrong. The narrowed-down list must contain ONLY the characters from the regex.

The regex able will match only the string "able". However, if you want a regular expression to match any single one of the characters a, b, l or e, the regex you're looking for is [able] (in brackets). If you want words made up of several such characters, add a + to repeat the pattern: [able]+.
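To make the difference between the three patterns concrete, here is a small sketch. The class name MatchesDemo, the helper filter and the sample words are made up for illustration; only String.matches comes from the question's code.

```java
import java.util.ArrayList;
import java.util.List;

class MatchesDemo {
    // Returns the words for which String.matches(regex) is true.
    // String.matches only succeeds when the WHOLE word matches the regex.
    static List<String> filter(String[] words, String regex) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (w.matches(regex)) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"able", "bale", "a", "table"}; // hypothetical sample input
        System.out.println(filter(words, "able"));    // literal text: only the exact word
        System.out.println(filter(words, "[able]"));  // exactly one character from the class
        System.out.println(filter(words, "[able]+")); // one or more characters from the class
    }
}
```

With these sample words, the literal pattern keeps only "able", the bare character class keeps only the one-letter word "a", and [able]+ keeps "able", "bale" and "a" but rejects "table" because of the t.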

Frankly, I'm not an expert in regexes, but I don't think it's the best tool to do what you want. I would use a method like the following:

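import java.util.HashSet;
import java.util.Set;

// Returns true if s contains every character in chars at least once.
// Characters of s that are not in chars are ignored.
public boolean containsAll(String s, Set<Character> chars) {
    Set<Character> found = new HashSet<Character>(); // required characters seen so far
    // Stop early once every required character has been seen
    for (int i = 0; i < s.length() && found.size() < chars.size(); i++) {
        char c = s.charAt(i);
        if (chars.contains(c)) {
            found.add(c);
        }
    }
    return found.size() == chars.size();
}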

The OP wants words that contain every character, not just one of them. Other characters are not a problem.

If this is the case, I think the simplest way would be to loop through the entire string, character by character, and check whether it contains all of the characters you want. Keep flags to record whether each character has been found.
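A minimal sketch of that flag-based loop might look like the following. The names FlagScan, containsEvery and narrow are hypothetical, and the code assumes the required string has no repeated characters; the word list is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;

class FlagScan {
    // True if every character of `required` occurs at least once in `word`.
    // Assumption: `required` contains no repeated characters.
    static boolean containsEvery(String word, String required) {
        boolean[] seen = new boolean[required.length()]; // one flag per required character
        int remaining = required.length();               // flags still unset
        for (int i = 0; i < word.length() && remaining > 0; i++) {
            int idx = required.indexOf(word.charAt(i));
            if (idx >= 0 && !seen[idx]) {
                seen[idx] = true;
                remaining--;
            }
        }
        return remaining == 0;
    }

    // Keeps only the words that contain every required character.
    static List<String> narrow(String[] words, String required) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (containsEvery(w, required)) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dictionary = ("arm army art as at attack attempt attention attraction "
                + "authority automatic awake baby back bad bag balance").split(" ");
        System.out.println(narrow(dictionary, "abg")); // prints [bag]
    }
}
```

Note that this check allows extra characters in the word. It only happens to return just "bag" for this dictionary because no other word contains a, b and g together.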

If this isn't the case:

Try using the regex:

^[able]+$

Here's what it does:

^ matches the beginning of the string and $ matches the end of the string. This makes sure that you're not getting a partial match.

[able] matches the characters you want the string to consist of, in this case a, b, l, and e.

+ makes sure that there are 1 or more of these characters in the string.

Note: This regex will match any string made up solely of these 4 letters, in any order and with repeats. For example, it will match:

able, albe, aeble, aaaabbblllleeee

and will not match

qable, treatable, and abled.
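To see why the anchors matter, the sketch below runs the strings listed above through java.util.regex with Matcher.find(), which (unlike String.matches) accepts a match anywhere in the input. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    private static final Pattern UNANCHORED = Pattern.compile("[able]+");
    private static final Pattern ANCHORED = Pattern.compile("^[able]+$");

    // find() succeeds if any substring matches, so partial matches count here.
    static boolean findsUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With ^ and $ the match has to span the whole string.
    static boolean findsAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"able", "albe", "aeble", "aaaabbblllleeee",
                            "qable", "treatable", "abled"};
        for (String s : samples) {
            System.out.println(s + " unanchored=" + findsUnanchored(s)
                    + " anchored=" + findsAnchored(s));
        }
    }
}
```

Every sample contains at least one of the four letters, so the unanchored pattern finds a match in all of them, while the anchored pattern accepts only the first four strings.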

Here is a sample regex that keeps only words containing at least one occurrence of every character in a set. This one matches any English word (case-insensitive) that contains at least one occurrence of each of the characters a, b and g:

(?i)(?=.*a)(?=.*b)(?=.*g)[a-z]+

Examples of strings that match are bag, baggy and grab.

Examples of strings that don't match are big, argument and nothing.

The (?i) turns on the case-insensitive flag.

You need to append one (?=.*<character>) for each character in the set.
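If the set of characters is not fixed, the lookaheads can be generated instead of typed by hand. The following is only a sketch: LookaheadBuilder and buildRegex are hypothetical names, and Pattern.quote is added so that a regex metacharacter in the set would be treated literally, which the hand-written pattern above does not need.

```java
import java.util.regex.Pattern;

class LookaheadBuilder {
    // Builds (?i)(?=.*c1)(?=.*c2)...[a-z]+ with one lookahead per required character.
    static String buildRegex(String required) {
        StringBuilder sb = new StringBuilder("(?i)");
        for (int i = 0; i < required.length(); i++) {
            String c = String.valueOf(required.charAt(i));
            sb.append("(?=.*").append(Pattern.quote(c)).append(')');
        }
        return sb.append("[a-z]+").toString();
    }

    public static void main(String[] args) {
        String regex = buildRegex("abg");
        String[] samples = {"bag", "baggy", "grab", "big", "argument", "nothing"};
        for (String s : samples) {
            // String.matches requires the whole word to match, so no ^ or $ is needed
            System.out.println(s + " -> " + s.matches(regex));
        }
    }
}
```

For the six sample words this reports true for bag, baggy and grab, and false for big, argument and nothing, in line with the examples listed above.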

I assume a word only contains letters of the English alphabet, so I specify [a-z]. Specify more if you need space, hyphen, etc.

I assume the matches(String regex) method of the String class is used, so I omitted the ^ and $.

The performance may be bad: in the worst case (the characters are found at the end of the word), I think the regex engine may go through the string around n times, where n is the number of characters in the set. It may not be an actual concern at all, since the words are very short, but if it turns out that this is a bottleneck, you may consider doing simple looping.


 