
Match array against string in Java

I'm reading a file using a BufferedReader, so let's say I have:

line = br.readLine();

I want to check whether this line contains one of many possible strings (which I have in an array). I would like to be able to write something like:

while (!line.matches(stringArray)) { // not sure how to write this conditional
  // do something here
  line = br.readLine();
}

I'm fairly new to programming and Java. Am I going about this the right way?

Copy all values into a Set<String> and then use contains():

Set<String> set = new HashSet<String> (Arrays.asList (stringArray));
while (!set.contains(line)) { ... }
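As a minimal sketch of the exact-match approach above: the class name, method name, and sample strings below are hypothetical, chosen only for illustration. Note that set.contains(line) succeeds only when the whole line equals one of the strings, and that readLine() returns null at end of input, so a null check is included.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; names and sample data are illustrative only.
class ExactLineMatch {
    // True only if the entire line equals one of the candidate strings.
    // A null line (end of input from readLine()) never matches.
    static boolean isKnownLine(Set<String> candidates, String line) {
        return line != null && candidates.contains(line);
    }

    public static void main(String[] args) {
        String[] stringArray = {"START", "STOP"};
        Set<String> set = new HashSet<>(Arrays.asList(stringArray));
        System.out.println(isKnownLine(set, "STOP"));     // whole line is in the set
        System.out.println(isKnownLine(set, "STOP now")); // line merely contains an entry
    }
}
```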

[EDIT] If you want to find out whether a part of the line contains a string from the set, you have to loop over the set. Replace set.contains(line) with a call to:

public boolean matches(Set<String> set, String line) {
    for (String check: set) {
        if (line.contains(check)) return true;
    }
    return false;
}

Adjust the check accordingly if you use a regexp or a more complex method for matching.
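Put together with the reading loop from the question, the substring check might look like the sketch below. The class and method names are hypothetical, and a StringReader would work in place of a real file when trying it out. The loop stops at end of input, where readLine() returns null; without that check, line.contains would throw a NullPointerException once the file is exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper; not part of the original answer.
class FirstMatchingLine {
    // True if the line contains at least one of the needles as a substring.
    static boolean containsAny(List<String> needles, String line) {
        for (String needle : needles) {
            if (line.contains(needle)) return true;
        }
        return false;
    }

    // Reads lines until one contains a needle and returns that line,
    // or returns null if the input ends without a match.
    static String firstMatch(BufferedReader br, List<String> needles) {
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (containsAny(needles, line)) return line;
                // lines that do not match can be processed here
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```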

[EDIT2] A third option is to concatenate the elements of the array into one big regexp, separated by |:

Pattern p = Pattern.compile("str1|str2|str3");

while (!p.matcher(line).find()) { // or matches for a whole-string match
    ...
}

This can be cheaper if you have many elements in the array, since the regexp engine may optimize the matching process.
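One way to build such a pattern from the array rather than writing it by hand is sketched below. It assumes the array elements are literal strings, not regexps, so each one is passed through Pattern.quote to neutralize characters such as . or |. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the original answer.
class AlternationPattern {
    // Builds one pattern that matches any of the given literal strings.
    // Caveat: an empty array produces an empty regex, which matches everything.
    static Pattern anyOf(String[] literals) {
        String regex = Arrays.stream(literals)
                .map(Pattern::quote)               // treat each element literally
                .collect(Collectors.joining("|")); // join as alternatives
        return Pattern.compile(regex);
    }
}
```

Compile the pattern once, before the loop, and reuse it for every line with p.matcher(line).find().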

It depends on what stringArray is. If it's a Collection then fine. If it's a true array, you should make it a Collection. The Collection interface has a method called contains() that will determine if a given Object is in the Collection.

A simple way to turn an array into a Collection:

String[] tokens = { ... };
List<String> list = Arrays.asList(tokens);

The problem with a List is that lookup is expensive (technically linear, or O(n)). A better bet is to use a Set, which is unordered but has near-constant (O(1)) lookup. You can construct one like this:

From a Collection:

Set<String> set = new HashSet<String>(stringList);

From an array:

Set<String> set = new HashSet<String>(Arrays.asList(stringArray));

and then set.contains(line) will be a cheap operation.

Edit: OK, I think your question wasn't clear. You want to see if the line contains any of the words in the array. What you want then is something like this:

BufferedReader in = null;
Set<String> words = ... // construct this as per above
try {
  in = ...
  String line; // declared outside the loop: a declaration is not allowed inside the condition
  while ((line = in.readLine()) != null) {
    for (String word : words) {
      if (line.contains(word)) {
        // do whatever
      }
    }
  }
} catch (Exception e) {
  e.printStackTrace();
} finally {
  if (in != null) { try { in.close(); } catch (Exception e) { } }
}

This is quite a crude check, which is used surprisingly often and tends to give annoying false positives on words like "scrap". For a more sophisticated solution you probably have to use a regular expression and look for word boundaries:

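// "\\b" is the regex word boundary; a single-backslash "\b" in a Java string is a backspace character
Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
Matcher m = p.matcher(line);
if (m.find()) {
  // word found
}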

You will probably want to do this more efficiently (for example, by not compiling the pattern for every line), but that's the basic tool to use.
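A sketch of the "compile once" idea: the hypothetical class below compiles the word-boundary pattern in its constructor and reuses it for every line. It assumes the word is a literal that starts and ends with a word character (a letter, digit, or underscore), since \b only marks a boundary next to such characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original answer.
class WholeWordMatcher {
    private final Pattern pattern;

    // Compile once, up front, instead of once per line.
    WholeWordMatcher(String word) {
        this.pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
    }

    // True if the word occurs in the line as a whole word.
    boolean foundIn(String line) {
        return pattern.matcher(line).find();
    }
}
```

With a matcher built for "crap", a line containing "scrap" no longer counts as a hit, because there is no word boundary between the "s" and the "c".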

Using the String.matches(regex) function, what about creating a regular expression that matches any one of the strings in the string array? Something like:

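// matches() must match the whole line, so wrap the alternatives in .*( ... ).*
String regex = ".*(";
for (int i = 0; i < array.length - 1; ++i)
  regex += array[i] + "|";
regex += array[array.length - 1] + ").*"; // last element is at index length - 1
while (line.matches(regex))
{
  // . . .
}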
