
Creating a Java Program to Search a File for a Specific Word

I am just learning the language and was wondering what a more experienced Java programmer would do in the following situation.

I would like to create a Java program that will search a specified file for all instances of a specific word.

How would you go about this? Does the Java API come with a class that provides file scanning capabilities, or would I have to write my own class to do this?

Thanks for any input,
Dom.

The Java API does offer the java.util.Scanner class, which will allow you to scan across an input file.

Depending on how you intend to use this, however, it might not be the best idea. Is the file very large? Are you searching only one file, or are you trying to keep a database of many files and search within that? In the latter case, you might want to use a more fleshed-out search engine such as Lucene.
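If the file is large but you still only need to search that one file, one option is to read it a line at a time instead of loading it all into memory. The following is only a minimal sketch: the class name StreamingWordCount and the method count are made up for illustration, and it assumes the word never spans a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Java API.
class StreamingWordCount {
    // Counts whole-word, case-insensitive occurrences of `word`,
    // holding only one line of the input in memory at a time.
    static int count(Reader source, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        int total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    total++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // For a real file, pass a java.io.FileReader instead of the StringReader.
        String sample = "Java is fun.\nI like java, not javascript.\n";
        System.out.println(count(new StringReader(sample), "java")); // prints 2
    }
}
```

Note that "javascript" is not counted, because \\b requires a word boundary right after "java".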

Unless the file is very large, I would do this:

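// Needs java.util.regex.Pattern and org.apache.commons.io.IOUtils (Apache Commons IO)
String text = IOUtils.toString(new FileReader(filename));
// String.matches() must match the whole text, so use find() to look for the word anywhere
boolean foundWord = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();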

To get all the text between occurrences of your word, you can use split(), and then use the lengths of the resulting strings to work out the position of each occurrence.
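The split() idea might look roughly like the sketch below. The class name SplitPositions and the method positions are invented for illustration, and the code assumes a non-empty, case-sensitive search word, so that every occurrence has exactly word.length() characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Java API.
class SplitPositions {
    // Returns the start index of every whole-word occurrence of `word` in `text`.
    static List<Integer> positions(String text, String word) {
        // A limit of -1 keeps trailing empty strings, so an occurrence
        // at the very end of the text is not lost.
        String[] parts = text.split("\\b" + Pattern.quote(word) + "\\b", -1);
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        // There is one occurrence between each pair of adjacent parts.
        for (int i = 0; i < parts.length - 1; i++) {
            pos += parts[i].length(); // skip the text before this occurrence
            starts.add(pos);
            pos += word.length();     // skip the occurrence itself
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(positions("java and more java", "java")); // prints [0, 14]
    }
}
```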

As others have pointed out, you could use the Scanner class.

I put your question in a file, data.txt, and ran the following program:

import java.io.*;
import java.util.Scanner;
import java.util.regex.MatchResult;

public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner s = new Scanner(new File("data.txt"));
        while (null != s.findWithinHorizon("(?i)\\bjava\\b", 0)) {
            MatchResult mr = s.match();
            System.out.printf("Word found: %s at index %d to %d.%n", mr.group(),
                    mr.start(), mr.end());
        }
        s.close();
    }
}

The output is:

Word found: Java at index 74 to 78.
Word found: java at index 153 to 157.
Word found: Java at index 279 to 283.

The pattern searched for, (?i)\\bjava\\b, means the following:

  • (?i) turns on the case-insensitive switch
  • \\b means a word boundary
  • java is the string searched for
  • \\b is a word boundary again.

If the search term comes from the user, or if for some other reason it may contain special characters, I suggest you put \\Q and \\E around the string, since that quotes all characters in between (and if you're really picky, make sure the input doesn't contain \\E itself).
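As an alternative to writing \\Q and \\E by hand, java.util.regex.Pattern has a static quote() method that produces a literal pattern for any string, and it should also cope with input that contains \\E. A small sketch, with the class name QuotedSearch and the method find made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Java API.
class QuotedSearch {
    // Returns the start offset of every case-insensitive occurrence of `term`,
    // treating every character of `term` literally.
    static List<Integer> find(String text, String term) {
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Unquoted, the "." in "1.5" would match any character, including the "0" in "105".
        System.out.println(find("Java 1.5, not 105", "1.5")); // prints [5]
    }
}
```

The word boundaries are left out here on purpose: \\b only behaves as expected when the term starts and ends with a word character, which a user-supplied term may not.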
