
Searching a text file for specific words using Java

I have a huge text file. I'd like to search it for specific words and print the three (or more) words that follow each match. So far I have this:

public static void main(String[] args) {
    String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";        
    String line = null;
    try {            
        FileReader fileReader = 
            new FileReader(fileName);

        BufferedReader bufferedReader = 
            new BufferedReader(fileReader);

        while((line = bufferedReader.readLine()) != null) {                
            System.out.println(line);
        }   

        bufferedReader.close();         
    } catch(FileNotFoundException ex) {
        System.out.println(
            "Unable to open file '" + 
            fileName + "'");                
    } catch(IOException ex) {
        System.out.println(
            "Error reading file '" 
            + fileName + "'");                  
    }  
}

This only prints the file. Can you advise me on the best way of doing this?

You can find the position of the word in the line with the String.indexOf() method.

int index = line.indexOf(word);
  • If the index is -1, the word does not occur in the line.
  • If it does occur, take the substring of the line from that index to the end of the line.

     String nextWords = line.substring(index); 
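  • Now use String[] temp = nextWords.split(" ") to get all the words in that substring. Because the substring starts at the match, temp[0] begins with the search word itself, and the words that follow it start at temp[1].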
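The steps above can be put together in a small helper. This is only a minimal sketch, not the answerer's exact code: the class name WordsAfter, the method wordsAfter, and the count parameter are hypothetical names introduced for illustration. It skips past the search word before splitting, so the result holds only the words that follow it.

```java
import java.util.ArrayList;
import java.util.List;

class WordsAfter {

    // Returns up to `count` words that follow the first occurrence of `word`
    // in `line`, or an empty list if `word` does not occur.
    static List<String> wordsAfter(String line, String word, int count) {
        List<String> result = new ArrayList<>();
        int index = line.indexOf(word);
        if (index == -1) {
            return result; // word not present in this line
        }
        // Skip past the word itself, then split the rest on whitespace.
        String rest = line.substring(index + word.length()).trim();
        if (rest.isEmpty()) {
            return result; // the word was the last thing on the line
        }
        String[] temp = rest.split("\\s+");
        for (int i = 0; i < temp.length && i < count; i++) {
            result.add(temp[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "the quick brown fox jumps over the lazy dog";
        System.out.println(wordsAfter(line, "brown", 3)); // prints [fox, jumps, over]
    }
}
```

Keep in mind that indexOf() matches substrings, so searching for "and" would also match inside "sand". The sketch also looks at one line at a time, so words that continue on the next line of the file are not included.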

    while((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
        if (line.contains("YOUR_SPECIFIC_WORDS")) {
            // do what you need here
        }
    }
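As a rough, self-contained version of the loop above, the sketch below collects the matching lines instead of printing every line. The class name LineFilter and the method matchingLines are hypothetical, and a StringReader stands in for the file so the example runs without Mesh.txt being present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineFilter {

    // Collects every line from `reader` that contains `word` as a substring.
    static List<String> matchingLines(BufferedReader reader, String word) {
        List<String> matches = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(word)) {
                    matches.add(line);
                }
            }
        } catch (IOException ex) {
            // Rethrow unchecked so callers are not forced to handle it here.
            throw new UncheckedIOException(ex);
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in-memory text replaces the file; use
        // new FileReader(fileName) instead to read from disk.
        String text = "first line\nsecond line with Mesh\nthird line";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            for (String match : matchingLines(reader, "Mesh")) {
                System.out.println(match); // prints "second line with Mesh"
            }
        }
    }
}
```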

By the sound of it, you are looking for a basic find-and-replace-all mechanism applied to each line read from the file. In other words, if the current line contains the word or phrase you want to add words after, replace that word with the very same word plus the words you want to add. It would be something like this:

String line = "This is a file line.";
String find = "file";  // word to find in line
String replaceWith = "file (plus this stuff)"; // the phrase to change the found word to.
line = line.replace(find, replaceWith);  // Replace any found words
System.out.println(line);

The console output would be:

This is a file (plus this stuff) line.

The main point, though, is that you only want to match actual words and not the same characters inside another word, for example the word "and" versus the word "sand". The characters that make up "and" also occur in "sand", so the example code above would change that word too. The String.contains() method matches substrings in the same way. This is usually undesirable if you want to deal with whole words only, so a simple solution is to use a regular expression (regex) with the String.replaceAll() method. Using your own code, it would look something like this:

// Needs java.io.BufferedReader, FileReader, FileNotFoundException and IOException.
String fileName = "C:\\Users\\Mishari\\Desktop\\Mesh.txt";
String findPhrase = "and"; // Word or phrase to find and replace
String replaceWith = findPhrase + " (adding this)";  // The text used for the replacement.
boolean ignoreLetterCase = false; // Change to true to ignore letter case
String line = "";

// try-with-resources closes the reader even if reading fails
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
    while ((line = bufferedReader.readLine()) != null) {
        if (ignoreLetterCase) {
            // (?i) matches case-insensitively without lower-casing the printed line
            line = line.replaceAll("(?i)\\b(" + findPhrase + ")\\b", replaceWith);
        } else {
            line = line.replaceAll("\\b(" + findPhrase + ")\\b", replaceWith);
        }
        System.out.println(line);
    }
} catch (FileNotFoundException ex) {
    System.out.println("Unable to open file: '" + fileName + "'");
} catch (IOException ex) {
    System.out.println("Error reading file: '" + fileName + "'");
}

Note the escaped \\b word-boundary metacharacters in the regular expression passed to the String.replaceAll() method, specifically in this line:

line = line.replaceAll("\\b(" + findPhrase + ")\\b", replaceWith);

This allows us to deal with whole words only.
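To see the difference on a single string, here is a small sketch that compares plain replace() with the word-boundary version. The class name WholeWordDemo and the helper replaceWholeWord are hypothetical. As an addition not found in the code above, it wraps the search text in Pattern.quote() and the replacement in Matcher.quoteReplacement(), so that characters such as "." or "$" in either string are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo {

    // Replaces only whole-word occurrences of `find` in `line`.
    static String replaceWholeWord(String line, String find, String replaceWith) {
        String regex = "\\b" + Pattern.quote(find) + "\\b";
        return line.replaceAll(regex, Matcher.quoteReplacement(replaceWith));
    }

    public static void main(String[] args) {
        String line = "sand and water";
        // Plain replace() also changes the "and" inside "sand".
        System.out.println(line.replace("and", "AND"));            // prints sAND AND water
        // The word-boundary version leaves "sand" alone.
        System.out.println(replaceWholeWord(line, "and", "AND"));  // prints sand AND water
    }
}
```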


 