Match words between two text files

I have two text files. One contains English dictionary words and the other is a list of usernames from a website.

I want to find the usernames that exactly match a word in the English dictionary (e.g. "Envelope").

This is my current code but it returns nothing. Where am I going wrong?

import java.io.*;
import java.util.*;

class dict{
    public static void main(String args[]) throws Exception{

        Scanner kb = new Scanner(System.in);
        String name;
        String curr;

        java.io.File dictionary = new java.io.File("EnglishDict.txt");
        Scanner dictScanner = new Scanner(dictionary); 

        java.io.File list = new java.io.File("usernames.txt");
        Scanner listScanner = new Scanner(list);  

        while(dictScanner.hasNextLine()){
            curr=dictScanner.next();
            while(listScanner.hasNextLine()){
                name=listScanner.next();

                if(curr.equals(name)) System.out.println(name);
            }
        }
    }
}

Once the Scanner for usernames reaches the end of that file, it returns no more user names. In theory, you would have to restart ("rewind") this sequential text file in order to compare all usernames with the second, third, etc. word in the dictionary.

Rescanning the file for every dictionary word is going to take too long (unless the number of user names is rather small).

Instead, read the user names (presumably the smaller file) into a Set<String> and check the dictionary against this set:

Set<String> usernames = new HashSet<>();
while (listScanner.hasNextLine()) {
     usernames.add( listScanner.nextLine() );
}

while (dictScanner.hasNextLine()) {
     String curr = dictScanner.nextLine();
     if( usernames.contains( curr ) ){
         System.out.println( curr );
     }
}

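You have to re-create your listScanner for every iteration of the outer loop, because the old one is exhausted after the first pass. You also have to use nextLine() instead of next(), so that reading matches the hasNextLine() check.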

class dict {

    public static void main(String args[]) throws Exception {

        Scanner kb = new Scanner(System.in);
        String name;
        String curr;

        java.io.File dictionary = new java.io.File("EnglishDict.txt");
        Scanner dictScanner = new Scanner(dictionary);

        java.io.File list = new java.io.File("usernames.txt");
        Scanner listScanner; // opened fresh for each dictionary word below

        while (dictScanner.hasNextLine()) {
            listScanner = new Scanner(list);
            curr = dictScanner.nextLine();
            while (listScanner.hasNextLine()) {
                name = listScanner.nextLine();

                if (curr.equals(name)) {
                    System.out.println(name);
                }
            }                
            listScanner.close();
        }
    }
}
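
The advice to use nextLine() rather than next() can be illustrated with a small sketch that reads from in-memory strings instead of files. The class name, method names, and sample text are made up for the demonstration. It shows two things: next() splits on whitespace, so a line containing a space yields two tokens, and pairing hasNextLine() with next() can throw when only a trailing line break is left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

class NextVsNextLine {

    // Reads whitespace-separated tokens: hasNext() pairs with next().
    static List<String> withNext(String text) {
        List<String> out = new ArrayList<>();
        Scanner s = new Scanner(text);
        while (s.hasNext()) {
            out.add(s.next());
        }
        return out;
    }

    // Reads whole lines: hasNextLine() pairs with nextLine().
    static List<String> withNextLine(String text) {
        List<String> out = new ArrayList<>();
        Scanner s = new Scanner(text);
        while (s.hasNextLine()) {
            out.add(s.nextLine());
        }
        return out;
    }

    // Mixes hasNextLine() with next(), as in the question's loop.
    // Returns true if the mix ends in a NoSuchElementException.
    static boolean mixedThrows(String text) {
        Scanner s = new Scanner(text);
        try {
            while (s.hasNextLine()) {
                s.next();
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String sample = "ice cream\nenvelope\n"; // made-up sample input
        System.out.println(withNext(sample));     // [ice, cream, envelope]
        System.out.println(withNextLine(sample)); // [ice cream, envelope]
        System.out.println(mixedThrows(sample));  // true
    }
}
```

In the mixed case the final line break still counts as a line for hasNextLine(), but it contains no token for next() to return.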

The two loops are causing the issue. During the first iteration of the outer loop, the scanner reads the entire contents of usernames.txt. During the second iteration, the scanner is already at the end, and therefore hasNextLine() is false.

Try something like:

while(dictScanner.hasNextLine()){
    curr=dictScanner.nextLine();
    listScanner = new Scanner(list);

    while(listScanner.hasNextLine()){
        name=listScanner.nextLine();
        if(curr.equals(name)) System.out.println(name);
    }
    listScanner.close();
}

This re-initialises the second scanner in each iteration of the outer loop.

Edit: Use nextLine(), as in @afzalex's answer.
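
The exhaustion described in this answer can be reproduced without any files. The following is a minimal sketch using a Scanner over an in-memory string; the class name, the drain helper, and the sample names are invented for the illustration.

```java
import java.util.Scanner;

class ExhaustedScannerDemo {

    // Consumes every remaining line of the scanner and returns how many there were.
    static int drain(Scanner s) {
        int count = 0;
        while (s.hasNextLine()) {
            s.nextLine();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String usernames = "alice\nbob\ncarol\n"; // made-up stand-in for usernames.txt

        Scanner shared = new Scanner(usernames);
        System.out.println(drain(shared)); // 3: the first pass sees every line
        System.out.println(drain(shared)); // 0: the same scanner is already at the end

        // A new Scanner starts from the top again.
        System.out.println(drain(new Scanner(usernames))); // 3
    }
}
```

The second call corresponds to the second iteration of the outer loop in the question: the inner loop body never runs again, so nothing is printed.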

You need to restart your scan over your username file for each iteration of your outer loop.

At the moment your inner loop immediately scans to the end of your username file and never starts from the top again in subsequent iterations of the outer loop.

You could achieve this by adding:

listScanner = new Scanner(list);

as the last statement in your outer loop.

Note: repeatedly scanning through one of your files like this is very inefficient, because every dictionary word triggers a full read of the username file. If one of your files is smallish (say, under a gigabyte), consider loading it fully into a HashSet first.
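
The HashSet suggestion could look roughly like the sketch below. It assumes both files are UTF-8 text with one entry per line and that matching is exact and case-sensitive; the class name SetMatch and the helper matches are invented for this example. The username file is loaded into the set and the dictionary is streamed line by line, so each file is read only once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SetMatch {

    // Streams 'other' line by line and keeps the lines that are also in 'loaded'.
    // Each lookup in the HashSet takes roughly constant time.
    static List<String> matches(Set<String> loaded, BufferedReader other) {
        return other.lines()
                .filter(loaded::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: usernames.txt is the smaller file and fits in memory.
        Set<String> usernames =
                new HashSet<>(Files.readAllLines(Paths.get("usernames.txt")));

        // try-with-resources closes the dictionary reader when done.
        try (BufferedReader dict = Files.newBufferedReader(Paths.get("EnglishDict.txt"))) {
            for (String word : matches(usernames, dict)) {
                System.out.println(word);
            }
        }
    }
}
```

If usernames such as "Envelope" should also match a lowercase dictionary entry, one option is to lowercase both sides before adding to and querying the set; the sketch leaves the comparison exact.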
