
Q re Java Programme to read from a text file, create an arraylist, reformat, and output to another file

As a Java student (beginner), I've just completed a task to write a program that performs the following tasks:

  1. Read the content of a text file;

  2. Pick out all words within the text, replacing all upper case letters with lower case ones;

  3. Produce a dictionary of the words appearing in the text (ie, a list of the words in alphabetical order, one word per line, without duplication);
  4. Output the dictionary to another file.

We were advised to have a single class containing our main method, and use the ArrayList, Scanner, PrintWriter and FileReader classes.

I managed to do the below, which works, but it has raised some questions/fundamental gaps in my knowledge.

I believe that based on the principles of good OO programming practice, I should break the below down into a series of methods that have a single purpose, and then call those methods. However, I really struggle doing this. Can anyone please elaborate/advise a way to improve on that specific point? Or correct me if I'm wrong?

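import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Main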
{

    private static ArrayList<String> dictionary; 
    private static ArrayList<String> tempdictionary;

    public static void main() throws FileNotFoundException
    {
        Scanner inFile = new Scanner(new FileReader("H:\\csc8001\\results.txt"));
        dictionary = new ArrayList <>();
        tempdictionary = new ArrayList <>();

        while (inFile.hasNext()) { 
            dictionary.add(inFile.next().toLowerCase().replaceAll("[^a-zA-Z ]", ""));
        }
        inFile.close();

        PrintWriter outFile = new PrintWriter("H:\\csc8001\\results1.txt");
        int index = 0;
        for (String a : dictionary)
        {
            if (!tempdictionary.contains(a)){ 
                tempdictionary.add(a);
            } else {
                index++;
            }
        }

        Collections.sort(tempdictionary);
        for (int i=0; i<tempdictionary.size(); i++)
            outFile.println(tempdictionary.get(i));
        outFile.flush();
        outFile.close();
    }

}

As others have said, there is absolutely nothing wrong with your code as it stands for the task you've been given. But you have expressed an interest in OO principles, so I have a few thoughts that might be useful. Note - the below is very open to the critique that it takes too general an approach to a specific task - there is a perennial debate about this, and many subscribe to the "You ain't gonna need it" school.

Why would an object oriented approach be appropriate?

An object oriented approach is going to be useful if you are likely to need many copies of something of the same class, but with different attribute values. In your task as specified, this is not the case.

However, one could imagine that you might want to make a dictionary from another file in a subsequent task, or write a program that uses several dictionaries at once.

What constitutes an object in this scenario?

In this task, you are being asked to create a dictionary. A dictionary could be an object. You could approach the task from this starting point. For instance:

public class Dictionary
{
}

What does a dictionary hold? A list of alphabetically sorted words. As others have said, a good choice would be to use a SortedSet for this (get used to looking at the API documentation, aka javadocs; they are your essential friends), but you've been told to use ArrayList, so that's what we'll do.

public class Dictionary
{
    ArrayList<String> words;
}

You could go on to add methods to do each of your tasks. Personally, I might have a constructor taking a String filename. I've put the reading of the file directly into this - you might equally call a private method (eg parseFile()) that takes the file and extracts the words.

public Dictionary(String inputFilename) throws FileNotFoundException 
{
    Scanner inFile = new Scanner(new FileReader(inputFilename));
    words = new ArrayList <String>();

    while (inFile.hasNext()) { 
        words.add(inFile.next().toLowerCase().replaceAll("[^a-zA-Z ]", ""));
    }
    inFile.close();

    removeDuplicates();

    Collections.sort(words);
}

I'd then add a removeDuplicates() method (which I've called above) - there are various ways to do this. You could use a temporary ArrayList as you have done - I'll give another example that I suspect is faster.

private void removeDuplicates()
{
    HashSet<String> dupSet = new HashSet<String>();
    dupSet.addAll(words);
    words.clear();
    words.addAll(dupSet);
}
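To make the "I suspect is faster" remark a little more concrete: ArrayList.contains has to scan the list on every call, so the temporary-list approach does more and more work per word as the list grows, whereas adding to a HashSet takes roughly constant time on average. The sketch below puts the two approaches side by side on a small in-memory list. The class name DedupDemo and both method names are made up for illustration and are not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class; not part of the original program.
class DedupDemo {

    // The question's approach: contains() scans the list on every call.
    static List<String> dedupWithList(List<String> words) {
        List<String> unique = new ArrayList<>();
        for (String w : words) {
            if (!unique.contains(w)) {
                unique.add(w);
            }
        }
        Collections.sort(unique);
        return unique;
    }

    // The HashSet approach: duplicates are dropped as the set is built.
    // A HashSet has no defined order, so the sort afterwards is still needed.
    static List<String> dedupWithSet(List<String> words) {
        List<String> unique = new ArrayList<>(new HashSet<>(words));
        Collections.sort(unique);
        return unique;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        System.out.println(dedupWithList(words)); // [cat, mat, on, sat, the]
        System.out.println(dedupWithSet(words));  // [cat, mat, on, sat, the]
    }
}
```

For a file of a few hundred words the difference is unlikely to be noticeable; it only starts to matter for large inputs.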

Finally - add a writeToFile(String outFilename) method

public void writeToFile(String outFilename) throws FileNotFoundException
{
    PrintWriter outFile = new PrintWriter(outFilename);
    for (int i=0; i<words.size(); i++)
        outFile.println(words.get(i));
    outFile.flush(); // strictly redundant
    outFile.close();
}

You've been told to put main in your class, so do that. It can be short now:

    public static void main(String[] args) throws FileNotFoundException
    {
         Dictionary d = new Dictionary("H:\\csc8001\\results.txt");
         d.writeToFile("H:\\csc8001\\results1.txt");
    }

What have you gained?

You now have a Dictionary class that you can import into any project and use as above. If next week's assignment is "modify your dictionary program to make a dictionary for all 100 files in this directory", you're laughing. You could also add functions to the Dictionary class (eg mergeWith(Dictionary d2)) whilst being fairly confident you won't break the existing functions or any programs that use those functions.
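As a rough idea of what a mergeWith method might look like, here is a minimal sketch. To keep it self-contained it uses a simplified stand-in class, MergeableDictionary, that is built from a list of words already in memory rather than from a file name; that constructor, the class name, and getWords() are assumptions made for the example, not part of the answer's Dictionary class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the Dictionary class: built from words already
// in memory (an assumption, so that the sketch needs no files).
class MergeableDictionary {
    private final ArrayList<String> words = new ArrayList<>();

    MergeableDictionary(List<String> input) {
        for (String w : input) {
            addIfAbsent(w.toLowerCase());
        }
        Collections.sort(words);
    }

    // Hypothetical mergeWith: adds every word of the other dictionary
    // that is missing here, then restores alphabetical order.
    void mergeWith(MergeableDictionary other) {
        for (String w : other.words) {
            addIfAbsent(w);
        }
        Collections.sort(words);
    }

    private void addIfAbsent(String w) {
        if (!words.contains(w)) {
            words.add(w);
        }
    }

    // Read-only view, so callers cannot break the sorted, duplicate-free state.
    List<String> getWords() {
        return Collections.unmodifiableList(words);
    }

    public static void main(String[] args) {
        MergeableDictionary a = new MergeableDictionary(Arrays.asList("Cat", "apple"));
        MergeableDictionary b = new MergeableDictionary(Arrays.asList("bird", "cat"));
        a.mergeWith(b);
        System.out.println(a.getWords()); // [apple, bird, cat]
    }
}
```

With something like this in place, the "100 files" assignment could be a loop that builds one dictionary per file and merges each into a running total.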

What have you lost?

15 minutes reading this answer. Time to code some methods that may never be used. The very concise nature of your original program, which was essentially a script.

Minor stylistic things

It's pretty unusual to call your class Main. Nothing wrong as such, but even if you didn't go for the dictionary object approach above, I'd rename it - it's still runnable as long as it has the main method in it.

You don't really need tempdictionary - you can just 'uniquify' and sort dictionary itself, which avoids keeping the unsorted list with duplicates (and the memory it uses).

It looks a bit unusual to initialise your PrintWriter, then do some processing (make the list unique and sort), then output. Tend towards opening the file as near as possible to where you output and then close it again as soon as you can.
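One way to follow that advice is to finish all the processing first and only then open the file, using try-with-resources so the writer is closed as soon as the output is done. This is a sketch only: the class name LateOpenDemo, the method names, and the file name "dictionary-demo.txt" are placeholders chosen for the example.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class LateOpenDemo {

    // All processing happens here, before any file is opened.
    static List<String> prepare(List<String> rawWords) {
        List<String> result = new ArrayList<>();
        for (String w : rawWords) {
            if (!result.contains(w)) {
                result.add(w);
            }
        }
        Collections.sort(result);
        return result;
    }

    // The file is opened only when there is something to write.
    // try-with-resources closes (and therefore flushes) the writer,
    // even if the loop exits early with an exception.
    static void writeWords(List<String> words, String outFilename)
            throws FileNotFoundException {
        try (PrintWriter outFile = new PrintWriter(outFilename)) {
            for (String w : words) {
                outFile.println(w);
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        List<String> words = prepare(Arrays.asList("pear", "apple", "pear"));
        writeWords(words, "dictionary-demo.txt"); // placeholder file name
    }
}
```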

Overall, your concept is correct; think of the DRY principle (Don't repeat yourself). If you have to write that code more than once, stick it in a function and call it instead.

However, in this case, writing all of the code in the main is "OK" since there isn't too much for the program to do - your code already fits the DRY requirements. If you had to read multiple files, then you should definitely put the code into functions instead of in main like that.

If you really wanted to break it down, you could do something along the lines of putting the while (inFile.hasNext()) block inside its own function, where you pass in the scanner object as a parameter.
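A minimal sketch of that extraction might look like the following. The class name WordReader and the method name readWords are made up for the example; the loop body is the one from the question. Because the method accepts any Scanner, it can be tried out on a plain String without needing a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class and method names, for illustration only.
class WordReader {

    // Reads every token from the scanner, lower-cases it and strips
    // anything that is not a letter (same expression as in the question).
    static List<String> readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            words.add(in.next().toLowerCase().replaceAll("[^a-zA-Z ]", ""));
        }
        return words;
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for the file here.
        Scanner in = new Scanner("The cat, the DOG!");
        System.out.println(readWords(in)); // [the, cat, the, dog]
        in.close();
    }
}
```

In the original program, main would then create the Scanner from the FileReader, call readWords(inFile), and close the scanner afterwards.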

I don't see anything particularly worthy of making into a separate method. But I do have some comments on the code in general, in no particular order:

  1. Why are you using static class variables instead of local variables?

  2. What is the purpose of the index variable?

  3. Why do you sort/filter your collection right in between opening and writing to the output file?

  4. Closing a writer will flush it as well, so there's no need to call flush() explicitly.

  5. Why not use a SortedSet such as TreeSet as your collection, and you can avoid the sorting and filtering altogether? For example:

     Scanner inFile = new Scanner(new FileReader("H:\\csc8001\\results.txt"));
     SortedSet<String> dictionary = new TreeSet<>();
     while (inFile.hasNext()) {
         dictionary.add(inFile.next().toLowerCase().replaceAll("[^a-zA-Z ]", ""));
     }
     inFile.close();

     PrintWriter outFile = new PrintWriter("H:\\csc8001\\results1.txt");
     for (String s : dictionary)
         outFile.println(s);
     outFile.close();
