
Java - Reading a text file and storing it using Scanner gives me an empty ArrayList. Splitting words by using a delimiter

The task is to read an article of text from an external file and put each word (without punctuation) into an ArrayList as a separate String. Although I'm sure my path is correct and the file is readable (I can, for example, perform a character count), no matter what I do, my ArrayList of words from that article comes out empty. I may be struggling with how to separate the words from each other and from punctuation, and also with storing the result of reading.

I've been googling for the last 2 hours and reading similar answers here, but with no success. So I decided to ask a question for the first time.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
import org.w3c.dom.Text;


public class PlaceForErrors {
    public static void main(String[] args) {
        Scanner scan = null;
        try {

            
            scan = new Scanner(new File("\\Users\\marga\\Desktop\\objekt program\\oo2021\\w05_kontrolltoo1\\textHere.txt")).useDelimiter(" \\$ |[\\r\\n]+");
           
            String token1 = "";
            ArrayList<String> text = new ArrayList<String>();
            while (scan.hasNext()) {

                token1 = scan.next();
                text.add(token1);
                
            }
            
            String[] textArray = text.toArray(new String[0]);
            for(String element : textArray){
                System.out.println(element);
            }
 

            //Controlling if the ArrayList is empty and it is
            boolean tellme = text.isEmpty();
            System.out.println(tellme);
            


    
        } catch (FileNotFoundException exception) {
            System.out.println(exception);
        }
        finally{
            // scan is still null if the file could not be opened
            if (scan != null) {
                scan.close();
            }
        }
    
    }
}
String[] textArray = text.toArray(new String[0]);

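This line looks suspicious, so it is worth ruling out first. Passing a String array of size 0 does not truncate the result: when the array passed to toArray is too small to hold the list, a new array of the list's size is allocated and returned. So if the printed array is empty, the list itself was empty.

You can still initialize the array using the ArrayList size, like so:

String[] textArray = text.toArray(new String[text.size()]);

Both forms return the same elements. Then you can see if your token delimiter works.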
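Whether the size of the array argument matters can be checked in isolation. The following is a minimal sketch, not part of the original program; the class name ToArrayCheck and the sample words are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, only for checking how toArray behaves.
class ToArrayCheck {
    // Copies the list into a String[] using a zero-length array as the type hint.
    static String[] withEmptyArray(List<String> words) {
        return words.toArray(new String[0]);
    }

    // Copies the list into a String[] using a pre-sized array.
    static String[] withSizedArray(List<String> words) {
        return words.toArray(new String[words.size()]);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("Hello", "World", "Bye"));
        // Both lines print [Hello, World, Bye]
        System.out.println(Arrays.toString(withEmptyArray(words)));
        System.out.println(Arrays.toString(withSizedArray(words)));
    }
}
```

If both variants print the same elements, the array conversion can be ruled out and the search moves on to the scanner and its delimiter.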

Quick recap of your steps

Your program does a lot. I counted 9 steps:

  1. opens a (text) file as a (text) input stream to read from
  2. creates a scanner for tokens from this input stream, using a regular expression as delimiter (= tokenizer)
  3. scans for and iterates over each subsequent token (if any is found) using a while-loop
  4. adds the token to a list in each iteration
  5. when there are no more tokens, the iteration ends (or never started): converts the list to an array
  6. loops over each array element using a for-each-loop and prints it
  7. checks if the originally collected list is empty and prints true or false
  8. catches the exception if the file was not found and prints it
  9. finally closes any opened resources: the file that was read from

Now let's start to look for the step where something could potentially go wrong: the places for errors.

Analysis: What can go wrong?

Look at the steps listed above and think of each from a what-could-go-wrong perspective. Here is a quick checklist (its numbering does not correspond to the step numbers above):

  1. Can your text file be found, does it exist and is it readable? Yes, otherwise an IOException such as FileNotFoundException would have been thrown and printed.

  2. Is the opened file empty with a size of 0 bytes? You can check using:

     File textFile = new File("\\Users\\marga\\Desktop\\objekt program\\oo2021\\w05_kontrolltoo1\\textHere.txt");
     System.out.println( "File size: " + textFile.length() );
     // before passing the extracted file-variable to scanner
     scan = new Scanner( textFile ).useDelimiter(" \\$ |[\\r\\n]+");

  3. Does the delimiter/regex properly split/tokenize an example input string? Try:

     // Just a separate test: same delimiter, with test-input
     String delimiterRegex = " \\$ |[\\r\\n]+";
     String testInput = " $ Hello\r\nWorld!\n\nBye.";
     // so we create a new scanner
     Scanner testScanner = new Scanner( testInput ).useDelimiter(delimiterRegex);
     int tokenCount = 0;
     while( testScanner.hasNext() ) {
         tokenCount++;
         System.out.println("Token " + tokenCount + ": " + testScanner.next() );
     }
     testScanner.close();

    Should print 3 tokens (Hello, World!, Bye.) on 3 lines in the console. The special sequence " $ " (space-dollar-space) and any \n or \r (newline or carriage return) are omitted and have split the tokens.

  4. We should check the list directly after the while-loop:

     // Not only checking if the ArrayList is empty, but its size (is 0 if empty)
     System.out.println("Scanned tokens in list: " + text.size());

    If it is empty, then there is no need to fill the array, and the loop that prints will never start (because there is nothing to loop over).
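The first two checks (file found, readable, not empty) can be combined into one small diagnostic. This is only a sketch under assumptions: the class FileCheck and the method describe are hypothetical names, and a temporary file stands in for textHere.txt so that it runs anywhere.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of the original program.
class FileCheck {
    // Returns a short diagnosis for the given file.
    static String describe(File file) {
        if (!file.exists()) return "missing";
        if (!file.isFile()) return "not a regular file";
        if (!file.canRead()) return "not readable";
        if (file.length() == 0) return "empty";
        return "ok (" + file.length() + " bytes)";
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for the real text file (assumption for this sketch).
        File sample = File.createTempFile("textHere", ".txt");
        sample.deleteOnExit();
        System.out.println(describe(sample)); // freshly created, so still empty
        Files.write(sample.toPath(), "Hello World".getBytes(StandardCharsets.UTF_8));
        System.out.println(describe(sample)); // now has content
    }
}
```

Calling describe on the real path before creating the scanner shows right away whether the problem lies in the file or in the tokenizing.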

Hope these explanations help you to perform the analysis (debugging/testing) yourself. Let me know if it helped you to catch the issue.

Takeaway: Divide and conquer!

Why did I count the steps above? Because all of them are potential places for errors. In developer jargon we also say that this main method of class PlaceForErrors has many responsibilities: I counted 9.

And there is a golden principle called the Single Responsibility Principle (SRP). Put simply: it is always good to split a large problem or program (here: your large main method) into smaller pieces. These smaller pieces are easier to work with (mentally), easier to test, and easier to debug if errors or unexpected things happen. Divide & conquer!

If it works, start improving

You can split up this long method doing 9 steps into smaller methods. Benefit: each method can be tested in isolation, like the testScanner.
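One possible way to split the method is sketched below. The class TokenSteps and the method readTokens are hypothetical names, and the scanner reads from a String instead of the file so that the tokenizing step can be tried without touching the disk:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical refactoring sketch, not the original code.
class TokenSteps {
    // Delimiter taken from the question.
    static final String DELIMITER = " \\$ |[\\r\\n]+";

    // One responsibility: collect every token the scanner yields.
    static List<String> readTokens(Scanner scanner) {
        List<String> tokens = new ArrayList<>();
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // try-with-resources closes the scanner, so no finally block is needed.
        try (Scanner scanner = new Scanner("Hello\nWorld").useDelimiter(DELIMITER)) {
            List<String> tokens = readTokens(scanner);
            System.out.println("Scanned tokens in list: " + tokens.size());
            for (String token : tokens) {
                System.out.println(token);
            }
        }
    }
}
```

Because readTokens only depends on a Scanner, the same method works for a file-backed scanner in the real program and for a String-backed scanner in a test.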

Once your program finally works as expected and your manual test went green, you should post the working code to the sister site: CodeReview.

Be curious and ask again, e.g. how to split up the methods, how to make them testable, etc. You'll get lots of experienced advice on how to improve it even more.

Thank you for your input everyone!

Regarding the code, I went and checked everything step by step, and on the way learned more about delimiters and Scanner. I fixed my delimiter and everything works just fine now.
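To illustrate the difference between the two delimiters, here is a small sketch that runs both on the same sample string. The class DelimiterCompare, the method tokenize and the sample sentence are made up for this comparison; only the two regular expressions come from the posted code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical comparison helper, not part of the original program.
class DelimiterCompare {
    // Collects every token the scanner yields for the given input and delimiter regex.
    static List<String> tokenize(String input, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "In fact, software is running your life.\nWhat if you learn?";
        // Delimiter from the question: splits on " $ " or line breaks only.
        List<String> lines = tokenize(sample, " \\$ |[\\r\\n]+");
        // Delimiter from the final code: splits on punctuation and whitespace.
        List<String> words = tokenize(sample, "[.,:;()?!\"\\s]+");
        System.out.println(lines.size() + " tokens with the first delimiter");
        System.out.println(words.size() + " tokens with the second delimiter");
    }
}
```

With this sample, the first delimiter yields 2 tokens (whole lines), while the second yields 11 tokens (single words without punctuation).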

I also made a newbie mistake and didn't show the full code, as I thought it would take attention away from the main problem. I had two conflicting scanners in my main function (the one I showed you, and another one that scanned the file again and counted the letters A). They both worked great separately (when one or the other was commented out), but refused to work together. So I found a way to combine them and use the scanner only once. I will share my full code for reference now.

I learned from my mistake and will always provide my full code in the future.

If someone is curious, the full task was the following:

  1. Read the text from a separate file using Scanner and store it in an ArrayList.
  2. Count how many letters "A" (lowercase or uppercase) there are, and what percentage of all the letters in the text they make up.
  3. Count how many words have one letter A in them, two letters A in them, etc.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;



public class Trying {
    public static void main(String[] args) {
        Scanner scan = null;
        try {


            //SCANNING FILE AND CREATING AN ARRAYLIST
            scan = new Scanner(new File("\\Users\\marga\\Desktop\\objekt program\\oo2021\\w05_kontrolltoo1\\textHere.txt")).useDelimiter("[.,:;()?!\"\\s]+");
            
            int aCount = 0;
            int letterCount =0;
            String token1 = "";
            int wordWithAtLeastOneA = 0;
            int wordWithA = 0;
            int word1A = 0;
            int word2A = 0;
            int word3A = 0;
            int word4OrMoreA = 0;

            ArrayList<String> text = new ArrayList<String>();


            // SCANNING EVERY WORD INTO AN ARRAY LIST
            while(scan.hasNext()){
                token1 = scan.next();
                text.add(token1);
            }
            System.out.println("Amount of words in the scanned list is : " + text.size());

            //COUNTING HOW MANY LETTERS 'A' TEXT HAS
            for(String element : text){
                for (int i=0;i<=element.length()-1;i++){
                    if (element.charAt(i) == 'A' || element.charAt(i) == 'a') {
                        aCount++; 
                    } 
                }           
            }
            System.out.println("There are "+aCount+" letters 'A'. "); 
            
            //HOW MANY LETTERS IN TOTAL TEXT HAS
            for(String element : text){
                for (int i=0;i<=element.length()-1;i++){
                    letterCount++;    
                }       
            }

            //COUNTING HOW MANY WORDS HAVE 'A' LETTER IN THEM
            for(String element : text){
                for (int i=0;i<=element.length()-1;i++){
                    if (element.charAt(i) == 'A' || element.charAt(i) == 'a') {
                        wordWithAtLeastOneA++;
                        break; 
                    } 
                }           
            }
            System.out.println("There are "+wordWithAtLeastOneA+" words that have at least one letter 'A' in them.");
            

            System.out.println();

            //COUNTING NUMBER OF WORDS THAT HAVE 1/2/3 or more 'A' LETTER IN THEM
            for(String element : text){
                wordWithA = 0;
                for (int i=0;i<=element.length()-1;i++){
                    if (element.charAt(i) == 'A' || element.charAt(i) == 'a') {
                        wordWithA++;
                        if(wordWithA == 1){
                            word1A++;
                        }else if (wordWithA == 2){
                            word2A++;
                        }else if (wordWithA == 3){
                            word3A++;
                        }else if (wordWithA >= 4){
                            word4OrMoreA++;
                        }
                    }
                }
            }
            System.out.println("There were "+ word1A+ " words, that had one letter 'A' in them." );
            System.out.println("There were "+ word2A+ " words, that had two letters 'A' in them." );
            System.out.println("There were "+ word3A+ " words, that had three letters 'A' in them." );
            System.out.println("There were "+ word4OrMoreA+ " words, that had 4 or more letters 'A' in them." );


           //COUNTING HOW MANY LETTERS THERE ARE IN TOTAL, COMPARE TO NUMBER OF "A" LETTERS
            
            int percentOfA = aCount*100/letterCount;
            System.out.println();
            System.out.println("The entire number of letters is "+ letterCount+" and letter 'A' makes " + percentOfA+ "% out of them or " +aCount+ " letters.");

          //  for(String element : textArray){
          //      System.out.println(element);
          //  }
    
        } catch (FileNotFoundException exception) {
            System.out.println(exception);
        }
        finally{
            // scan is still null if the file could not be opened
            if (scan != null) {
                scan.close();
            }
        }
    
    }
}

And the text is:

Computer programming is an enormously flexible tool that you can use to do amazing things that are otherwise either manual and laborsome or are just impossible. If you are using a smartphone, a chat app or if you are unlocking your car with the push of a button, then you must know that all these things are using some kind of programming. You are already immersed in the programs of different types. In fact, software is running your life. What if you learn and start running these programs according to your will?

And the output is:

There are 35 words that have at least one letter 'A' in them.

There were 35 words, that had one letter 'A' in them.
There were 3 words, that had two letters 'A' in them.
There were 0 words, that had three letters 'A' in them.
There were 0 words, that had 4 or more letters 'A' in them.

The entire number of letters is  416 and letter 'A' makes 9% out of them or 38 letters.
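Note that the output reports 35 both for "at least one" and for "one" letter A. The bucket loop increments word1A as soon as the running count reaches 1, so a word with two letters A is counted in both word1A and word2A. Counting per word first and bucketing afterwards avoids this. The following is only a sketch with hypothetical names (ACountBuckets, countA, bucketize, percent) and made-up sample words; it also computes the percentage as a double instead of using integer division.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of an alternative counting approach.
class ACountBuckets {
    // Counts the letters 'a' and 'A' in a single word.
    static int countA(String word) {
        int count = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == 'a' || c == 'A') {
                count++;
            }
        }
        return count;
    }

    // Returns {exactly one, exactly two, exactly three, four or more}.
    static int[] bucketize(List<String> words) {
        int[] buckets = new int[4];
        for (String word : words) {
            int n = countA(word);
            if (n >= 1) {
                buckets[Math.min(n, 4) - 1]++;
            }
        }
        return buckets;
    }

    // Percentage as a double, guarding against division by zero.
    static double percent(int part, int whole) {
        return whole == 0 ? 0.0 : 100.0 * part / whole;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("amazing", "car", "banana", "tool", "Abracadabra");
        System.out.println(Arrays.toString(bucketize(words))); // prints [1, 1, 1, 1]
        // 38 and 416 are the counts from the output above.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent(38, 416)));
    }
}
```

Applied to the output above (35 words with at least one A, 3 of them with two, none with three or more), the exact buckets would be 32 words with exactly one A and 3 words with exactly two.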
