
Java - Read text file and store it using Scanner, gives me empty ArrayList. Split words by using delimiter

The task is to read an article of text from an external file and put each word (without punctuation) into an ArrayList as a separate String. Although I'm sure my path is correct and readable (I can, for example, perform a character count), no matter what I do, my ArrayList of words from that article comes out empty. I may be struggling with how to separate the words from each other and from other symbols, and also with storing the result of the reading.

I've been googling for the last 2 hours and reading similar answers here, but with no success. So I decided, for the first time, to ask a question.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
import org.w3c.dom.Text;


public class PlaceForErrors {
    public static void main(String[] args) {
        Scanner scan = null;
        try {

            
            scan = new Scanner(new File("\\Users\\marga\\Desktop\\objekt program\\oo2021\\w05_kontrolltoo1\\textHere.txt")).useDelimiter(" \\$ |[\\r\\n]+");
           
            String token1 = "";
            ArrayList<String> text = new ArrayList<String>();
            while (scan.hasNext()) {

                token1 = scan.next();
                text.add(token1);
                
            }
            
            String[] textArray = text.toArray(new String[0]);
            for(String element : textArray){
                System.out.println(element);
            }
 

            //Checking whether the ArrayList is empty (and it is)
            boolean tellme = text.isEmpty();
            System.out.println(tellme);
            


    
        } catch (FileNotFoundException exception) {
            System.out.println(exception);
        }
        finally{
            scan.close();
        }
    
    }
}
String[] textArray = text.toArray(new String[0]);

This line is your problem. You're trying to allocate the ArrayList into a String array of size 0, resulting in it appearing empty.

I would modify the array declaration to initialize it using the ArrayList size, like so:

String[] textArray = text.toArray(new String[text.size()]);

Then you can see whether your token delimiter works.
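For reference, the JDK contract of `List.toArray(T[] a)` says that if the array passed in is too small, a new array of the same runtime type and of the list's size is allocated and returned. So `new String[0]` by itself should not make the result empty. A minimal check you could run (`ToArrayCheck` and the sample words are only illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArrayCheck {
    // Zero-length-array idiom, as used in the question.
    static String[] viaEmptyArray(List<String> words) {
        return words.toArray(new String[0]);
    }

    // Pre-sized array, as suggested in this answer.
    static String[] viaSizedArray(List<String> words) {
        return words.toArray(new String[words.size()]);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("Computer", "programming", "is"));
        // Both lines print [Computer, programming, is]
        System.out.println(Arrays.toString(viaEmptyArray(words)));
        System.out.println(Arrays.toString(viaSizedArray(words)));
    }
}
```

If both variants print the same words, the conversion is not where words get lost, and the list itself (the scanning and the delimiter) is the next thing to check.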

Quick recap of your steps

Your program does a lot. I counted 9 steps:

  1. opens a (text) file as a (text) input stream to read from
  2. creates a scanner for tokens from this input stream, using a regular expression as the delimiter (= tokenizer)
  3. scans for and iterates over each subsequent token (if any are found) using a while loop
  4. each iteration adds the token to a list
  5. when there are no more tokens, the iteration ends (or never started :) and the list is converted to an array
  6. loops over each array element using a for-each loop and prints it
  7. checks whether the originally collected list is empty and prints true or false
  8. catches the exception if the file was not found and prints it
  9. finally closes any opened resources: the file that was read from
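Steps 8 and 9 hide a small trap in the posted code: if the file is not found, `new Scanner(new File(...))` throws a FileNotFoundException before `scan` is assigned, so `scan` is still `null`, and after the catch block prints the exception, `scan.close()` in `finally` throws a NullPointerException. A minimal sketch of the same read loop using try-with-resources, which closes the scanner automatically and only if it was actually created. The helper name `readTokens`, the default file name and the whitespace delimiter are assumptions for illustration, not taken from the question:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadTokens {
    // Hypothetical helper: reads all tokens from a file, split on the given delimiter.
    // try-with-resources closes the Scanner when the block exits, so no finally
    // block and no null check are needed.
    static List<String> readTokens(File file, String delimiterRegex) throws FileNotFoundException {
        List<String> tokens = new ArrayList<>();
        try (Scanner scan = new Scanner(file).useDelimiter(delimiterRegex)) {
            while (scan.hasNext()) {
                tokens.add(scan.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        File file = new File(args.length > 0 ? args[0] : "textHere.txt");
        try {
            List<String> tokens = readTokens(file, "\\s+");
            System.out.println("Tokens read: " + tokens.size());
        } catch (FileNotFoundException e) {
            // Reached when the file is missing; no NullPointerException follows.
            System.out.println(e);
        }
    }
}
```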

Now let's look for the steps where something could potentially go wrong: the places for errors.

Analysis: What can go wrong?

Look at the steps listed above and think about each one from a what-could-go-wrong perspective. Here is a quick checklist (not correlated with the step numbers above):

  1. Can your text file be found? Does it exist and is it readable? Yes; otherwise an IOException such as FileNotFoundException would have been thrown and printed.

  2. Is the opened file empty, with a size of 0 bytes? You can check using:

     File textFile = new File("\\Users\\marga\\Desktop\\objekt program\\oo2021\\w05_kontrolltoo1\\textHere.txt");
     System.out.println( "File size: " + textFile.length() );
     // before passing the extracted file-variable to scanner
     scan = new Scanner( textFile ).useDelimiter(" \\$ |[\\r\\n]+");

  3. Does the delimiter (regex) properly split, i.e. tokenize, an example input string? Try:

     // Just a separate test: same delimiter, with test input
     String delimiterRegex = " \\$ |[\\r\\n]+";
     String testInput = " $ Hello\r\nWorld!\n\nBye.";
     // so we create a new scanner
     Scanner testScanner = new Scanner( testInput ).useDelimiter(delimiterRegex);
     int tokenCount = 0;
     while( testScanner.hasNext() ) {
         tokenCount++;
         System.out.println("Token " + tokenCount + ": " + testScanner.next() );
     }
     testScanner.close();

    This should print 3 tokens (Hello, World!, Bye.) on 3 lines in the console. The special sequence " $ " (space-dollar-space) and any run of \n or \r (newline or carriage return) are omitted; they are what split the tokens.

  4. We should check the list directly after the while loop:

     // Not only checking if the ArrayList is empty, but its size (is 0 if empty)
     System.out.println("Scanned tokens in list: " + text.size());

    If it is empty, then we need neither to fill the array nor to loop over it for printing (the loop would not even start, because there is nothing to iterate over).

I hope these explanations help you perform the analysis (debugging/testing) yourself. Let me know whether it helped you catch the issue.

Takeaway: Divide and conquer!

Why did I count the steps above? Because each of them is a potential place for errors. In developer jargon, we would also say that the main method of class PlaceForErrors has many responsibilities: I counted 9.

And there is a golden principle called the Single Responsibility Principle (SRP). Put simply: it is always good to split a large problem or program (here: your large main method) into smaller pieces. These smaller pieces are easier to reason about, easier to test, and easier to debug when errors or unexpected behavior occur. Divide and conquer!

If it works, start improving

You can split this long, 9-step method into smaller methods. Benefit: each method can be tested in isolation, like the testScanner above.
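A minimal sketch of what such a split could look like, assuming two small static helpers (`readWords` and `countLetter` are hypothetical names, not from the question). Because `java.util.Scanner` can read from a `String` as well as from a `File`, `readWords` can be tested without any file, in the same spirit as the testScanner check above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordTools {
    // "Scan and collect" step: works with any Scanner, whether it reads a file or a String.
    static List<String> readWords(Scanner scan) {
        List<String> words = new ArrayList<>();
        while (scan.hasNext()) {
            words.add(scan.next());
        }
        return words;
    }

    // Counts occurrences of a letter in one word, ignoring case.
    static int countLetter(String word, char letter) {
        int count = 0;
        for (char c : word.toCharArray()) {
            if (Character.toLowerCase(c) == Character.toLowerCase(letter)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Testing in isolation: a String-backed Scanner stands in for the file.
        try (Scanner scan = new Scanner("An amazing app").useDelimiter("\\s+")) {
            List<String> words = readWords(scan);
            System.out.println(words);                       // [An, amazing, app]
            System.out.println(countLetter("amazing", 'A')); // 2
        }
    }
}
```

Each helper now has a single responsibility, so a wrong count can be traced to one small method instead of a 9-step main.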

Once your program finally works as expected and your manual test goes green, you should post the working code to the sister site: CodeReview.

Be curious and ask again, e.g. how to split up the methods, how to make them testable, etc. You'll get lots of experienced advice on how to improve it even more.

Thank you for your input, everyone!

Regarding the code, I went through and checked everything step by step, and along the way learned more about delimiters and Scanner. I fixed my delimiter, and everything works just fine now.
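To see what the delimiter change does: the question's delimiter `" \\$ |[\\r\\n]+"` splits only on the sequence space-dollar-space and on line breaks, so every line of the article comes back as one token, with its spaces and punctuation still inside. The fixed delimiter in the full code below, `"[.,:;()?!\"\\s]+"`, treats any run of those punctuation marks or whitespace as a separator. A small comparison sketch (the class name is hypothetical; the two-line sample is taken from the article text):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterComparison {
    // Splits the input with the given delimiter regex and collects the tokens.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "In fact, software is running your life.\nWhat if you learn?";
        // Question's delimiter: only " $ " and line breaks split, so each line is one token.
        System.out.println(tokens(sample, " \\$ |[\\r\\n]+"));
        // Fixed delimiter: any run of the listed punctuation or whitespace separates words.
        System.out.println(tokens(sample, "[.,:;()?!\"\\s]+"));
    }
}
```

The first call yields two tokens, one per line; the second yields the eleven words. Characters not listed in the class, such as apostrophes or hyphens, stay inside the words.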

Besides that, I made a newbie mistake and didn't show the full code, as I thought it would distract attention from the main problem. I had two conflicting scanners in my main function (the one I showed you, and another one that scanned the file again and counted the letters A). Both worked fine separately (when one or the other was commented out), but refused to work together. So I found a way to combine them and use the scanner only once. I will share my full code for reference now.

I learned from my mistake and will always provide my full code in the future.

In case anyone is curious, the full task was the following:

  1. Read the text from a separate file using Scanner and store it in an ArrayList.
  2. Count how many letters "A" (lowercase or uppercase) there are and what percentage of all the letters in the text they make up.
  3. Count how many words have one letter A, two letters A, etc.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;



public class Trying {
    public static void main(String[] args) {
        Scanner scan = null;
        try {


            //SCANNING FILE AND CREATING AN ARRAYLIST
            scan = new Scanner(new File("\\Users\\marga\\Desktop\\objekt program\\oo2021\\w05_kontrolltoo1\\textHere.txt")).useDelimiter("[.,:;()?!\"\\s]+");
            
            int aCount = 0;
            int letterCount =0;
            String token1 = "";
            int wordWithAtLeastOneA = 0;
            int wordWithA = 0;
            int word1A = 0;
            int word2A = 0;
            int word3A = 0;
            int word4OrMoreA = 0;

            ArrayList<String> text = new ArrayList<String>();


            // SCANNING EVERY WORD INTO AN ARRAY LIST
            while(scan.hasNext()){
                token1 = scan.next();
                text.add(token1);
            }
            System.out.println("Amount of words in the scanned list is : " + text.size());

            //COUNTING HOW MANY LETTERS 'A' TEXT HAS
            for(String element : text){
                for (int i=0;i<=element.length()-1;i++){
                    if (element.charAt(i) == 'A' || element.charAt(i) == 'a') {
                        aCount++; 
                    } 
                }           
            }
            System.out.println("There are "+aCount+" letters 'A'. "); 
            
            //HOW MANY LETTERS IN TOTAL TEXT HAS
            for(String element : text){
                for (int i=0;i<=element.length()-1;i++){
                    letterCount++;    
                }       
            }

            //COUNTING HOW MANY WORDS HAVE 'A' LETTER IN THEM
            for(String element : text){
                for (int i=0;i<=element.length()-1;i++){
                    if (element.charAt(i) == 'A' || element.charAt(i) == 'a') {
                        wordWithAtLeastOneA++;
                        break; 
                    } 
                }           
            }
            System.out.println("There are "+wordWithAtLeastOneA+" words that have at least one letter 'A' in them.");
            

            System.out.println();

            //COUNTING NUMBER OF WORDS THAT HAVE 1/2/3 or more 'A' LETTER IN THEM
            for(String element : text){
                wordWithA = 0;
                for (int i=0;i<=element.length()-1;i++){
                    if (element.charAt(i) == 'A' || element.charAt(i) == 'a') {
                        wordWithA++;
                        if(wordWithA == 1){
                            word1A++;
                        }else if (wordWithA == 2){
                            word2A++;
                        }else if (wordWithA == 3){
                            word3A++;
                        }else if (wordWithA >= 4){
                            word4OrMoreA++;
                        }
                    }
                }
            }
            System.out.println("There were "+ word1A+ " words, that had one letter 'A' in them." );
            System.out.println("There were "+ word2A+ " words, that had two letters 'A' in them." );
            System.out.println("There were "+ word3A+ " words, that had three letters 'A' in them." );
            System.out.println("There were "+ word4OrMoreA+ " words, that had 4 or more letters 'A' in them." );


           //COUNTING HOW MANY LETTERS THERE ARE IN TOTAL, COMPARE TO NUMBER OF "A" LETTERS
            
            int percentOfA = aCount*100/letterCount;
            System.out.println();
            System.out.println("The entire number of letters is "+ letterCount+" and letter 'A' makes " + percentOfA+ "% out of them or " +aCount+ " letters.");

          //  for(String element : textArray){
          //      System.out.println(element);
          //  }
    
        } catch (FileNotFoundException exception) {
            System.out.println(exception);
        }
        finally{
            scan.close();
        }
    
    }
}
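A note on the percentage line `int percentOfA = aCount*100/letterCount;`: it uses integer division, which truncates. For the 38 'A's out of 416 letters in the output posted below, 3800 / 416 is about 9.13, reported as 9%. For an empty file, where letterCount is 0, it would throw an ArithmeticException. A minimal sketch comparing the two, assuming two-decimal formatting is wanted (the method names are hypothetical):

```java
import java.util.Locale;

public class PercentOfA {
    // Integer arithmetic, as in the posted code: the fraction is discarded,
    // and letterCount == 0 would throw an ArithmeticException.
    static int percentTruncated(int aCount, int letterCount) {
        return aCount * 100 / letterCount;
    }

    // Floating-point arithmetic keeps the fraction; an empty text yields 0.0 instead of an exception.
    static double percentExact(int aCount, int letterCount) {
        if (letterCount == 0) {
            return 0.0;
        }
        return aCount * 100.0 / letterCount;
    }

    public static void main(String[] args) {
        int aCount = 38;        // figures from the posted output
        int letterCount = 416;
        System.out.println(percentTruncated(aCount, letterCount) + "%"); // 9%
        // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", percentExact(aCount, letterCount))); // 9.13%
    }
}
```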

And the text is:

Computer programming is an enormously flexible tool that you can use to do amazing things that are otherwise either manual and laborsome or are just impossible. If you are using a smartphone, a chat app or if you are unlocking your car with the push of a button, then you must know that all these things are using some kind of programming. You are already immersed in the programs of different types. In fact, software is running your life. What if you learn and start running these programs according to your will?

And the output is:

There are 35 words that have at least one letter 'A' in them.

There were 35 words, that had one letter 'A' in them.
There were 3 words, that had two letters 'A' in them.
There were 0 words, that had three letters 'A' in them.
There were 0 words, that had 4 or more letters 'A' in them.

The entire number of letters is 416 and letter 'A' makes 9% out of them or 38 letters.
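The output itself points to one more detail: 35 words contain at least one 'A', yet the per-word counts add up to 35 + 3 = 38, which is the total number of 'A' letters rather than a number of words. That happens because the classification runs inside the character loop: a word with two A's first increments word1A (at its first 'A') and then word2A (at its second), so word1A effectively counts words with at least one 'A'. Classifying each word once, after its A's are counted, should give 32 words with exactly one 'A' and 3 with exactly two for this text. A minimal sketch (class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class ACountsPerWord {
    // Counts 'A' or 'a' in a single word.
    static int countA(String word) {
        int count = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == 'A' || c == 'a') {
                count++;
            }
        }
        return count;
    }

    // buckets[1..3] = words with exactly 1, 2, 3 A's; buckets[4] = words with 4 or more.
    // Each word is classified once, after all its A's have been counted.
    static int[] classify(List<String> words) {
        int[] buckets = new int[5];
        for (String word : words) {
            int n = countA(word);
            if (n > 0) {
                buckets[Math.min(n, 4)]++;
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] b = classify(Arrays.asList("amazing", "app", "banana", "you"));
        // Prints: exactly 1: 1, exactly 2: 1, exactly 3: 1, 4 or more: 0
        System.out.println("exactly 1: " + b[1] + ", exactly 2: " + b[2]
                + ", exactly 3: " + b[3] + ", 4 or more: " + b[4]);
    }
}
```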
