
(java) - Storing each word from an input file in an array of Strings

I'm having trouble writing a method to accomplish this. I have the basic outline of the method, but I need some pointers on filling it in.

  public static String [] readFileAndReturnWords(String filename){
     //create array
     //read one word at a time from file and store in array
     //return the array
  }

This is what I have so far:

public static String readFileAndReturnWords(String filename){   
      String[] temp = new String[];

      //connects file
      File file = new File(filename);
      Scanner inputFile = null;

     try{

          inputFile = new Scanner(file);

         }
          //When arg is mistyped
      catch(FileNotFoundException Exception1) {
          System.out.println("File not found!");
          System.exit(0);      
     }


     //Loops through a file
    if (inputFile != null) {

    try { //I draw a blank here

I understand that some calls to .next and .hasNext are in order; I'm just not sure how to use these methods in the context of this problem.

Splitting into individual words is actually a little trickier than it might first seem - what do you split on?

If you split on spaces, then full stops, commas and other punctuation will end up attached to a word, so

quick, the lazy dog.

Would be split into:

  1. quick,
  2. the
  3. lazy
  4. dog.

This may or may not be what you want. If you split on non-word characters, then you end up splitting on apostrophes, hyphens, etc., so:

  • can't, won't ->
    1. can
    2. t
    3. won
    4. t
  • no-one suspects hyper-space ->
    1. no
    2. one
    3. suspects
    4. hyper
    5. space
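
Both splits above can be reproduced with String.split. This is only a minimal sketch: the class name SplitDemo and its two method names are made up for illustration and are not part of the question.

```java
// Hypothetical demo class; all names here are illustrative only.
class SplitDemo {

    // Split on runs of whitespace: punctuation stays attached to the words.
    static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    // Split on runs of non-word characters: apostrophes and hyphens break words apart.
    static String[] splitOnNonWord(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        // prints: quick, | the | lazy | dog.
        System.out.println(String.join(" | ", splitOnWhitespace("quick, the lazy dog.")));
        // prints: can | t | won | t
        System.out.println(String.join(" | ", splitOnNonWord("can't, won't")));
        // prints: no | one | suspects | hyper | space
        System.out.println(String.join(" | ", splitOnNonWord("no-one suspects hyper-space")));
    }
}
```

Note that splitOnNonWord would return an empty first element for text that starts with punctuation, so real input may need an extra filtering step.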

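So each of these approaches has its issues. I would suggest splitting on the regex word boundary, \b. It's a little more sophisticated but has issues of its own: the runs of spaces and punctuation between words come back as tokens too, and apostrophes and hyphens still break words apart. Try different approaches and see what produces the output you need.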

The solution I propose uses Java 8:

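import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public static String[] readFileAndReturnWords(String filename) throws IOException {
    final Path path = Paths.get(filename);
    // "\\b" matches the zero-width boundary between word and non-word characters
    final Pattern pattern = Pattern.compile("\\b");

    try (final Stream<String> lines = Files.lines(path)) {
        return lines.flatMap(pattern::splitAsStream).toArray(String[]::new);
    }
}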

So first you convert your String to a Path, a Java NIO representation of a file location. You then create your Pattern, which decides how to break up words.

Now you simply use Files.lines to stream all the lines in the file and then Pattern.splitAsStream to turn each line into words. We use flatMap because we need to "flatten" the stream: splitAsStream turns each line into a Stream<String>, so mapping it over the Stream<String> of lines would give a Stream<Stream<String>>. flatMap applies the function to each element and merges the resulting streams into a single Stream<String>.
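
To see the flattening without touching a file, here is a small sketch that runs the same flatMap step over an in-memory list of lines. The class FlatMapDemo, its method names, and the letter-or-digit filter in wordsOnly are assumptions added for illustration; the filter is one possible way to drop the separator tokens that "\\b" produces, not something the original solution does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; all names here are illustrative only.
class FlatMapDemo {

    private static final Pattern BOUNDARY = Pattern.compile("\\b");

    // flatMap merges the per-line streams from splitAsStream into one stream of tokens.
    static List<String> tokens(List<String> lines) {
        return lines.stream()
                .flatMap(BOUNDARY::splitAsStream)
                .collect(Collectors.toList());
    }

    // Same pipeline, keeping only tokens that contain a letter or digit (assumed filter).
    static List<String> wordsOnly(List<String> lines) {
        return lines.stream()
                .flatMap(BOUNDARY::splitAsStream)
                .filter(t -> t.chars().anyMatch(Character::isLetterOrDigit))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("quick, the", "lazy dog.");
        // Includes separator tokens such as ", " and "." alongside the words.
        System.out.println(tokens(lines));
        // prints: [quick, the, lazy, dog]
        System.out.println(wordsOnly(lines));
    }
}
```

The result is a flat list rather than one list per line, which is exactly what the final toArray(String[]::new) call in the solution relies on.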

Store it in an ArrayList, since you don't know how many words are stored in your file.

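import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class Test
{
  static ArrayList<String> words;

  public static void main(String[] args) throws FileNotFoundException
  {
    words = new ArrayList<>();
    // try-with-resources closes the Scanner (and the file) when done
    try (Scanner s = new Scanner(new File("Blah.txt")))
    {
      while (s.hasNext())
      {
        // next() returns one whitespace-delimited token
        String token = s.next();
        if (isAWord(token))
        {
          // strip punctuation that the scanner leaves attached
          token = token.replace(".", "");
          token = token.replace(",", "");
          //and remove other characters like braces and parentheses,
          //since the scanner gets tokens like
          // here we are, < "are," would be a token
          words.add(token);
        }
      }
    }
  }

  private static boolean isAWord(String token)
  {
    //check if the token is a word
    //placeholder so the class compiles: accepts any token containing a letter
    return token.chars().anyMatch(Character::isLetter);
  }
}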

It should work.

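If you really need an array, you can convert your ArrayList with the call below. Note that the no-argument toArray() returns Object[], so you have to pass a String[] to get a String[] back:

String[] wordArray = words.toArray(new String[0]);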
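
Putting the Scanner loop and the array conversion together gives something close to the method asked for in the question. This is a sketch under assumptions, not a definitive implementation: the class name WordReader and the helper readWords are hypothetical, and a "word" is assumed to be whatever remains of a whitespace-delimited token once leading and trailing punctuation is trimmed.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class; readWords is a helper introduced only for this sketch.
class WordReader {

    // Opens the file and returns its words as an array.
    static String[] readFileAndReturnWords(String filename) throws FileNotFoundException {
        // try-with-resources closes the Scanner and the underlying file.
        try (Scanner in = new Scanner(new File(filename))) {
            return readWords(in);
        }
    }

    // Collects whitespace-delimited tokens, trimming punctuation from both ends.
    static String[] readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            // Assumption: punctuation inside a token (can't, no-one) is kept.
            String token = in.next().replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        // Passing a String[] makes toArray return String[] instead of Object[].
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = readWords(new Scanner("here we are, (quick) can't ... dog."));
        // prints: here | we | are | quick | can't | dog
        System.out.println(String.join(" | ", words));
    }
}
```

Keeping the loop in readWords, separate from the file handling, means it can be tried on a plain String as in main before pointing it at a real file.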
