I can't seem to figure out how to get this to print out all the words, including the duplicates

Question

I am trying to get this to print out all the words in a text file in ascending order. When I run it, it prints them in ascending order, but it only prints one occurrence of each word. I want it to print every occurrence of each word (duplicates wanted). I am not sure what I'm doing wrong. I would also like it to print only the words and not the punctuation marks that are in the text file. I know I need to use "split", I'm just not sure how to use it properly. I've worked with it once before but cannot remember how to apply it here.

This is the code I have so far:

public class DisplayingWords {

public static void main(String[] args) throws 
        FileNotFoundException, IOException 
{
    Scanner ci = new Scanner(System.in);
    System.out.print("Please enter a text file to open: ");
    String filename = ci.next();
    System.out.println("");

    File file = new File(filename);
    BufferedReader br = new BufferedReader(new FileReader(file));

    StringBuilder sb = new StringBuilder();
    String str;
    while((str = br.readLine())!= null)

    {
/*
 * This is where I seem to be having my problems.
 * I have only ever used a split once before and cannot
 * remember how to properly use it.
 * I am trying to get the printout to skip
 * all the punctuation marks and contain only the words.
 */

      //  String[] str = str.split("[ \n\t\r.,;:!?(){}]");
        str.split("[ \n\t\r.,;:!?(){}]");
        sb.append(str);
        sb.append(" ");
        System.out.println(str);
    }

    ArrayList<String> text = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(sb.toString().toLowerCase());
            while(st.hasMoreTokens()) 
            {
                String s = st.nextToken();
                text.add(s);
            }

            System.out.println("\n" + "Words Printed out in Ascending "
                                + "(alphabetical) order: " + "\n");

            HashSet<String> set = new HashSet<>(text);
            List<String> arrayList = new ArrayList<>(set);
            Collections.sort(arrayList);
            for (Object ob : arrayList)
                System.out.println("\t" + ob.toString());
    }
}

Answer 1

Your duplicates are probably being stripped out here:

HashSet<String> set = new HashSet<>(text);

A Set never contains duplicates, so copying the list into a HashSet discards the repeated words. I'd just sort your text ArrayList directly:

Collections.sort(text);
for (String word : text)
    System.out.println("\t" + word);
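To make the difference concrete, here is a minimal sketch that sorts the same list both ways. The class name DuplicateSortDemo, its two helper methods, and the sample words are made up for illustration and are not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class, not taken from the question's code.
public class DuplicateSortDemo {

    // Returns a sorted copy of the input that keeps every occurrence.
    static List<String> sortedWithDuplicates(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Returns a sorted copy after a round trip through a HashSet,
    // which silently drops the repeated words.
    static List<String> sortedUnique(List<String> words) {
        List<String> copy = new ArrayList<>(new HashSet<>(words));
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("the", "cat", "and", "the", "hat");
        System.out.println(sortedWithDuplicates(text)); // [and, cat, hat, the, the]
        System.out.println(sortedUnique(text));         // [and, cat, hat, the]
    }
}
```

The only difference between the two helpers is the HashSet step, which is the same step that removes the duplicates in the question's code.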

Answer 2

The problem is here:

HashSet<String> set = new HashSet<>(text);

A Set doesn't contain duplicates.

You should instead use the following code:

    //HashSet<String> set = new HashSet<>(text);
    List<String> arrayList = new ArrayList<>(text);
    Collections.sort(arrayList);

Also, for the split method I would suggest using:

s.split("[\\s\\.,;:\\?!]+");

For example consider the code given below:

String s = "Abcdef;Ad; country hahahahah?           ad! \n alsj;d;lajfa try.... wait, which wish work";
String[] sp = s.split("[\\s\\.,;:\\?!]+");
for (String sr : sp )
{
    System.out.println(sr);
}

Its output is as follows:

Abcdef
Ad
country
hahahahah
ad
alsj
d
lajfa
try
wait
which
wish
work
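Putting the two pieces together, the following is a rough sketch of how the split could be applied line by line before sorting. Note that String.split returns a new array and leaves the original string unchanged, so its result has to be stored or iterated. The class name SplitAndSortDemo, the sortedWords helper, and the sample lines are assumptions for illustration. The delimiter set is the one suggested above, extended with the brackets from the question's own pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not taken from the question's code.
public class SplitAndSortDemo {

    // Whitespace plus the punctuation marks to discard (assumed set).
    private static final String DELIMITERS = "[\\s.,;:?!(){}]+";

    // Splits every line into lowercase words and returns them sorted,
    // keeping duplicates.
    static List<String> sortedWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split(DELIMITERS)) {
                // A line that starts with a delimiter yields a leading
                // empty token, so skip those.
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        Collections.sort(words);
        return words;
    }

    public static void main(String[] args) {
        // Stand-in for the lines read from the file with readLine().
        List<String> lines = Arrays.asList("The cat, the hat!", "  (A cat.)");
        for (String word : sortedWords(lines)) {
            System.out.println("\t" + word);
        }
    }
}
```

In the original program, the inner loop would go inside the while loop that calls br.readLine(), replacing the StringBuilder and StringTokenizer steps.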

2 answers

solution1
1 ACCEPTED 2013-04-17 17:54:07

solution2
1 2013-04-17 17:55:01
