I have implemented code to count the number of:
- chars
- words
- lines
- bytes
in a text file. But how can I count the dictionary size, i.e. the number of different words used in the file? Also, how can I implement an iterator that iterates over letters only (ignoring whitespace)?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class wc {
    public static void main(String[] args) throws IOException {
        // counters
        int charsCount = 0;   // non-whitespace characters only
        int wordsCount = 0;
        int linesCount = 0;
        File file = new File("Sample.txt");
        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(file)))) {
            while (scanner.hasNextLine()) {
                // trim so leading whitespace does not produce an empty first "word"
                String tmpStr = scanner.nextLine().trim();
                if (!tmpStr.isEmpty()) {
                    charsCount += tmpStr.replaceAll("\\s+", "").length();
                    wordsCount += tmpStr.split("\\s+").length;
                }
                ++linesCount;
            }
        }
        System.out.println("# of chars: " + charsCount);
        System.out.println("# of words: " + wordsCount);
        System.out.println("# of lines: " + linesCount);
        System.out.println("# of bytes: " + file.length());
    }
}
To get the unique words and their count:
1. Split each line read from the file into a string array.
2. Store the contents of this array in a HashSet.
3. Repeat steps 1 and 2 until the end of the file.
4. The HashSet now holds each distinct word exactly once, so its size() is the number of unique words.
I prefer to post the logic as pseudocode, since solving the problem yourself will help the OP learn more.
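The steps above can be sketched in Java as follows. This is a minimal sketch, not the answerer's own code: the class and method names (`UniqueWords`, `uniqueWords`) are hypothetical, a few sample lines stand in for the file, and trimming each line first is an addition that stops leading whitespace from producing an empty "word".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueWords {

    // Steps 1-3: split every line into words and add them to a HashSet.
    // A set ignores duplicates, so it ends up holding each distinct word once.
    static Set<String> uniqueWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {                    // step 3: repeat for every line
            String trimmed = line.trim();              // avoids an empty leading token
            if (trimmed.isEmpty()) {
                continue;                              // blank line: no words
            }
            words.addAll(Arrays.asList(trimmed.split("\\s+"))); // steps 1 and 2
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample lines; for a real file, pass
        // java.nio.file.Files.readAllLines(java.nio.file.Paths.get("Sample.txt")) instead.
        List<String> lines = Arrays.asList("the cat saw the dog", "  the dog ran", "");
        Set<String> words = uniqueWords(lines);
        // Step 4: the set's size is the dictionary size.
        System.out.println("# of unique words: " + words.size()); // prints 5
    }
}
```

Case matters in this sketch: "The" and "the" count as two different words. Call toLowerCase() on each line before splitting if you want them merged.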
Hey @JeyKey, you can use a HashMap. I'm also using an Iterator here. You can check out this code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class CountUniqueWords {
    public static void main(String[] args) throws FileNotFoundException {
        File f = new File("File Name");
        List<String> arr = new ArrayList<>();
        // word -> number of times it occurs
        Map<String, Integer> listOfWords = new HashMap<>();
        try (Scanner in = new Scanner(f)) {
            while (in.hasNext()) {
                arr.add(in.next()); // next() returns the next whitespace-delimited token
            }
        }
        Iterator<String> itr = arr.iterator();
        while (itr.hasNext()) {
            // repeated words share one key; their count is incremented
            listOfWords.merge(itr.next(), 1, Integer::sum);
        }
        // each distinct word is exactly one key in the map
        System.out.println("The number of unique words: " + listOfWords.size());
    }
}
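Neither answer covers the second part of the question, an iterator over letters only. Below is one minimal sketch, assuming "letter" means a char for which Character.isLetter returns true, so digits and punctuation are skipped along with whitespace; the class name `LetterIterator` is hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: iterates over the letters of a string, skipping
// whitespace (and, because it uses Character.isLetter, digits and punctuation too).
public class LetterIterator implements Iterator<Character> {
    private final CharSequence text;
    private int pos = 0;

    public LetterIterator(CharSequence text) {
        this.text = text;
        advance(); // position on the first letter, if any
    }

    // Move pos forward to the next letter, or to text.length() if none is left.
    private void advance() {
        while (pos < text.length() && !Character.isLetter(text.charAt(pos))) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        return pos < text.length();
    }

    @Override
    public Character next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        char c = text.charAt(pos++);
        advance();
        return c;
    }

    public static void main(String[] args) {
        StringBuilder letters = new StringBuilder();
        Iterator<Character> it = new LetterIterator("Hi there, 42 cats!");
        while (it.hasNext()) {
            letters.append(it.next());
        }
        System.out.println(letters); // prints Hitherecats
    }
}
```

If you only want to skip whitespace (matching how the question's code counts chars), replace the `!Character.isLetter(...)` test in `advance()` with `Character.isWhitespace(...)`. The iterator works on UTF-16 `char` values, so letters outside the Basic Multilingual Plane (stored as surrogate pairs) are skipped. To walk a whole file, create one iterator per line read.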