
Java - Word Frequency

I've created a Java program in Eclipse. The program counts how many words of each length the input contains. For example, if the user entered 'I went to the shop', the program would produce the output '1 1 1 2': 1 word of length 1 ('I'), 1 word of length 2 ('to'), 1 word of length 3 ('the') and 2 words of length 4 ('went', 'shop').

These are the results I'm getting. I don't want the lines with a count of 0 to be shown. How can I hide them and only show the non-zero counts?

The cat sat on the mat
words[1]=0
words[2]=1
words[3]=5
words[4]=0
words[5]=0


import java.io.File;
import java.util.Scanner;

public class mallinson_Liam_8
{
    public static void main(String[] args) throws Exception
    {
        Scanner scan = new Scanner(new File("body.txt"));

        // hasNextLine() pairs with nextLine(); hasNext() would skip trailing blank lines
        while (scan.hasNextLine())
        {
            String input = scan.nextLine();
            // Replace every non-word character with a space
            String strippedInput = input.replaceAll("\\W", " ");

            System.out.println(strippedInput);

            String[] strings = strippedInput.split(" ");
            // counts[n] holds the number of words of length n (1 to 5)
            int[] counts = new int[6];

            for (String str : strings)
                if (str.length() < counts.length)
                    counts[str.length()] += 1;

            // Prints every length, including those with a count of 0
            for (int i = 1; i < counts.length; i++)
                System.out.println("words[" + i + "]=" + counts[i]);
        }
    }
}

I know you asked for Java, but just for comparison, here is how I'd do it in Scala:

val s = "I went to the shop"
val sizes = s.split("\\W+").groupBy(_.length).mapValues(_.size)
// sizes = Map(2 -> 1, 4 -> 2, 1 -> 1, 3 -> 1)

val sortedSizes = sizes.toSeq.sorted.map(_._2)
// sortedSizes = ArrayBuffer(1, 1, 1, 2)

println(sortedSizes.mkString(" "))
// outputs: 1 1 1 2

Simply add a check before you print...

for (int i = 1; i < counts.length; i++) {
    if (counts[i] > 0) { // filter out 0-count lengths
        System.out.println("words[" + i + "]=" + counts[i]);
    }
}

Add an if-statement that checks whether the number of words of length 'i' is 0. If it is, skip that line; otherwise print it.

// Start at 1: counts[0] only collects the empty strings produced by split(" ")
for (int i = 1; i < counts.length; i++) {
    if (counts[i] != 0) {
        System.out.println("words[" + i + "]=" + counts[i]);
    }
}

Edit: bbill beat me to it. Both of our answers work.
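
For reference, here is a minimal self-contained sketch of the same zero check that runs without `body.txt`. The class name `NonZeroWordLengths` and the helpers `countByLength` and `nonZeroLines` are made up for this illustration, and the sample sentence is the one from the question. The limit of 5 letters mirrors the array size in the question's code, so longer words are ignored here as well.

```java
import java.util.ArrayList;
import java.util.List;

public class NonZeroWordLengths {

    // Hypothetical helper: counts[n] = number of words of length n, for 1 <= n <= maxLength
    public static int[] countByLength(String line, int maxLength) {
        int[] counts = new int[maxLength + 1];
        // Split on runs of non-word characters so no empty strings appear between words
        for (String word : line.split("\\W+")) {
            // A leading separator still yields one empty string, so skip it explicitly
            if (!word.isEmpty() && word.length() <= maxLength) {
                counts[word.length()]++;
            }
        }
        return counts;
    }

    // Hypothetical helper: one "words[n]=count" line per non-zero count
    public static List<String> nonZeroLines(int[] counts) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] != 0) {
                lines.add("words[" + i + "]=" + counts[i]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] counts = countByLength("The cat sat on the mat", 5);
        for (String line : nonZeroLines(counts)) {
            System.out.println(line);
        }
    }
}
```

For the sample sentence this should print only `words[2]=1` and `words[3]=5`.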

I'd use the Java 8 Stream API.

See my example:

// import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;

public class CharacterCount {
    public static void main(String[] args) {

        // define input
        String input = "I went to the shop";
        // String input = new String(Files.readAllBytes(Paths.get("body.txt")));

        // calculate output
        String output =

                // split input by whitespaces and other non-word-characters
                Arrays.stream(input.split("\\W+"))

                // group words by length and count each group;
                // a TreeMap keeps the lengths in ascending order (a plain HashMap guarantees no order)
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.counting()))

                // iterate over the counts, ordered by word length
                .values().stream()

                // join all values into one, space separated string
                .map(Object::toString).collect(Collectors.joining(" "));

        // print output to console
        System.out.println(output);
    }
}

It outputs:

1 1 1 2
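
If the word length should stay visible, as in the `words[n]=count` lines from the question, the stream approach can be adapted. This is only a sketch under a few assumptions: the class name `WordLengthReport` and the method `countByLength` are invented for this example, and a `TreeMap` is used so the lengths come out in ascending order. Lengths that never occur simply get no map entry, so there are no zero lines to hide.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordLengthReport {

    // Hypothetical helper: maps each word length that occurs to the number of words of that length
    public static Map<Integer, Long> countByLength(String input) {
        return Arrays.stream(input.split("\\W+"))
                // a leading separator would produce one empty string; drop it
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the lengths sorted; absent lengths get no entry
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        countByLength("I went to the shop")
                .forEach((length, count) -> System.out.println("words[" + length + "]=" + count));
    }
}
```

For 'I went to the shop' this should print `words[1]=1`, `words[2]=1`, `words[3]=1` and `words[4]=2`, one per line.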
