
Java word count program

I am trying to make a word count program, which I have partially written. It gives the correct result, but as soon as I enter a space, or more than one space, in the string, the word count is wrong, because I am counting words based on the spaces used. I need a solution that gives the correct result no matter how many spaces there are. My code is below.

public class CountWords 
{
    public static void main (String[] args)
    {

            System.out.println("Simple Java Word Count Program");

            String str1 = "Today is Holdiay Day";

            int wordCount = 1;

            for (int i = 0; i < str1.length(); i++) 
            {
                if (str1.charAt(i) == ' ') 
                {
                    wordCount++;
                } 
            }

            System.out.println("Word count is = " + wordCount);
    }
}
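public static void main (String[] args) {

     System.out.println("Simple Java Word Count Program");

     String str1 = "Today is Holdiay Day";

     // Remove leading/trailing whitespace, then split on runs of one or more whitespace characters
     String trimmed = str1.trim();
     // An empty or all-whitespace string would otherwise split into [""], i.e. a count of 1
     int wordCount = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;

     System.out.println("Word count is = " + wordCount);
}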

The idea is to split the string into words on any run of whitespace characters, however long. The split method of the String class returns an array containing the words as its elements, so the length of that array is the number of words in the string.

There are two routes for this. One is to use regular expressions; you can find out more about regular expressions here. A good regular expression for this would be something like "\w+". Then count the number of matches.
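A minimal sketch of the regex route, assuming a "word" is any run of \w characters (letters, digits and underscore). The class and method names are hypothetical. Under this definition an apostrophe splits a word, so "don't" counts as two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCount {
    // \w+ matches each maximal run of word characters ([a-zA-Z_0-9])
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Counts words as the number of non-overlapping \w+ matches;
    // any amount of whitespace or punctuation between them is skipped
    static int countWords(String text) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Today   is  Holdiay Day")); // 4
        System.out.println(countWords("   "));                      // 0
        System.out.println(countWords("don't stop"));               // 3: the apostrophe splits "don't"
    }
}
```

Because matches are counted rather than separators, leading, trailing and repeated spaces need no special handling.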

If you don't want to go that route, you could keep a boolean flag that remembers whether the last character you saw was a space, and count a word only when a non-space character follows a space (or starts the string). The center of the loop then looks like this:

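int wordCount = 0;
boolean prevCharWasSpace = true; // treat the start of the string as if it followed a space
for (int i = 0; i < str1.length(); i++)
{
    if (str1.charAt(i) == ' ') {
        prevCharWasSpace = true;
    } else {
        // the first non-space character after a space (or at the start) begins a new word
        if (prevCharWasSpace) wordCount++;
        prevCharWasSpace = false;
    }
}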

Update
Using the split technique is exactly equivalent to what's happening here, but it doesn't really explain why it works. If we go back to CS theory, we want to construct a finite-state automaton (FSA) that counts words. That FSA may look like this:
[Image: FSA diagram for counting words]
If you look at the code, it implements this FSA exactly. prevCharWasSpace keeps track of which state we're in, and str1.charAt(i) decides which edge (arrow) is followed. If you use the split method, a regular expression equivalent to this FSA is built internally and used to split the string into an array.
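To make the two states explicit, here is a sketch that names them with an enum. The state names IN_SPACE and IN_WORD are hypothetical labels, not taken from the diagram, and the sketch swaps the == ' ' test for Character.isWhitespace so that tabs and newlines also act as separators, which goes beyond the original answer.

```java
class FsaWordCount {
    // The two states of the automaton
    enum State { IN_SPACE, IN_WORD }

    static int countWords(String text) {
        State state = State.IN_SPACE; // start as if the text were preceded by a space
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            boolean space = Character.isWhitespace(text.charAt(i));
            switch (state) {
                case IN_SPACE:
                    if (!space) {        // edge IN_SPACE -> IN_WORD: a new word starts
                        count++;
                        state = State.IN_WORD;
                    }
                    break;
                case IN_WORD:
                    if (space) {         // edge IN_WORD -> IN_SPACE: the word has ended
                        state = State.IN_SPACE;
                    }
                    break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  Today is\tHoldiay   Day  ")); // 4
    }
}
```

Only the IN_SPACE to IN_WORD edge increments the counter, which is why any number of consecutive separators still yields one word boundary.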

Java does have the StringTokenizer API, which can be used for this purpose as below.

import java.util.StringTokenizer;

String test = "This is a test app";
int countOfTokens = new StringTokenizer(test).countTokens();
System.out.println(countOfTokens);

Or, in a single line:

System.out.println(new StringTokenizer("This is a test app").countTokens());

StringTokenizer handles multiple spaces in the input string: it counts only the words and ignores the extra spaces.

System.out.println(new StringTokenizer("This    is    a test    app").countTokens());

The line above also prints 5.
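A self-contained version with the import. By default StringTokenizer splits on space, tab, newline, carriage return and form feed (" \t\n\r\f"), and the two-argument constructor accepts your own set of delimiter characters. The class name and sample strings are made up for illustration. Note that the Javadoc describes StringTokenizer as a legacy class and recommends String.split or java.util.regex for new code.

```java
import java.util.StringTokenizer;

class TokenizerWordCount {
    // Default delimiters " \t\n\r\f"; runs of delimiters never produce empty tokens
    static int countWords(String text) {
        return new StringTokenizer(text).countTokens();
    }

    // Same idea with caller-chosen delimiters: every character in delims is a separator
    static int countWords(String text, String delims) {
        return new StringTokenizer(text, delims).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("This    is\ta test\napp"));      // 5
        System.out.println(countWords("apples,pears;;plums", ",;"));  // 3
    }
}
```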

You can use String.split instead of charAt and you will get good results. If you want to use charAt for some reason, trim the string before counting the words; that way a leading or trailing space won't be counted as an extra word.
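A small demonstration of why trimming matters, using a made-up input. String.split keeps a leading empty string when the input starts with a separator, but discards trailing empty strings; splitting on a single space also keeps the empty strings between repeated spaces.

```java
class TrimDemo {
    static final String SAMPLE = "  Today is  Holdiay Day ";

    public static void main(String[] args) {
        // Leading separator -> leading empty string: ["", "Today", "is", "Holdiay", "Day"]
        System.out.println(SAMPLE.split("\\s+").length);        // 5
        // Trimming first removes it: ["Today", "is", "Holdiay", "Day"]
        System.out.println(SAMPLE.trim().split("\\s+").length); // 4
        // " " matches one space at a time, so the double space leaves an empty element
        System.out.println(SAMPLE.trim().split(" ").length);    // 5
    }
}
```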

My implementation, not using StringTokenizer:

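import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

// Counts how often each word occurs across all sentences, ignoring words
// shorter than 2 characters or longer than maxLength
Map<String, Long> getWordCounts(List<String> sentences, int maxLength) {
    return sentences
        .parallelStream()
        .map(sentence -> sentence.replace(".", ""))   // drop full stops
        .map(string -> string.split(" "))
        .flatMap(Arrays::stream)
        .map(String::toLowerCase)
        .filter(word -> word.length() >= 2 && word.length() <= maxLength)
        .collect(groupingBy(Function.identity(), counting()));
}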

Then you could call it like this, for example:

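// HashMap iteration order is unspecified, so "first" means an arbitrary matching word
String word = getWordCounts(list, 9).entrySet().stream()
        .filter(pair -> pair.getValue() >= 1 && pair.getValue() <= 3)
        .findFirst()
        .orElseThrow(() -> new RuntimeException("No matching word found."))
        .getKey();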

Perhaps flipping the method to return Map<Long, String> would be better.
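One caveat with flipping the map: several words can share the same count, so a plain Map<Long, String> would keep only one word per count. A hedged sketch that inverts the counts into Map<Long, List<String>> instead; the method name invert and the use of TreeMap (to keep counts sorted) are assumptions, not part of the answer above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class InvertCounts {
    // Groups words by how often they occur; TreeMap keeps the counts in ascending order,
    // and each count maps to every word that has it
    static Map<Long, List<String>> invert(Map<String, Long> wordCounts) {
        return wordCounts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        e -> e.getValue(),
                        TreeMap::new,
                        Collectors.mapping(e -> e.getKey(), Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("java", 3L);
        counts.put("word", 1L);
        counts.put("count", 1L);
        // Both "word" and "count" end up under key 1; the order inside the list may vary
        System.out.println(invert(counts));
    }
}
```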

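public class WordCount
{
    public static void main(String[] args)
    {
        System.out.println("Simple Java Word Count Program");

        // assumes a non-empty string; trim() removes leading/trailing spaces
        String str1 = "Today is Holdiay Day".trim();

        int wordCount = 1;

        for (int i = 0; i < str1.length(); i++)
        {
            // count a space only when the next character exists and is not a space,
            // so runs of spaces are not counted as extra words
            if (str1.charAt(i) == ' ' && i + 1 < str1.length() && str1.charAt(i + 1) != ' ')
            {
                wordCount++;
            }
        }

        System.out.println("Word count is = " + wordCount);
    }
}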

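import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCount
{
    public static void main(String[] args) throws IOException
    {
        System.out.println("Simple Java Word Count Program");

        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("C:/file.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line break, so add a space to keep the
                // last word of one line apart from the first word of the next
                text.append(line).append(' ');
            }
        }

        String str2 = text.toString().trim();
        int wordCount = str2.isEmpty() ? 0 : 1;

        for (int i = 0; i < str2.length(); i++)
        {
            // a space followed by a non-space character starts a new word
            if (str2.charAt(i) == ' ' && i + 1 < str2.length() && str2.charAt(i + 1) != ' ')
            {
                wordCount++;
            }
        }

        System.out.println("Word count is = " + wordCount);
    }
}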

You should make your code more generic by also considering other word separators, such as "," and ";".

public class WordCounter{
    public int count(String input){
        int count =0;
        boolean incrementCounter = false;
        for (int i=0; i<input.length(); i++){
            if (isValidWordCharacter(input.charAt(i))){
                incrementCounter = true;
            }else if (incrementCounter){
                count++;
                incrementCounter = false;
            }
        }
        if (incrementCounter) count ++;//if string ends with a valid word
        return count;
    }
    private boolean isValidWordCharacter(char c){
        //any logic that will help you identify a valid character in a word
        // you could also have a method which identifies word separators instead of this
        return (c >= 'A' && c<='Z') || (c >= 'a' && c<='z'); 
    }
}
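For comparison, a split-based sketch of the same idea. It assumes the separators are whitespace plus the punctuation mentioned above (, and ;), extended here with . ! ? purely as an illustrative choice; the class name is hypothetical.

```java
class SeparatorWordCount {
    // One or more characters from the separator set: whitespace , ; . ! ?
    private static final String SEPARATORS = "[\\s,;.!?]+";

    static int countWords(String text) {
        // Strip leading separators first; otherwise split() yields a leading empty string
        String cleaned = text.replaceFirst("^" + SEPARATORS, "");
        // Trailing separators are harmless: split() drops trailing empty strings
        return cleaned.isEmpty() ? 0 : cleaned.split(SEPARATORS).length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello,world; this is  a test.")); // 6
    }
}
```

The character-by-character WordCounter above defines words by which characters they contain, while this version defines them by which characters separate them; which is more convenient depends on whether the word alphabet or the separator set is easier to list.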
import com.google.common.base.Splitter;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.util.Set;

String str = "Simple Java Word Count count Count Program";
// omitEmptyStrings() drops the empty pieces produced by consecutive spaces
Iterable<String> words = Splitter.on(' ').trimResults().omitEmptyStrings().split(str);

// Guava word counter: a multiset counts occurrences of each (lower-cased) word
Multiset<String> wordsMultiset = HashMultiset.create();
for (String word : words) {
    wordsMultiset.add(word.toLowerCase());
}

Set<String> result = wordsMultiset.elementSet();
for (String word : result) {
    System.out.println(word + " X " + wordsMultiset.count(word));
}
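If you would rather not add Guava, here is a JDK-only sketch of the same case-insensitive count using Map.merge. The class and method names are hypothetical, and TreeMap is chosen only so the output comes out sorted.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class JdkWordFrequency {
    // Counts each lower-cased word; splitting on runs of whitespace ignores repeated spaces
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return counts; // no words at all
        }
        for (String word : trimmed.split("\\s+")) {
            // merge() inserts 1 for a new word, otherwise adds 1 to the existing count
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = frequencies("Simple Java Word Count count Count Program");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " X " + e.getValue());
        }
    }
}
```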
public static int countWords(String str) {

   if (str.length() == 0)
          return 0;

   int count = 0;
   for (int i = 0; i < str.length(); i++) {

      // spaces never start a word
      if (str.charAt(i) == ' ')
          continue;

      // a non-space character starts a new word if it is the first character
      // or if the character before it is a space
      if (i == 0 || str.charAt(i - 1) == ' ') {
        count++;
      }
   }
   return count;
}
    String data = "This world is mine";
    // trim() first, so leading whitespace doesn't produce an empty first element
    System.out.print(data.trim().split("\\s+").length);

Try this:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class wordcount {
    public static void main(String[] args) {
        String s = "India is my country. I love India";
        List<String> qw = new ArrayList<String>();
        Map<String, Integer> mmm = new HashMap<String, Integer>();
        for (String sp : s.split(" ")) {
            qw.add(sp);
        }
        for (String num : qw) {
            mmm.put(num, Collections.frequency(qw, num));
        }
        System.out.println(mmm);

    }

}

To count the total number of words, or the number of distinct words (ignoring repeats):

import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount {
    public static void main(String[] args) {
        String test = "I am trying to make make make";
        // \w+ matches each maximal run of word characters ([a-zA-Z_0-9])
        Pattern p = Pattern.compile("\\w+");
        Matcher m = p.matcher(test);
        Set<String> hs = new HashSet<>(); // distinct words
        int i = 0;                        // total words
        while (m.find()) {
            i++;
            hs.add(m.group());
        }
        System.out.println("Total words Count==" + i);
        System.out.println("Count without Repetition ==" + hs.size());
    }
}

Output:

Total words Count==7

Count without Repetition ==5
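On Java 9 or later, Matcher.results() returns the matches as a Stream<MatchResult>, so both numbers can be computed with streams. A sketch under that version assumption; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamWordCount {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // All \w+ matches in order (Matcher.results() requires Java 9+)
    static List<String> words(String text) {
        return WORD.matcher(text).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> found = words("I am trying to make make make");
        System.out.println("Total words Count==" + found.size());                          // 7
        System.out.println("Count without Repetition ==" + found.stream().distinct().count()); // 5
    }
}
```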

Not sure if there is a drawback, but this worked for me:

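    Scanner input = new Scanner(System.in); // requires import java.util.Scanner
    String userInput = input.nextLine();
    String trimmed = userInput.trim();
    // empty (or all-space) input contains no words
    int count = trimmed.isEmpty() ? 0 : 1;

    for (int i = 0; i < trimmed.length(); i++) {
      // count a space only if the previous character was not also a space;
      // i > 0 whenever charAt(i) is a space, because a trimmed string never starts with one
      if ((trimmed.charAt(i) == ' ') && (trimmed.charAt(i-1) != ' ')) {
        count++;
      }
    }
    System.out.println(count);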

This could be as simple as using split and a count variable.

public class SplitString {

    public static void main(String[] args) {
        int count = 0;
        String s1 = "Hi i love to code";

        // trim + split on runs of whitespace, so repeated spaces don't create empty "words"
        for (String s : s1.trim().split("\\s+")) {
            count++;
        }
        System.out.println(count);
    }
}
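public class TotalWordsInSentence {
    public static void main(String[] args) {

        // trim() removes leading/trailing spaces so a trailing space can't add a phantom word
        String str = "This is sample sentence".trim();
        int noOfWords = str.isEmpty() ? 0 : 1;

        for (int i = 0; i < str.length(); i++) {
            // count a space only when the character before it is not also a space
            if ((str.charAt(i) == ' ') && (i != 0) && (str.charAt(i - 1) != ' ')) {
                noOfWords++;
            }
        }
        System.out.println("Number of Words in Sentence: " + noOfWords);
    }
}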

With this code there won't be any problem with repeated spaces; it is just a simple for loop. Hope this helps.

To count only specific kinds of words, such as John, John99, John_John and John's, use a regex like the one below. Adjust the regex to your needs so that only the words you want are counted.

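    // Counts words built from letters, '_' and apostrophes (straight ' or curly ’),
    // each optionally followed by digits, e.g. John, John99, John_John, John's.
    // Requires java.util.regex.Pattern and java.util.regex.Matcher.
    public static int wordCount(String content) {
        int count = 0;
        String regex = "([a-zA-Z_'’][0-9]*)+\\s*";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            count++;
            System.out.println(matcher.group().trim()); // display each matched word (optional)
        }
        return count;
    }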

class HelloWorld {

    public static void main(String[] args) {
        String str = "User is in for an interview";
        int counter = 0;
        // note: this counts occurrences of the letter 'i' across all words, not the words themselves
        String[] arrStr = str.split(" ");
        for (int i = 0; i < arrStr.length; i++) {
            String charStr = arrStr[i];
            for (int j = 0; j < charStr.length(); j++) {
                if (charStr.charAt(j) == 'i') {
                    counter++;
                }
            }
        }
        System.out.println("i " + counter);
    }
}

Use the split(regex) method. The result is an array of the strings obtained by splitting on regex.

String s = "Today is Holdiay Day";
// "\\s+" splits on runs of whitespace; trim() avoids a leading empty element
System.out.println("Word count is = " + s.trim().split("\\s+").length);

You can use this code; it may help you:

public static void main (String[] args)
{

   System.out.println("Simple Java Word Count Program");

   String str1 = "Today is Holdiay Day";
   int count=0;
   String[] wCount=str1.split(" ");

   for(int i=0;i<wCount.length;i++){
        if(!wCount[i].isEmpty())
        {
            count++;
        }
   }
   System.out.println(count);
}

You need to read the file line by line, collapse multiple consecutive whitespace characters in each line into a single separator, and then count the words. Here is a sample:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public static void main(String... args) throws IOException {
    int wordcount = 0;
    // try-with-resources closes the file when done
    try (BufferedReader br = new BufferedReader(new FileReader("c:\\test.txt"))) {
        String strLine;
        while ((strLine = br.readLine()) != null) {
            // trim, then split on any run of whitespace (spaces, tabs, ...);
            // this collapses repeated separators instead of deleting tabs outright
            strLine = strLine.trim();
            if (!strLine.isEmpty()) {
                wordcount += strLine.split("\\s+").length;
            }
        }
    }
    System.out.println(wordcount);
}
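A shorter variant of the same line-by-line idea using java.nio (Java 8+). This is a sketch: the path is passed in rather than hard-coded, UTF-8 is assumed as the file encoding, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class FileWordCount {
    // Sums the word counts of all lines; try-with-resources closes the underlying file
    static long countWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                    .filter(line -> !line.isEmpty())          // blank lines contain no words
                    .mapToLong(line -> line.split("\\s+").length)
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileWordCount c:\test.txt
        System.out.println(countWords(Paths.get(args[0])));
    }
}
```

Splitting on \s+ after trimming already treats tabs and repeated spaces as a single separator, so no separate collapsing step is needed.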
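public class CountWords
{
    public static void main (String[] args)
    {
        System.out.println("Simple Java Word Count Program");
        String str1 = "Today is Holdiay Day"; // assumes non-empty, no leading spaces
        int wordCount = 1;
        for (int i = 0; i < str1.length(); i++)
        {
            // count a space only when it is followed by a non-space character;
            // the bounds check keeps charAt(i + 1) from failing on a trailing space
            if (str1.charAt(i) == ' ' && i + 1 < str1.length() && str1.charAt(i + 1) != ' ')
            {
                wordCount++;
            }
        }
        System.out.println("Word count is = " + wordCount);
    }
}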

This gives the correct result because a run of two or more spaces increases the word count only once. Enjoy.

The full working program is:

// Main.java
public class Main {

    public static void main(String[] args) {
        LogicCounter counter1 = new LogicCounter();
        counter1.counter("I am trying to make a program on word count which I have partially made and it is giving the correct result but the moment I enter space or more than one space in the string, the result of word count show wrong results because I am counting words on the basis of spaces used. I need help if there is a solution in a way that no matter how many spaces are I still get the correct result. I am mentioning the code below.");
    }
}

// LogicCounter.java
public class LogicCounter {

    // A word starts at a non-space character that follows a space
    // (or the start of the string), so a run of spaces is counted only once
    public void counter(String str) {
        boolean space = true; // true while we are between words
        int words = 0;

        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == ' ') {
                space = true;
            } else if (space) {
                words++;
                space = false;
            }
        }

        System.out.println("there are " + words + " words");
    }
}

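Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.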

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM