
Counting number of occurrences of word in java

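I want to count the number of occurrences of a particular word in a source string. Say src="thisisamangoterrthisismangorightthis?" and word="this". What I do is first search for the index of word in src. It is at index 0. Then I extract the part from the end of that match to the end of src, i.e. now src="isamangoterrthisismangorightthis?", and search for word again. But I am getting an array out of bounds exception.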

public static int countOccur(String s1, String s2)
{
    int ans=0;
    int len1=s1.length();
    int len2=s2.length();
    System.out.println("Lengths:"+len1+" " +len2);

    while(s1.contains(s2))
    {
        ans++;
        int tmpInd=s1.indexOf(s2);
        System.out.println("Now Index is:"+tmpInd);
        if((tmpInd+len2)<len1){
            s1=s1.substring(tmpInd+len2, len1);
            System.out.println("Now s1 is:"+s1);
        }
        else
            break;
    }
    return ans;

}

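When using a method that can throw an IndexOutOfBoundsException, it is always a good idea to check the bounds. See String#substring:

IndexOutOfBoundsException - if beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex.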


You should cover all the cases:

if(tmpInd + len2 >= s1.length() || len1 >= s1.length() || ... ) {
    //Not good
}

Or, better, you should rethink your logic so that this situation cannot arise in the first place.
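One way to restructure the question's substring approach: its loop keeps comparing against len1, the length of the original string, even though s1 shrinks on every pass, so the cached end index eventually exceeds the current length. A minimal sketch that avoids the stale length follows; the class name SubstringCount and the choice to return 0 for an empty search word are assumptions made for illustration.

```java
public class SubstringCount {
    // Counts non-overlapping occurrences of s2 in s1 by repeatedly
    // cutting off everything up to and including the latest match.
    public static int countOccur(String s1, String s2) {
        if (s2.isEmpty()) {
            return 0; // assumption: an empty search word counts as zero matches
        }
        int ans = 0;
        int idx;
        while ((idx = s1.indexOf(s2)) != -1) {
            ans++;
            // The one-argument substring runs to the end of the *current*
            // string, so no cached length can get out of sync.
            s1 = s1.substring(idx + s2.length());
        }
        return ans;
    }

    public static void main(String[] args) {
        // prints 3
        System.out.println(countOccur("thisisamangoterrthisismangorightthis?", "this"));
    }
}
```

Each pass still copies the remainder of the string, so for long inputs an index-based search is likely cheaper.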

Try using indexOf(), which takes care of the bounds for you:

public static int countOccurrences(final String haystack, final String needle)
{
    // An empty needle matches at every position; treat it as zero occurrences.
    if (needle.isEmpty())
        return 0;

    int index = 0;
    int ret = 0;
    while (true) {
        index = haystack.indexOf(needle, index);
        if (index == -1)
            return ret;
        ret++;
        // Move past this match; otherwise indexOf() keeps returning the same index.
        index += needle.length();
    }
}

Instead of calling substring on your String, use this method:

public int indexOf(String str, int fromIndex)

Then just check whether the result is -1.
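As a rough sketch of that idea, the loop below searches with indexOf(String, int) and stops when it returns -1. The class name IndexOfCount and the allowOverlap flag are illustrative assumptions, not part of the answer; the flag only controls how far the search start advances after each match.

```java
public class IndexOfCount {
    // Counts occurrences of needle in haystack using indexOf(String, int).
    // With allowOverlap, the search resumes one character after the start of
    // each match; otherwise it resumes right after the end of the match.
    public static int count(String haystack, String needle, boolean allowOverlap) {
        if (needle.isEmpty()) {
            return 0; // assumption: an empty needle counts as zero matches
        }
        int step = allowOverlap ? 1 : needle.length();
        int count = 0;
        int index = haystack.indexOf(needle);
        while (index != -1) {
            count++;
            // A fromIndex beyond the end of the string simply yields -1.
            index = haystack.indexOf(needle, index + step);
        }
        return count;
    }

    public static void main(String[] args) {
        String src = "thisisamangoterrthisismangorightthis?";
        System.out.println(count(src, "this", false)); // prints 3
        System.out.println(count("aaaa", "aa", false)); // prints 2
        System.out.println(count("aaaa", "aa", true));  // prints 3
    }
}
```

No intermediate strings are created, and indexOf never throws for an out-of-range fromIndex.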

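You can solve the problem using replace:

String s = "thisisamangoterrthisismangorightthis?";
String word = "this";
// replace() treats the word literally; replaceAll() would interpret it as a regex.
String newS = s.replace(word, "");
int count = (s.length() - newS.length()) / word.length();

import java.io.*;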
import java.util.*;

public class WordCount
{
public static class Word implements Comparable<Word>
{
    String word;
    int count;

    @Override
    public int hashCode()
    {
        return word.hashCode();
    }

    @Override
    public boolean equals(Object obj)
    {
        return obj instanceof Word && word.equals(((Word)obj).word);
    }

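    @Override
    public int compareTo(Word b)
    {
        // Sort by descending count; break ties by word so that the TreeSet
        // does not discard distinct words that share the same count.
        int byCount = Integer.compare(b.count, count);
        return byCount != 0 ? byCount : word.compareTo(b.word);
    }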
}


    public  static void findWordcounts(File input)throws Exception
    {
       long time = System.currentTimeMillis();

    Map<String, Word> countMap = new HashMap<String, Word>();

    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(input)));
    String line;
    while ((line = reader.readLine()) != null) {
        String[] words = line.split("[^A-ZÅÄÖa-zåäö]+");
        for (String word : words) {
            if ("".equals(word)) {
                continue;
            }

            Word wordObj = countMap.get(word);
            if (wordObj == null) {
                wordObj = new Word();
                wordObj.word = word;
                wordObj.count = 0;
                countMap.put(word, wordObj);
            }

            wordObj.count++;
        }
    }

    reader.close();

    SortedSet<Word> sortedWords = new TreeSet<Word>(countMap.values());
    int i = 0;
    for (Word word : sortedWords) {
        if (i > 10) {
            break;
        }

        System.out.println("Word \t "+ word.word+"\t Count \t"+word.count);

        i++;
    }

    time = System.currentTimeMillis() - time;

    System.out.println("Completed in " + time + " ms"); 
    }


public static void main(String[] args)throws Exception
{
   findWordcounts(new File("./don.txt"));               
}
}
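The tallying step of the program above can also be written more compactly with Map.merge (Java 8 and later). This is only a sketch: the class name MergeCount is hypothetical, it counts words in a String rather than a file, and it splits on "\\P{L}+" (any run of non-letters), which is an assumption that differs slightly from the character class used above.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCount {
    // Splits text on runs of non-letter characters and tallies each word.
    public static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\P{L}+")) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCounts("this is a mango, this is ripe");
        System.out.println(counts.get("this")); // prints 2
    }
}
```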

Try this to count a word in a string:

private static int countingWord(String value, String findWord)
    {
        int counter = 0;
        while (value.contains(findWord))
        {
            int index = value.indexOf(findWord);
            value = value.substring(index + findWord.length(), value.length());
            counter++;
        }
        return counter;
    }
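Note that substring-based counting also matches inside longer words, for example "this" inside "thistle". If only standalone words should count, a regular expression with word boundaries is one option. The sketch below assumes that the regex \b boundary matches your notion of a word; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordCount {
    // Counts occurrences of word in text that are delimited by word boundaries.
    public static int countWholeWord(String text, String word) {
        if (word.isEmpty()) {
            return 0; // assumption: an empty word counts as zero matches
        }
        // Pattern.quote() makes the word literal, so regex metacharacters
        // in it are not interpreted.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("this is thistle, this", "this")); // prints 2
    }
}
```

For the question's input, which has no separators between words, this variant would report 0, so it only fits text that actually contains delimited words.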

I know my answer is late, but it may still help someone in the future.

import java.util.LinkedList;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {

        // the following three lines of code could be anything,
        // but you have to provide an alternative source for
        // the string in which you want to count a specific word/string
        Scanner userInput = new Scanner(System.in); // optional

        String sentence = userInput.nextLine().trim(); // optional

        Scanner readSentence = new Scanner(sentence); // optional
        
        
        // all the words will be stored in both of these LinkedList
        LinkedList<String> words = new LinkedList<String>(); // stores all the words
        LinkedList<String> noDuplicates = new LinkedList<String>(); // stores all the words but w/o duplicates

        // the program for storing process
        for (int index = 0; readSentence.hasNext(); index++) {

            words.add(readSentence.next()); // adds each word from string

            if (!noDuplicates.contains(words.get(index))) {
                noDuplicates.add(words.get(index)); // adds each word but not duplicates
            }
        }
        
        
        // the program for searching duplicates and counting the number of occurrences
        for (String word : noDuplicates) {

            int counter = 0; // increments each time the program encountered a duplicate

            for (int index = 0; index < words.size(); index++) {

                if (word.equals(words.get(index))) { // the comparing process
                    counter++; // increments each time if the above condition was true
                }
            }
            if (counter > 1) {
                // finally, the printing method
                System.out.println("Word: \"" + word + "\" occurred " + counter + " times.");
            }
        }
    }
}
