How do I count the number of words in a string while ignoring duplicate words? (using a method)

Question

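The code here only shows how many words there are. How do I ignore words that are the same? For example, "A long long time ago, I can still remember" should return 8 instead of 9.

I want it to be a method that takes one parameter of type String and returns an int. I am also only allowed to use the basics, so no HashMap, HashSet, or other advanced stuff.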

public static int mostCommonLetter(String s) {
    int wordCount = 0;
    boolean word = false;
    int endOfLine = s.length() - 1;

    for (int i = 0; i < s.length(); i++) {
        if (Character.isLetter(s.charAt(i)) && i != endOfLine) {
            // Inside a word that continues past this character.
            word = true;
        } else if (!Character.isLetter(s.charAt(i)) && word) {
            // A non-letter ends the current word.
            wordCount++;
            word = false;
        } else if (Character.isLetter(s.charAt(i)) && i == endOfLine) {
            // A letter at the very end of the string closes the last word.
            wordCount++;
        }
    }
    return wordCount;
}

How do I ignore the duplicate words?

Answer 1

import java.util.*;

public class MyClass {
    public static void main(String[] args) {
        String input = "A long long time ago, I can still remember";
        String[] words = input.split(" ");
        List<String> uniqueWords = new ArrayList<>();
        for (String word : words) {
            if (!uniqueWords.contains(word)) {
                uniqueWords.add(word);
            }
        }
        System.out.println("Number of unique words: " + uniqueWords.size());
    }
}

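Output: Number of unique words: 8

Basically, if you are allowed to use data structures such as lists, you can create a list and add each word of the input sentence to it if and only if the word is not already in the list.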
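
Since the question asks for a method that takes a String and returns an int, the same list-based idea could be wrapped up roughly as follows. This is only a sketch: the class name ListUniqueWords and the method name countUniqueWords are made up for illustration, and words are still split on single spaces, so "ago," keeps its comma.

```java
import java.util.ArrayList;
import java.util.List;

public class ListUniqueWords {

    // Hypothetical helper: returns the number of distinct space-separated words.
    public static int countUniqueWords(String input) {
        List<String> uniqueWords = new ArrayList<>();
        for (String word : input.split(" ")) {
            // Only remember a word the first time it appears.
            if (!uniqueWords.contains(word)) {
                uniqueWords.add(word);
            }
        }
        return uniqueWords.size();
    }

    public static void main(String[] args) {
        // prints 8
        System.out.println(countUniqueWords("A long long time ago, I can still remember"));
    }
}
```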

Answer 2

The general idea:

import java.util.HashSet;
import java.util.Set;

public int getUniqueWords(String input) {
    // Split the string into words using the split() method
    String[] words = input.split(" ");

    // Create a Set to store the unique words
    Set<String> uniqueWords = new HashSet<String>();

    // Loop through the words and add them to the Set
    for (String word : words) {
        uniqueWords.add(word);
    }

    // Return the number of unique words
    return uniqueWords.size();
}

The same solution using the Stream API:

public int getUniqueWords2(String input) {
    // here we can safely cast to int, because String can contain at most "max int" chars
    return (int) Arrays.stream(input.split(" ")).distinct().count();
}

If you need to handle multiple spaces between words, add some cleanup for input:

// remove leading and trailing spaces
String cleanInput = input.trim();

// replace multiple spaces with a single space
cleanInput = cleanInput.replaceAll("\\s+", " ");
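
To show how the cleanup step might fit together with the counting, here is a small sketch. The class name CleanedUniqueWords is hypothetical, and the early return for empty input is an added assumption: without it, splitting an empty string yields one empty token, which would be counted as a word.

```java
import java.util.Arrays;

public class CleanedUniqueWords {

    // Hypothetical helper combining the cleanup above with the stream-based count.
    public static int countUniqueWords(String input) {
        // Remove leading and trailing whitespace.
        String cleanInput = input.trim();
        if (cleanInput.isEmpty()) {
            // "".split(" ") returns a single empty string, so handle this case up front.
            return 0;
        }
        // Collapse every run of whitespace into one space before splitting.
        cleanInput = cleanInput.replaceAll("\\s+", " ");
        return (int) Arrays.stream(cleanInput.split(" ")).distinct().count();
    }

    public static void main(String[] args) {
        // prints 3 (a, long, time)
        System.out.println(countUniqueWords("  a  long   long time "));
    }
}
```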

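Regarding the "only allowed to use the basics" requirement:

A hash table (HashSet) is one of the basic data structures in algorithms.
Counting unique items cannot logically be solved without a container of "already seen" items, so that the algorithm can check whether the next item has already been counted.
In the simplest case that container can be a list, but this leads to O(n^2) time complexity.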
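
For the strict "basics only" reading, the "already seen" container could even be a plain array searched with a loop. The sketch below is one possible way to do that, not the only one. The class and method names are hypothetical, a word is assumed to be a maximal run of letters (the same Character.isLetter test as in the question, so "ago," counts as "ago"), and comparison is assumed to be case-sensitive.

```java
public class ArrayUniqueWords {

    // Counts distinct words using only loops, a plain array, and String methods.
    public static int countUniqueWords(String s) {
        // Words need at least one letter and one separator between them,
        // so there can be at most (length + 1) / 2 of them.
        String[] seen = new String[(s.length() + 1) / 2];
        int uniqueCount = 0;
        int i = 0;

        while (i < s.length()) {
            // Skip anything that is not a letter (spaces, punctuation, digits).
            if (!Character.isLetter(s.charAt(i))) {
                i++;
                continue;
            }
            // Find the end of the current run of letters.
            int start = i;
            while (i < s.length() && Character.isLetter(s.charAt(i))) {
                i++;
            }
            String word = s.substring(start, i);

            // Linear search through the words recorded so far: this is the O(n^2) part.
            boolean alreadySeen = false;
            for (int j = 0; j < uniqueCount; j++) {
                if (seen[j].equals(word)) {
                    alreadySeen = true;
                    break;
                }
            }
            if (!alreadySeen) {
                seen[uniqueCount] = word;
                uniqueCount++;
            }
        }
        return uniqueCount;
    }

    public static void main(String[] args) {
        // prints 8
        System.out.println(countUniqueWords("A long long time ago, I can still remember"));
    }
}
```

The nested loop is what makes this quadratic in the number of words, which is usually acceptable for a single sentence.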

Answer 3

You can use the Set<T> collection type, which can only contain unique values:

public static int getTotalUniqueWords(String input) {
    String[] words = input.split(" ");
    Set<String> uniqueWords = new HashSet<>();
    Collections.addAll(uniqueWords, words);
    return uniqueWords.size();
}

Or use streams:

public static long getTotalUniqueWordsStream(String input) {
    String[] words = input.split(" ");
    return Arrays.stream(words).distinct().count();
}


3 solutions

Solution 1
1 2022-12-03 21:41:18

Solution 2
1 2022-12-03 21:46:14

Solution 3
0 2022-12-04 01:36:12
