
How to count the total vowels in substrings of a given string using Java?

You are given a string, and you have to construct all the substrings of that string.

For example, given the string "baceb", the substrings are {b, ba, bac, bace, a, ac, ace, aceb, c, ce, ceb, e, eb, baceb}, which contain {0, 1, 1, 2, 1, 1, 2, 2, 0, 1, 1, 1, 1, 2} vowels respectively, so the sum is 16. The length of the string can be up to 10^5.

This is how far I have got. It works for the smaller test cases, but for the bigger ones I get a timeout error.

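    import java.util.ArrayList;
    import java.util.Scanner;

    public class Main {
        public static void main(String[] args) {
            Scanner sc = new Scanner(System.in);
            int n = sc.nextInt(); // number of test cases
            sc.nextLine();        // skip the rest of the first line

            while (n-- > 0) {
                String s = sc.nextLine().toLowerCase();
                int len = s.length();

                // Collect every distinct substring (duplicates are skipped)
                ArrayList<String> list = new ArrayList<>();
                for (int i = 0; i < len; i++) {
                    for (int j = i + 1; j <= len; j++) {
                        String temp = s.substring(i, j);
                        if (!list.contains(temp)) { // linear scan of the whole list
                            list.add(temp);
                        }
                    }
                }

                // Count the vowels in each collected substring
                int count = 0;
                for (String str : list) {
                    for (int k = 0; k < str.length(); k++) {
                        char ch = str.charAt(k);
                        if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') {
                            count++;
                        }
                    }
                }
                System.out.println(count);
            }
        }
    }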

n is the number of test cases.

Any help is appreciated. Thanks.

You've done everything correctly so far. You just need to go one step further.

The first approach to a programming task is to write clear code that solves it. In this case, that means code that generates the substrings and counts the vowels in them. Every programmer can understand such code. Very good work.

Next, if the coded solution turns out not to perform well enough, it's time for optimization (in 99.9% of real-world programming we never reach this point, but in coding challenges like yours it's commonplace). You want code that performs well enough that no timeout error occurs.

For an optimized solution we don't need to generate the substrings. Instead we observe the following. Your example string is 5 characters long. The first character (index 0) is part of 5 substrings: b, ba, bac, bace and baceb. However, it's a consonant, so it doesn't matter how many. The next character, the a at index 1, is part of 8 substrings: 4 beginning at index 0 and 4 beginning at index 1. So it contributes 8 towards the total of 16 vowels in all substrings. Had the next character (the c at index 2) been a vowel, we would have needed to calculate that it is in 9 substrings: 3 beginning at index 0, 3 beginning at index 1 and 3 beginning at index 2. Can you begin to see a pattern? I think that we can calculate the number of substrings a character belongs to by multiplying the number of characters up to and including that character by the number of characters from that character (inclusive) to the end of the string. Please check whether I am correct.
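
One way to check the guess is to compare it with brute force: for each index, enumerate every (start, end) pair and count the substrings that cover that index. This is a minimal sketch; the class and method names are made up for illustration. For baceb it should print matching numbers 5, 8, 9, 8, 5 for indices 0 to 4:

```java
public class SubstringPatternCheck {

    // Brute force: count substrings s.substring(start, end) whose range covers index i.
    static long countContaining(int len, int i) {
        long count = 0;
        for (int start = 0; start < len; start++) {
            for (int end = start + 1; end <= len; end++) {
                if (start <= i && i < end) {
                    count++;
                }
            }
        }
        return count;
    }

    // Conjectured closed form: characters up to and including i,
    // times characters from i (inclusive) to the end of the string.
    static long formula(int len, int i) {
        return (long) (i + 1) * (len - i);
    }

    public static void main(String[] args) {
        int len = "baceb".length();
        for (int i = 0; i < len; i++) {
            System.out.println(i + ": brute force " + countContaining(len, i)
                    + ", formula " + formula(len, i));
        }
    }
}
```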

So an efficient algorithm can be: iterate through the string indices. If the character at a given index is a vowel, calculate how many substrings it is in and add this count to a running total.
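
A minimal sketch of that loop in Java, keeping the input format from the question (the number of test cases, then one string per line). The class and method names are made up, and the vowel test assumes that only a, e, i, o, u in either case count as vowels. It does O(len) work per string instead of enumerating all O(len²) substrings, and it counts repeated substrings separately:

```java
import java.util.Scanner;

public class VowelSubstringSum {

    // Assumption: vowels are a, e, i, o, u in upper or lower case.
    static boolean isVowel(char ch) {
        return "aeiouAEIOU".indexOf(ch) >= 0;
    }

    // Sum of the vowel counts over all substrings (repeated substrings counted separately).
    static long totalVowels(String s) {
        int len = s.length();
        long total = 0;
        for (int i = 0; i < len; i++) {
            if (isVowel(s.charAt(i))) {
                // i + 1 possible start indices times len - i possible end indices;
                // computed in long because the product can exceed Integer.MAX_VALUE.
                total += (long) (i + 1) * (len - i);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt(); // number of test cases
        sc.nextLine();        // skip the rest of the first line
        while (n-- > 0) {
            System.out.println(totalVowels(sc.nextLine()));
        }
    }
}
```

For the input line baceb this prints 16, the same as the brute-force version.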

Edit:

But without constructing the substrings, how is one going to know how many vowels are actually in each substring?

I am unsure how I can explain it better than I already have, but I'll try.

The point is: you do not need to know how many vowels are in each substring. You only need to know the sum of all those counts. So we obtain that sum in quite a different manner. We exploit the fact that every time there is a vowel in a substring, that vowel must come from one particular index in the original string. So instead of counting the vowels in each substring, we count the substrings that each vowel is in. The result has to be the same.
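
The two ways of counting can be put side by side in a brute-force sketch (class and method names are made up for illustration, and only lower-case a, e, i, o, u count as vowels here). Way 1 loops over substrings and counts the vowels inside each one, like your code minus the deduplication; way 2 loops over vowels and counts the substrings each one lies in, still without any formula. Both visit every (substring, vowel occurrence) pair exactly once, so the totals agree:

```java
public class DoubleCountingDemo {

    static boolean isVowel(char ch) {
        return "aeiou".indexOf(ch) >= 0;
    }

    // Way 1: for every substring, count the vowels inside it.
    static long perSubstring(String s) {
        long total = 0;
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                for (char ch : s.substring(start, end).toCharArray()) {
                    if (isVowel(ch)) {
                        total++;
                    }
                }
            }
        }
        return total;
    }

    // Way 2: for every vowel at index i, count the substrings [start, end) containing it.
    static long perVowel(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!isVowel(s.charAt(i))) {
                continue;
            }
            for (int start = 0; start <= i; start++) {
                for (int end = i + 1; end <= s.length(); end++) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(perSubstring("baceb") + " " + perVowel("baceb")); // 16 16
    }
}
```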

Take the example string from your question, baceb. There are two vowels, the a at index 1 and the e at index 3. The a is in the substrings ba, bac, bace, baceb, a, ac, ace and aceb, 8 in total, so it contributes 8 to the count of vowels in all substrings. The e is also in 8 substrings. 8 + 8 equals 16, which is the sum of the vowel counts over all substrings.
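
The substrings containing a given index can also be listed directly, which reproduces the enumeration above. A small sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringsContainingIndex {

    // All substrings s.substring(start, end) with start <= i < end.
    static List<String> containing(String s, int i) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start <= i; start++) {
            for (int end = i + 1; end <= s.length(); end++) {
                result.add(s.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(containing("baceb", 1));        // [ba, bac, bace, baceb, a, ac, ace, aceb]
        System.out.println(containing("baceb", 3).size()); // 8
    }
}
```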

Let me try a more formal argument. Consider a vowel at index i in a string of length len (0 <= i < len). The question is: out of all the substrings of the string, how many include this particular vowel? For it to be included in a substring, that substring must begin at index 0, 1, … i (inclusive), so there are i + 1 possible start indices. The substring must end at index i + 1, i + 2, … len (an exclusive end index, as in s.substring(start, end)), giving len - i possibilities. Since every possible start index can be combined with any possible end index to define a substring, we can multiply the two numbers. The product gives the number of substrings this vowel is in, and hence this vowel's contribution to the sum of vowel counts over all substrings. So all that is left to do is add up the products for all the vowels in the original string. Then you've got your result.
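
The argument can be summarized in one formula. Here V, the set of indices of the vowels in the string, is notation introduced only for this summary:

```latex
\text{total}
  = \sum_{\text{substrings } t} \#\{\text{vowels in } t\}
  = \sum_{i \in V} (i + 1)\,(\mathit{len} - i)
```

For baceb, V = {1, 3} and len = 5, so the total is (1 + 1)(5 - 1) + (3 + 1)(5 - 3) = 8 + 8 = 16.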

Happy coding.

PS I have assumed that substrings need not be unique. In the string bobo the substring bo occurs twice, and the o contributes to the vowel count both times. I see from your code that this disagrees with your understanding, but I would still assume that mine is correct.
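
To see how much the two interpretations differ, here is a brute-force comparison (hypothetical class and method names; lower-case vowels only). With every occurrence counted, bobo gives 10; counting each distinct substring once, as your list.contains check does, gives 8. For baceb both give 16, because the only repeated substring there, b, contains no vowel:

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueVsAllSubstrings {

    static int vowelsIn(String t) {
        int count = 0;
        for (char ch : t.toCharArray()) {
            if ("aeiou".indexOf(ch) >= 0) {
                count++;
            }
        }
        return count;
    }

    // Sums vowel counts over substrings; if distinctOnly, each distinct substring counts once.
    static long total(String s, boolean distinctOnly) {
        Set<String> seen = new HashSet<>();
        long total = 0;
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                String t = s.substring(start, end);
                if (distinctOnly && !seen.add(t)) {
                    continue; // this substring was already counted once
                }
                total += vowelsIn(t);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(total("bobo", false)); // 10: every occurrence counts
        System.out.println(total("bobo", true));  // 8: repeated "bo" and "o" counted once
    }
}
```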

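PPS Also be aware that for strings of length up to 100 000 the total may overflow an int. Use a long for the total count. Compute each product in long arithmetic too: around the middle of a 100 000-character string, (i + 1) * (len - i) alone is about 2.5 billion, which exceeds Integer.MAX_VALUE.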
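
A quick sketch of the worst case (class and method names made up for illustration): if every character is a vowel, the total is the sum of (i + 1)(len - i) over all indices, which works out to len(len + 1)(len + 2)/6, roughly 1.7 × 10^14 for len = 100 000:

```java
public class OverflowCheck {

    // Worst case: every character is a vowel, so every index contributes (i + 1) * (len - i).
    // The sum equals len * (len + 1) * (len + 2) / 6.
    static long worstCaseTotal(int len) {
        long total = 0;
        for (int i = 0; i < len; i++) {
            total += (long) (i + 1) * (len - i);
        }
        return total;
    }

    public static void main(String[] args) {
        int len = 100_000;
        long total = worstCaseTotal(len);
        System.out.println(total);                     // 166671666700000
        System.out.println(total > Integer.MAX_VALUE); // true

        // A single product already overflows when computed in int arithmetic.
        int i = len / 2;
        int overflowed = (i + 1) * (len - i);      // wraps around to a negative value
        long correct = (long) (i + 1) * (len - i); // 2500050000
        System.out.println(overflowed + " vs " + correct);
    }
}
```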

PPPS For an additional slight optimization, you can make the check for whether a character is a vowel faster. Create a BitSet once, and set the 10 bits corresponding to the upper-case and lower-case variants of each vowel. Then, to check whether a character is a vowel, simply query whether the corresponding bit in the BitSet is set. There is no need to convert to lower case first.
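
A sketch of the BitSet idea combined with the per-vowel formula; the class name, field name and method names are made up for illustration. It relies on BitSet.get returning false for any index beyond the bits that were set, so non-ASCII characters simply count as non-vowels:

```java
import java.util.BitSet;

public class BitSetVowels {

    // Built once: bit c is set when char c is a vowel, upper or lower case (10 bits).
    private static final BitSet VOWELS = new BitSet(128);

    static {
        for (char ch : "aeiouAEIOU".toCharArray()) {
            VOWELS.set(ch);
        }
    }

    static boolean isVowel(char ch) {
        return VOWELS.get(ch); // false for any char whose bit was never set
    }

    static long totalVowels(String s) {
        int len = s.length();
        long total = 0;
        for (int i = 0; i < len; i++) {
            if (isVowel(s.charAt(i))) {
                total += (long) (i + 1) * (len - i);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalVowels("BaCeB")); // 16, without calling toLowerCase
    }
}
```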

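Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.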

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM