Counting Words from a String
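I am supposed to create a method that counts the words in a sentence whose length meets or exceeds int minLength. For example, if the given minimum length is 4, your program should only count words that are at least 4 letters long.
Words can be separated by one or more spaces. Non-letter characters (spaces, punctuation, digits, etc.) may be present, but they do not count toward the length of a word.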
public static int countWords(String original, int minLength) {
    original = original.replaceAll("[^A-Za-z\\s]", "").replaceAll("[0-9]", "");
    String[] words = original.split("\\s+");
    for(String word : words){ System.out.println(word); }
    int count = 0;
    for (int i = 0; i < words.length; i++) {
        if (words[i].length() >= minLength) {
            count++;
        } else if (words[i].length() < minLength || minLength == 0) {
            count = 0;
        }
    }
    System.out.println("Number of words in sentence: " + count);
    return count;
}
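Okay, so I changed my code, but now the counter is off by one. Say I enter the following: "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean."
The output I receive is... Spain is a beautiful country the beaches are warm sandy and spotlessly clean
The word count is off by one; it should be 11. It looks like the last word in the sentence is not being counted. I'm not sure where the problem lies, since all I changed was the replaceAll, to include the escape.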
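You are getting an incorrect result because count is set back to 0 inside the else if branch. So every time a word with length < minLength comes up, the counter is reset. Remove the else if branch and your code should work.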
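As a minimal sketch of that fix, the method from the question with the else if branch removed might look like the following. The class name `WordCounter` and the `main` method are only there for illustration, and the debugging prints have been left out.

```java
class WordCounter {
    public static int countWords(String original, int minLength) {
        // Keep only letters and whitespace; punctuation and digits are dropped.
        String cleaned = original.replaceAll("[^A-Za-z\\s]", "");
        String[] words = cleaned.split("\\s+");
        int count = 0;
        for (String word : words) {
            // Count the word if it is long enough; a shorter word
            // simply leaves the counter untouched (no reset).
            if (word.length() >= minLength) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
        System.out.println(countWords(s, 3)); // prints 11
        System.out.println(countWords(s, 4)); // prints 8
    }
}
```

With the reset gone, short words such as "is" and "a" no longer wipe out what was counted before them.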
In addition, below are two other ways to write the same code, with comments explaining what happens at each step.
private static long countWords(final String sentence, final int minLength) {
    // Validate the input sentence is not null or empty.
    if (sentence == null || sentence.isEmpty()) {
        return 0;
    }
    long count = 0;
    // split the sentence by spaces to get array of words.
    final String[] words = sentence.split(" ");
    for (final String word : words) { // for each word
        // remove unwanted characters from the word.
        final String normalizedWord = word.trim().replaceAll("[^a-zA-Z0-9]", "");
        // if the length of word is greater than or equal to minLength provided, increment the counter.
        if (normalizedWord.length() >= minLength) {
            count++;
        }
    }
    return count;
}
// Requires: import java.util.stream.Stream;
private static long countWords(final String sentence, final int minLength) {
    // Validate the input sentence is not null or empty.
    if (sentence == null || sentence.isEmpty()) {
        return 0;
    }
    return Stream.of(sentence.split(" "))
            .filter(word -> word.trim().replaceAll("[^a-zA-Z0-9]", "").length() >= minLength)
            .count();
}
For input string: "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean."
Min Length: 3. Output: 11
Min Length: 4. Output: 8
Min Length: 5. Output: 7
For input string: "This would work like a magic!"
Min Length: 4. Output: 5
Min Length: 5. Output: 2
Min Length: 6. Output: 0
For input string: "hello$hello"
Min Length: 4. Output: 1
Min Length: 5. Output: 1
Min Length: 6. Output: 1
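One caveat: the pattern `[^a-zA-Z0-9]` used in both options keeps digits, so a token such as `3333d` would be measured as 5 characters. If digits should not count toward the length, as the question states, a letters-only variant could look roughly like the sketch below. The class name `LetterWordCounter` is hypothetical, and splitting on `\\s+` rather than a single space is an assumption made here to cope with repeated spaces.

```java
import java.util.Arrays;

class LetterWordCounter {
    static long countWords(String sentence, int minLength) {
        if (sentence == null || sentence.isEmpty()) {
            return 0;
        }
        return Arrays.stream(sentence.split("\\s+"))
                // Strip everything that is not an ASCII letter before measuring.
                .map(word -> word.replaceAll("[^A-Za-z]", ""))
                .filter(word -> word.length() >= minLength)
                .count();
    }

    public static void main(String[] args) {
        // "abc" and "hello$hello" qualify; "3333d" shrinks to "d" and does not.
        System.out.println(countWords("abc 3333d hello$hello", 3)); // prints 2
    }
}
```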
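1) Split on spaces
2) Trim to remove extra whitespace, and replace anything strange with "" (i.e. remove it)
3) Count the words whose length is greater than or equal to your minLength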
public class TesterClass
{
    public static void main(String[] args)
    {
        String original = ",,, hello$hello asdasda ddd 33d 3333d a";
        int minLength = 3;
        String[] words = original.split(" ");
        int count = 0;
        for (String trimAndNoStrange : words)
        {
            // Strip everything except letters; this also removes digits,
            // so a separate replaceAll("[0-9]", "") is not needed.
            String fixed = trimAndNoStrange.trim().replaceAll("[^A-Za-z]", "");
            if (fixed.length() >= minLength)
            {
                count++;
            }
        }
        System.out.println("Number of words in sentence: " + count);
    }
}
Input: ",,, hello$hello asdasda ddd 33d 3333d a"
Input: minLength = 3;
Output: Number of words in sentence: 3
Try updating your code to the following:
original = original.replaceAll("[^A-Za-z\\s]", "").replaceAll("[0-9]", "");
Replace with an empty string instead of a space.
Allow whitespace to remain (add \\s to the regex character class).
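To see why both points matter, the sketch below runs three clean-up variants on one sample string. The class name `CleanupDemo`, the helper `tokens`, and the sample string are all made up for illustration.

```java
import java.util.Arrays;

class CleanupDemo {
    // Applies the replacement, then splits the result on runs of whitespace.
    static String[] tokens(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement).trim().split("\\s+");
    }

    public static void main(String[] args) {
        String s = "it's warm, sandy";
        // Whitespace is deleted too: all words collapse into one token.
        System.out.println(Arrays.toString(tokens(s, "[^A-Za-z]", "")));      // [itswarmsandy]
        // Punctuation becomes a space: "it's" is torn into "it" and "s".
        System.out.println(Arrays.toString(tokens(s, "[^A-Za-z\\s]", " ")));  // [it, s, warm, sandy]
        // Whitespace kept, punctuation deleted: the words stay intact.
        System.out.println(Arrays.toString(tokens(s, "[^A-Za-z\\s]", "")));   // [its, warm, sandy]
    }
}
```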
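You should focus on what you actually want to do instead of sneaking up on the goal from the other side. You want to count words, so just do that, rather than replacing or splitting.
One obstacle may be your particular definition of a "word", but it is worth spending some time on working out an appropriate pattern; that pays off more than spending the time on several replacement and split patterns.
Ignoring the length constraint, a word is any sequence of characters that starts with a letter (digits and separators are not part of your final task anyway), followed by an arbitrary number of non-space characters: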
String s
    ="Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
int count=0;
for(Matcher m=Pattern.compile("[A-Za-z][^\\s]*").matcher(s); m.find();) {
    System.out.println(count+": "+m.group());
    count++;
}
System.out.println("total number of words: "+count);
will print:
0: Spain
1: is
2: a
3: beautiful
4: country;
5: the
6: beache's
7: are
8: warm,
9: sandy
10: and
11: spotlessly
12: clean.
total number of words: 13
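Incorporating the minimum length without counting non-letter characters can be a bit tricky, but it can be solved by considering that every letter may be followed by an arbitrary number of ignorable (i.e. non-letter, non-space) characters, and then simply counting the occurrences of that combination. So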
String s
    ="Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
int count=0;
for(Matcher m=Pattern.compile("([A-Za-z][^A-Za-z\\s]*+){4,}").matcher(s); m.find();) {
    System.out.println(count+": "+m.group());
    count++;
}
System.out.println("total number of words >=4 letters: "+count);
prints
0: Spain
1: beautiful
2: country;
3: beache's
4: warm,
5: sandy
6: spotlessly
7: clean.
total number of words >=4 letters: 8
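In case you are wondering, the *+ quantifier is like * but tells the regex engine not to backtrack within that part of the match, which is an optimization in this case. Simply put, if the ignorable characters are not followed by a letter, there will not be a letter within the ignorable characters either, so the engine should not spend time looking for one there.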
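As a small sketch of that point, the greedy and the possessive form of the pattern can be run side by side on the same input. The class name `PossessiveDemo` and the helper `count` are hypothetical; the expectation is only that both forms find the same matches here, with the possessive one skipping the pointless backtracking.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // Counts the non-overlapping matches of regex in input.
    static int count(String regex, String input) {
        int n = 0;
        for (Matcher m = Pattern.compile(regex).matcher(input); m.find();) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
        // Greedy *: the engine may backtrack into the ignorable characters.
        System.out.println(count("([A-Za-z][^A-Za-z\\s]*){4,}", s));  // prints 8
        // Possessive *+: same result, no backtracking into that part.
        System.out.println(count("([A-Za-z][^A-Za-z\\s]*+){4,}", s)); // prints 8
    }
}
```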
Putting this into method form:
public static int countWords(String original, int minLength) {
    if(minLength<1) throw new IllegalArgumentException();
    int count=0;
    for(Matcher m=Pattern.compile("([A-Za-z][^A-Za-z\\s]*+){"+minLength+",}")
            .matcher(original); m.find();) {
        count++;
    }
    return count;
}
and using it like
String s
    ="Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
for(int i=1; i<10; i++)
    System.out.println("with at least "+i+" letters: "+countWords(s, i));
yields
with at least 1 letters: 13
with at least 2 letters: 12
with at least 3 letters: 11
with at least 4 letters: 8
with at least 5 letters: 7
with at least 6 letters: 4
with at least 7 letters: 4
with at least 8 letters: 2
with at least 9 letters: 2