Take string input, convert each word to lowercase and print each word on its own line, treating non-alphabetic characters as breaks between words

I'm trying to take a string input, convert each word to lowercase and print each distinct word on its own line (in sorted order), ignoring non-alphabetic characters (single-letter words count as well). For example:

Sample input:

Adventures in Disneyland

Two blondes were going to Disneyland when they came to a fork in the
road. The sign read: "Disneyland Left."

So they went home.

Output:

a
adventures
blondes
came
disneyland
fork
going
home
in
left
read
road
sign
so
the
they
to
two
went
were
when

My program:

        Scanner reader = new Scanner(file);
        ArrayList<String> words = new ArrayList<String>();
        while (reader.hasNext()) {
            String word = reader.next();
            if (word != "") {
                word = word.toLowerCase();
                word = word.replaceAll("[^A-Za-z ]", "");
                if (!words.contains(word)) {
                    words.add(word);
                }
            }
        }
        Collections.sort(words);
        for (int i = 0; i < words.size(); i++) {
            System.out.println(words.get(i));
        }

This works for the input above, but prints the wrong output for an input like this:

a  t\|his@ is$ a)( -- test's-&*%$#-`case!@|?

The expected output is

a
case
his
is
s
t
test

The output I get is

*a blank line is printed first*
a
is
testscase
this

So my program doesn't work, because Scanner.next() reads characters until it hits whitespace and treats that as one token, whereas anything that is not a letter should be treated as a break between words. I'm not sure how to configure the Scanner so that it breaks on non-alphabetic characters rather than on whitespace, and that's where I'm stuck right now.

The other answer has already mentioned some issues with your code.

I suggest another approach to address your requirements. Such transformations are a good use case for Java Streams, which often yield clean code:

List<String> strs = Arrays.stream(input.split("[^A-Za-z]+"))
    .map(t -> t.toLowerCase())
    .distinct()
    .sorted()
    .collect(Collectors.toList());

Here are the steps:

  1. Split the string on runs of one or more consecutive non-alphabetic characters:

     input.split("[^A-Za-z]+")

    This yields tokens consisting solely of alphabetic characters. If the input starts with a non-letter, the first token is an empty string, which you may want to filter out.

  2. Stream over the resulting array using Arrays.stream();

  3. Map each element to its lowercase equivalent:

     .map(t -> t.toLowerCase())

    The default locale is used. Use toLowerCase(Locale) to set the locale explicitly.

  4. Discard duplicates using Stream.distinct();

  5. Sort the elements within the stream by calling sorted();

  6. Collect the elements into a List with collect().
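
Put together, the steps above could look like the following minimal sketch. The class name WordSplitter and the helper uniqueSortedWords are hypothetical names chosen for illustration, and the isEmpty filter and Locale.ROOT are small additions that are not part of the original snippet: the filter drops the empty leading token that split() produces when the input starts with a non-letter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class WordSplitter {

    // Hypothetical helper (not from the original answer): returns the
    // distinct lowercase words of the input in sorted order.
    static List<String> uniqueSortedWords(String input) {
        return Arrays.stream(input.split("[^A-Za-z]+"))
            // split() yields a leading "" if the input starts with a non-letter
            .filter(token -> !token.isEmpty())
            // Locale.ROOT is an assumption; it avoids locale-specific case rules
            .map(token -> token.toLowerCase(Locale.ROOT))
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The problematic input from the question
        String input = "a  t\\|his@ is$ a)( -- test's-&*%$#-`case!@|?";
        // Prints a, case, his, is, s, t, test (one word per line)
        uniqueSortedWords(input).forEach(System.out::println);
    }
}
```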


If you need to read the input from a file, you could use this:

Files.lines(filepath)
    .flatMap(line -> Arrays.stream(line.split("[^A-Za-z]+")))
    .map(... // Et cetera
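
A fuller version of the file-based variant might look like this sketch. The class name FileWords, the helper name, and the temporary sample file are assumptions made for illustration. The remaining stream steps are the same as in the string-based pipeline, plus a filter for empty tokens, and the try-with-resources block closes the file handle that Files.lines keeps open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileWords {

    // Hypothetical helper: distinct lowercase words of a text file, sorted.
    static List<String> uniqueSortedWords(Path filepath) throws IOException {
        // Files.lines holds the file open until the stream is closed
        try (Stream<String> lines = Files.lines(filepath)) {
            return lines
                .flatMap(line -> Arrays.stream(line.split("[^A-Za-z]+")))
                // blank lines and leading punctuation produce empty tokens
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .distinct()
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the example is self-contained
        Path sample = Files.createTempFile("words", ".txt");
        Files.write(sample, Arrays.asList(
            "The sign read: \"Disneyland Left.\"", "", "So they went home."));
        uniqueSortedWords(sample).forEach(System.out::println);
        Files.delete(sample);
    }
}
```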

But if you need to use a Scanner, then you could use something like this:

Scanner s = new Scanner(input)
    .useDelimiter("[^A-Za-z]+");
List<String> parts = new ArrayList<>();
while (s.hasNext()) {
    parts.add(s.next());
}

And then

List<String> strs = parts.stream()
    .map(... // Et cetera
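
As a self-contained sketch, the Scanner variant could be completed as follows. The class name ScannerWords and the helper name are hypothetical. Because the delimiter matches a whole run of non-letters, the Scanner should not hand back empty tokens here; the isEmpty filter is kept only as a precaution and is an addition to the original snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.stream.Collectors;

public class ScannerWords {

    // Hypothetical helper: tokenizes with a Scanner whose delimiter is
    // any run of non-letters, then lowercases, de-duplicates and sorts.
    static List<String> uniqueSortedWords(String input) {
        List<String> parts = new ArrayList<>();
        try (Scanner s = new Scanner(input).useDelimiter("[^A-Za-z]+")) {
            while (s.hasNext()) {
                parts.add(s.next());
            }
        }
        return parts.stream()
            // precaution: drop an empty token should one ever appear
            .filter(token -> !token.isEmpty())
            .map(token -> token.toLowerCase(Locale.ROOT))
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "a  t\\|his@ is$ a)( -- test's-&*%$#-`case!@|?";
        uniqueSortedWords(input).forEach(System.out::println);
    }
}
```

The same useDelimiter call works on a Scanner built from a File, which is closer to the program in the question.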

Don't use == or != to compare Strings. Also, perform your transformation before you check for empty. This,

if (word != "") {
    word = word.toLowerCase();
    word = word.replaceAll("[^A-Za-z ]", "");
    if (!words.contains(word)) {
        words.add(word);
    }
}

should look something like

word = word.toLowerCase().replaceAll("[^a-z ]", "").trim();
if (!word.isEmpty() && !words.contains(word)) {
    words.add(word);
}
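
To see the effect of transforming first and checking for empty afterwards, here is a small runnable sketch that wraps the snippet above in a hypothetical helper named addWord (the class name WordFilter is also an assumption). Note that this fix only removes non-letters inside a whitespace-delimited token, so "test's" becomes "tests" rather than two separate words.

```java
import java.util.ArrayList;
import java.util.List;

public class WordFilter {

    // Hypothetical helper wrapping the corrected snippet: normalizes the
    // token first, then skips it if nothing is left or it is a duplicate.
    static void addWord(List<String> words, String rawToken) {
        String word = rawToken.toLowerCase().replaceAll("[^a-z ]", "").trim();
        if (!word.isEmpty() && !words.contains(word)) {
            words.add(word);
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        for (String token : new String[] {"--", "Is$", "is", "test's"}) {
            addWord(words, token);
        }
        // "--" is empty after cleaning, "is" is a duplicate of "Is$"
        System.out.println(words); // prints [is, tests]
    }
}
```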

