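Take a string input, convert each word to lowercase and print each word on its own line, treating non-alphabetic characters as separators between words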

Question

I'm trying to take a string input, convert each word to lowercase, and print each distinct word on its own line in sorted order, ignoring non-alphabetic characters. Single-letter words count as words too. For example:

Sample input:

Adventures in Disneyland

Two blondes were going to Disneyland when they came to a fork in the
road. The sign read: "Disneyland Left."

So they went home.

Output:

a
adventures
blondes
came
disneyland
fork
going
home
in
left
read
road
sign
so
the
they
to
two
went
were
when

My program:

        Scanner reader = new Scanner(file);
        ArrayList<String> words = new ArrayList<String>();
        while (reader.hasNext()) {
            String word = reader.next();
            if (word != "") {
                word = word.toLowerCase();
                word = word.replaceAll("[^A-Za-z ]", "");
                if (!words.contains(word)) {
                    words.add(word);
                }
            }
        }
        Collections.sort(words);
        for (int i = 0; i < words.size(); i++) {
            System.out.println(words.get(i));
        }

This works for the input above, but prints the wrong output for an input like this:

a  t\|his@ is$ a)( -- test's-&*%$#-`case!@|?

The expected output is:

a
case
his
is
s
t
test

The output I get is:

*a blank line is printed first*
a
is
testscase
this

So my program clearly doesn't work: Scanner.next() reads characters until it hits whitespace and treats that run as one token, whereas anything that is not a letter should be treated as a break between words. I'm not sure how to configure Scanner so that the delimiter is any non-alphabetic character rather than whitespace, and that's where I'm stuck.
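
To make the problem concrete, here is a minimal sketch of what the default Scanner tokenization produces for the failing input. The class name `DefaultDelimiterDemo` and the helper `tokens` are made up for this illustration and are not part of the program above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the name and the helper are illustrative only.
class DefaultDelimiterDemo {

    // Collects the tokens Scanner.next() returns with its default
    // (whitespace) delimiter.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner reader = new Scanner(input)) {
            while (reader.hasNext()) {
                result.add(reader.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Punctuation stays attached to the neighbouring letters, so
        // "test's-&*%$#-`case!@|?" arrives as a single token.
        for (String token : tokens("a  t\\|his@ is$ a)( -- test's-&*%$#-`case!@|?")) {
            System.out.println(token);
        }
    }
}
```

Stripping the non-letters from these tokens afterwards glues "test", "s" and "case" together, and the token "--" becomes an empty string, which would explain the blank line.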

Answer 1

The other answer has already pointed out some issues with your code.

I suggest a different approach to meet your requirements. Transformations like this are a good use case for Java Streams, which often yield clean code:

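// Needs java.util.Arrays, java.util.List and java.util.stream.Collectors.
List<String> strs = Arrays.stream(input.split("[^A-Za-z]+"))
    // split() returns a leading "" when the input starts with a non-letter
    .filter(t -> !t.isEmpty())
    .map(t -> t.toLowerCase())
    .distinct()
    .sorted()
    .collect(Collectors.toList());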

Here are the steps:

Split the string on runs of one or more non-alphabetic characters:
```
 input.split("[^A-Za-z]+") 
```
This yields tokens consisting solely of alphabetic characters. If the input starts with a non-letter, split() also returns an empty first token, so empty tokens need to be filtered out.
Stream over the resulting array using Arrays.stream().
Map each element to its lowercase equivalent:
```
 .map(t -> t.toLowerCase()) 
```
The default locale is used. Use toLowerCase(Locale) to set the locale explicitly.
Discard duplicates using Stream.distinct().
Sort the elements within the stream by calling sorted().
Collect the elements into a List with collect().
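
The steps above can be put together as a small runnable sketch. The class name `WordLister` and the method `sortedWords` are hypothetical, and using `Locale.ROOT` for lowercasing is my assumption rather than something the question asks for.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical wrapper class for the pipeline described in the steps.
class WordLister {

    static List<String> sortedWords(String input) {
        return Arrays.stream(input.split("[^A-Za-z]+"))
            // Drop the empty token split() produces for a leading non-letter.
            .filter(t -> !t.isEmpty())
            // Assumption: a fixed locale, so the result does not depend on
            // the machine's default locale.
            .map(t -> t.toLowerCase(Locale.ROOT))
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The failing input from the question.
        String input = "a  t\\|his@ is$ a)( -- test's-&*%$#-`case!@|?";
        // Prints a, case, his, is, s, t, test (one per line).
        sortedWords(input).forEach(System.out::println);
    }
}
```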

If you need to read the input from a file, you could use this:

Files.lines(filepath)
    .flatMap(line -> Arrays.stream(line.split("[^A-Za-z]+")))
    .map(... // Et cetera
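
A fuller version of the file-based fragment might look like the sketch below. `FileWordLister` and `sortedWords` are names invented for the example. The try-with-resources block is there because the stream returned by `Files.lines` holds the file open until it is closed, and `Locale.ROOT` is again an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class; reads a text file and lists its distinct words.
class FileWordLister {

    static List<String> sortedWords(Path filepath) throws IOException {
        // Files.lines reads lazily, so close the stream when done.
        try (Stream<String> lines = Files.lines(filepath)) {
            return lines
                .flatMap(line -> Arrays.stream(line.split("[^A-Za-z]+")))
                // Empty lines and lines starting with a non-letter yield "".
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .distinct()
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file path is passed as the first argument.
        sortedWords(Paths.get(args[0])).forEach(System.out::println);
    }
}
```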

But if you need to use a Scanner, you could use something like this:

Scanner s = new Scanner(input)
    .useDelimiter("[^A-Za-z]+");
List<String> parts = new ArrayList<>();
while (s.hasNext()) {
    parts.add(s.next());
}

And then

List<String> strs = parts.stream()
    .map(... // Et cetera
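
Combining the two Scanner fragments gives roughly the following sketch. The class and method names are hypothetical, and `Locale.ROOT` is an assumption. Because the delimiter pattern matches runs of non-letters, Scanner should not return empty tokens here, so no extra filter is included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical class showing the Scanner-based variant end to end.
class ScannerWordLister {

    static List<String> sortedWords(String input) {
        List<String> parts = new ArrayList<>();
        // Any run of non-letters acts as the separator between tokens.
        try (Scanner s = new Scanner(input).useDelimiter("[^A-Za-z]+")) {
            while (s.hasNext()) {
                parts.add(s.next());
            }
        }
        return parts.stream()
            .map(t -> t.toLowerCase(Locale.ROOT))
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        sortedWords("So they went home. \"Disneyland Left.\"")
            .forEach(System.out::println);
    }
}
```

The same `useDelimiter` call works on a Scanner built from a file, which is closer to the setup in the question.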

Answer 2

Don't use == or != to compare Strings. Also, perform your transformation before you check for empty. This,

if (word != "") {
    word = word.toLowerCase();
    word = word.replaceAll("[^A-Za-z ]", "");
    if (!words.contains(word)) {
        words.add(word);
    }
}

should look something like

word = word.toLowerCase().replaceAll("[^a-z ]", "").trim();
if (!word.isEmpty() && !words.contains(word)) {
    words.add(word);
}
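
As a rough illustration of the clean-then-check order, here is a self-contained sketch. `CleanThenCheckDemo`, `clean` and `uniqueWords` are hypothetical names, and `Locale.ROOT` is an assumption added for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo of cleaning a token first and testing for empty afterwards.
class CleanThenCheckDemo {

    static String clean(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", "").trim();
    }

    static List<String> uniqueWords(List<String> tokens) {
        List<String> words = new ArrayList<>();
        for (String token : tokens) {
            String word = clean(token);
            // isEmpty() inspects the content; == and != only compare references.
            if (!word.isEmpty() && !words.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "--" cleans to an empty string and is skipped.
        List<String> tokens = Arrays.asList("a", "--", "is$", "a)(", "test's-case!");
        uniqueWords(tokens).forEach(System.out::println);
    }
}
```

This removes the blank line from the output, but a whitespace-delimited token such as "test's-case!" still collapses into "testscase", so the tokens also need to be split on non-letters for the expected result.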


2 answers

Answer 1: score 2, accepted, 2019-02-27 00:36:09

Answer 2: score 0, 2019-02-27 00:23:04
