Regular expression for splitting a string

Question

I'm trying to split a string using a regex. So far I have:

String[] words = a.replaceAll("[^a-zA-Z ]","").toLowerCase().split("\\s+");

It's almost what I want, but I also need to split the text wherever there is a newline character in the string. (By the way, should I actually use newline or return? What is the actual difference?)

To clarify, my input is:

this is a,
sample of
a file.

After splitting, I run a routine that sorts the words and counts the occurrences of each. I should be getting this:

a: 2
file: 1
is: 1
of: 1
sample: 1
this: 1

Instead, I get:

asample: 1
file: 1
is: 1
ofa: 1
this: 1

How should I correct my regular expression so that it splits at newlines as well?

Answer 1

Use the \\b[A-Za-z]+\\b regex to match the words directly: http://regexr.com/3ae1c
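A minimal sketch of how that pattern could be used with java.util.regex to collect and count the words. The class name WordFinder and the method countWords are made up for illustration, and the TreeMap is just one assumed way to get the sorted output the question describes:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class WordFinder {
    // A run of ASCII letters with a word boundary on each side.
    private static final Pattern WORD = Pattern.compile("\\b[A-Za-z]+\\b");

    // Returns word -> number of occurrences; TreeMap keeps the keys sorted.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = WORD.matcher(text.toLowerCase());
        while (m.find()) {
            // Add 1 to the existing count, or start at 1 for a new word.
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "this is a,\nsample of\na file.";
        countWords(input).forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Because the matcher looks for words rather than separators, commas, periods and line breaks never need to be listed explicitly.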

Answer 2

You must change your replaceAll like this:

 a.replaceAll("[^a-zA-Z]+"," ")

Or, as Alexander suggested, why not match the words directly? That is more to the point.
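For context, the original code deletes the comma and the newline outright, so "a," followed by "sample" collapses into "asample". Replacing each run of non-letters with a single space keeps the words apart. A sketch of the full chain, where the class name SplitWords, the trim() call and the empty-input check are additions assumed here to avoid an empty first token, not part of the answer:

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original answer.
class SplitWords {
    static String[] splitWords(String a) {
        // Every run of non-letters (punctuation, spaces, \r, \n) becomes one space.
        String cleaned = a.replaceAll("[^a-zA-Z]+", " ").trim().toLowerCase();
        // Assumption: treat input without any letters as "no words".
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "this is a,\nsample of\na file.";
        // prints [this, is, a, sample, of, a, file]
        System.out.println(Arrays.toString(splitWords(input)));
    }
}
```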

Answer 3

Just use a space as the second argument of the replaceAll method and that should work:

replaceAll("[^a-zA-Z ]"," ")

Or you can make it more efficient and avoid unnecessary spaces in the string returned by replaceAll by using the '+' quantifier, as Casimir suggested.

Either would work fine in your case.
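To illustrate the difference between the two variants, here is a small sketch (the class and method names are invented for this example). Without the quantifier, each removed character becomes its own space; with it, a whole run of non-letters becomes a single space. Splitting on \\s+ afterwards yields the same words either way:

```java
import java.util.Arrays;

// Hypothetical comparison, not part of the original answer.
class ReplaceComparison {
    // One space per replaced character; existing spaces are left alone.
    static String withoutQuantifier(String a) {
        return a.replaceAll("[^a-zA-Z ]", " ");
    }

    // One space per run of non-letters (spaces included in the run).
    static String withQuantifier(String a) {
        return a.replaceAll("[^a-zA-Z]+", " ");
    }

    public static void main(String[] args) {
        String input = "this is a,\nsample of\na file.";
        // The comma and the newline each turn into a space: "a  sample"
        System.out.println("[" + withoutQuantifier(input) + "]");
        // The comma and the newline together turn into one space: "a sample"
        System.out.println("[" + withQuantifier(input) + "]");
        // Both split into the same seven words.
        System.out.println(Arrays.toString(withoutQuantifier(input).toLowerCase().split("\\s+")));
        System.out.println(Arrays.toString(withQuantifier(input).toLowerCase().split("\\s+")));
    }
}
```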

3 answers

Answer 1
2 2015-02-15 08:38:37

Answer 2
1 2015-02-15 08:39:00

Answer 3
0 2015-02-15 08:56:00
