Java .split doesn't work well with toLowerCase
I want to split a text. I can do it when I use String.split(). For example, I split "Hello world." and I get "Hello" and "world" as output. When I do the same but with toLowerCase, I get "hello" and "world." instead. But I don't want this dot after "world". I tried splitting with different parameters and calling toLowerCase separately from .split, and I tried splitting first and then applying toLowerCase. Nothing works. What should I do to get rid of all these , . ! ? etc.?
Here is how I split:
predlog = main.toLowerCase().split("\\s+");
To keep only the letters (and spaces) and split the rest on whitespace:
String[] r = main.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
To get rid of all punctuation and split the rest:
String[] r = main.replaceAll("\\p{P}", "").toLowerCase().split("\\s+");
toLowerCase() has no effect on dots.
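The two replaceAll variants above behave differently on digits and apostrophes. The following is a minimal sketch of that difference; the class name PunctuationStripDemo, the method names, and the sample sentence are illustrative assumptions, not part of the original answer.

```java
import java.util.Arrays;

// Hypothetical demo class; names are chosen for illustration only.
public class PunctuationStripDemo {

    // Keeps only ASCII letters and spaces, then lowercases and splits on whitespace.
    public static String[] lettersOnly(String text) {
        return text.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
    }

    // Removes Unicode punctuation (\p{P}) only, so digits survive.
    public static String[] withoutPunctuation(String text) {
        return text.replaceAll("\\p{P}", "").toLowerCase().split("\\s+");
    }

    public static void main(String[] args) {
        String main = "Hello, world! It's 2 o'clock.";
        System.out.println(Arrays.toString(lettersOnly(main)));        // [hello, world, its, oclock]
        System.out.println(Arrays.toString(withoutPunctuation(main))); // [hello, world, its, 2, oclock]
    }
}
```

The first variant also drops non-ASCII letters, so the second is probably the safer choice for text that is not plain English.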
If you want a simple, but slightly mysterious, fix, also split on dots:
predlog = main.toLowerCase().split("\\s+|\\.");
The reason this works is that split() discards trailing empty strings from the returned array.
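The trailing-element behaviour can be checked with a small sketch. It assumes the same pattern as above; the class name SplitTrailingDemo and the method names are hypothetical. Passing a negative limit to split() keeps the trailing empty string, which makes the effect visible.

```java
import java.util.Arrays;

// Hypothetical demo class; names are chosen for illustration only.
public class SplitTrailingDemo {

    // One-argument split: trailing empty strings are removed from the result.
    public static String[] splitDefault(String text) {
        return text.toLowerCase().split("\\s+|\\.");
    }

    // Negative limit: trailing empty strings are kept.
    public static String[] splitKeepTrailing(String text) {
        return text.toLowerCase().split("\\s+|\\.", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("Hello world.")));      // [hello, world]
        System.out.println(Arrays.toString(splitKeepTrailing("Hello world."))); // [hello, world, ]
        // A dot in the middle of the text leaves an empty string that is NOT discarded.
        System.out.println(Arrays.toString(splitDefault("Hello. World")));      // [hello, , world]
    }
}
```

As the last line suggests, this fix only hides a dot at the very end of the text. For input with punctuation between words, removing the punctuation before splitting is likely more robust.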
Maybe this answer could help. The code:
String s = "Hello world.";
for (String x : s.toLowerCase().split("[\\p{P} \\t\\n\\r]+"))
    System.out.println(x);
prints out:
> hello
> world
I am sorry, but the reported effect cannot be confirmed. I have tested the reported behaviour with Java 6 and Java 7 as follows:
public static void main(String[] args) {
    String helloWorld = "Hello World.";
    String[] splittedHelloWorld = helloWorld.split("\\s+");
    String[] splittedLowerCaseHelloWorld = helloWorld.toLowerCase().split("\\s+");
    boolean splittedHelloWorldContainsDot = checkContainsDot(splittedHelloWorld);
    boolean splittedLowerCaseHelloWorldContainsDot = checkContainsDot(splittedLowerCaseHelloWorld);
    System.out.println(splittedHelloWorldContainsDot); // true
    System.out.println(splittedLowerCaseHelloWorldContainsDot); // true
}

// Returns true if any element of the array contains a dot.
private static boolean checkContainsDot(String[] splittedArray) {
    boolean containsDot = false;
    for (String string : splittedArray) {
        if (string.contains(".")) {
            containsDot = true;
            break;
        }
    }
    return containsDot;
}