I want to split a text into words. I can do this with String.split(). For example, I split "Hello world." and get "Hello" and "world" as output. When I do the same with toLowerCase(), I get "hello" and "world." But I don't want the dot after "world". I tried splitting with different parameters, and I tried calling toLowerCase() separately from split(), including splitting first and lowercasing afterwards. Nothing works. What should I do to remove all the punctuation (, . ! ? etc.)? Here is how I split:
predlog = main.toLowerCase().split("\\s+");
To keep only ASCII letters and spaces, then split on whitespace:
String[] r = main.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
To remove all punctuation and then split on whitespace:
String[] r = main.replaceAll("\\p{P}", "").toLowerCase().split("\\s+");
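The two variants above differ in what survives, which a small side-by-side run may make clearer. This is only a sketch: the class name, the helper names, and the sample sentence are made up for illustration and are not from the question.

```java
import java.util.Arrays;

class PunctuationSplitDemo {
    // Keeps only ASCII letters and spaces, so digits and accented letters are dropped too.
    static String[] lettersOnly(String text) {
        return text.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
    }

    // Removes only punctuation characters (\p{P}); digits and non-ASCII letters survive.
    static String[] withoutPunctuation(String text) {
        return text.replaceAll("\\p{P}", "").toLowerCase().split("\\s+");
    }

    public static void main(String[] args) {
        String sample = "Hello, world! It's 2024.";
        System.out.println(Arrays.toString(lettersOnly(sample)));        // [hello, world, its]
        System.out.println(Arrays.toString(withoutPunctuation(sample))); // [hello, world, its, 2024]
    }
}
```

Note that the first pattern also deletes tabs and newlines, so words separated only by a tab would be glued together; adding \\s to the character class instead of a literal space avoids that.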
toLowerCase() has no effect on dots.
If you want a simple, but slightly mysterious, fix, also split on dots:
predlog = main.toLowerCase().split("\\s+|\\.");
The reason this works is that split() discards trailing empty strings from the returned array: the final dot produces an empty last element, which is then dropped.
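A short sketch of that behaviour, and of where it stops helping: only trailing empty strings are discarded, so a dot followed by a space in the middle of the text leaves an empty element behind. The class and helper names are hypothetical, and the "[\\s.]+" pattern is one possible alternative rather than part of the original answer.

```java
import java.util.Arrays;

class SplitOnDotsDemo {
    // The fix from the answer: split on whitespace runs or on a single dot.
    static String[] splitOnWhitespaceOrDot(String text) {
        return text.toLowerCase().split("\\s+|\\.");
    }

    // Alternative: treat any run of whitespace and dots as one separator.
    static String[] splitOnRuns(String text) {
        return text.toLowerCase().split("[\\s.]+");
    }

    public static void main(String[] args) {
        // Trailing empty string is dropped by default.
        System.out.println(Arrays.toString(splitOnWhitespaceOrDot("Hello world."))); // [hello, world]
        // A negative limit keeps it, which shows what was discarded.
        System.out.println(Arrays.toString("hello world.".split("\\s+|\\.", -1)));   // [hello, world, ]
        // An empty string in the middle is not dropped.
        System.out.println(Arrays.toString(splitOnWhitespaceOrDot("Hello. World"))); // [hello, , world]
        System.out.println(Arrays.toString(splitOnRuns("Hello. World")));            // [hello, world]
    }
}
```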
Maybe this answer could help. The code:
String s = "Hello world.";
for (String x : s.toLowerCase().split("[\\p{P} \\t\\n\\r]+"))
System.out.println(x);
prints out:
> hello
> world
I am sorry, but the reported effect cannot be confirmed. I have tested the reported behaviour with Java 6 and Java 7 as follows:
public static void main(String[] args) {
    String helloWorld = "Hello World.";
    String[] splittedHelloWorld = helloWorld.split("\\s+");
    String[] splittedLowerCaseHelloWorld = helloWorld.toLowerCase().split("\\s+");
    boolean splittedHelloWorldContainsDot = checkContainsDot(splittedHelloWorld);
    boolean splittedLowerCaseHelloWorldContainsDot = checkContainsDot(splittedLowerCaseHelloWorld);
    System.out.println(splittedHelloWorldContainsDot); // true
    System.out.println(splittedLowerCaseHelloWorldContainsDot); // true
}

// Returns true if any element of the array contains a dot.
private static boolean checkContainsDot(String[] splittedArray) {
    boolean containsDot = false;
    for (String string : splittedArray) {
        if (string.contains(".")) {
            containsDot = true;
            break;
        }
    }
    return containsDot;
}