
How to split a string with any whitespace chars as delimiters

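What regex pattern would I need to pass to java.lang.String.split() to split a String into an array of substrings, using all whitespace characters (' ', '\t', '\n', etc.) as delimiters?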

Something along the lines of

myString.split("\\s+");

This groups all white spaces as a delimiter.

So if I have the string:

"Hello[space character][tab character]World"

This should yield the strings "Hello" and "World", with no empty string for the gap between the [space] and the [tab].

As VonC pointed out, the backslash should be escaped, because Java first processes escape sequences in the string literal and only then passes the result to the regex parser. What you want is the regex \s, which means you need to write "\\s" in Java source. It can get a bit confusing.

The pattern \s (written "\\s" in Java source) is equivalent to [ \t\n\x0B\f\r].
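As a minimal sketch of the behaviour described above (the class and method names are made up for illustration), the following splits on runs of whitespace. It also shows one detail worth knowing: leading whitespace produces an empty first element, so you may want to trim the input first.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class WhitespaceSplitDemo {
    // Splits on runs of one or more whitespace characters.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        // A space followed by a tab counts as a single delimiter.
        System.out.println(Arrays.toString(splitOnWhitespace("Hello \tWorld")));
        // => [Hello, World]

        // Leading whitespace yields an empty first element...
        System.out.println(Arrays.toString(splitOnWhitespace("  Hello World")));
        // => [, Hello, World]

        // ...which goes away if the input is trimmed first.
        System.out.println(Arrays.toString(splitOnWhitespace("  Hello World".trim())));
        // => [Hello, World]
    }
}
```

Trailing whitespace does not cause the same problem, because split() drops trailing empty strings by default.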

In most regex dialects there is a set of convenient shorthand character classes you can use for this kind of thing. These are good ones to remember:

\w - Matches any word character.

\W - Matches any non-word character.

\s - Matches any whitespace character.

\S - Matches anything but whitespace characters.

\d - Matches any digit.

\D - Matches anything except digits.

A search for "Regex Cheatsheets" should reward you with a whole lot of useful summaries.
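To get a feel for what each class matches in Java, here is a small sketch that replaces every match with a dash. The class name, the mask helper, and the sample string are all made up for illustration.

```java
// Hypothetical demo class, not part of any library.
class ShorthandClassDemo {
    // Replaces every match of the given regex with a dash.
    static String mask(String input, String regex) {
        return input.replaceAll(regex, "-");
    }

    public static void main(String[] args) {
        String sample = "abc 123_x!";
        System.out.println(mask(sample, "\\w")); // => --- -----!
        System.out.println(mask(sample, "\\W")); // => abc-123_x-
        System.out.println(mask(sample, "\\s")); // => abc-123_x!
        System.out.println(mask(sample, "\\S")); // => --- ------
        System.out.println(mask(sample, "\\d")); // => abc ---_x!
        System.out.println(mask(sample, "\\D")); // => ----123---
    }
}
```

Note that the underscore counts as a word character, so \w matches it and \W does not.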

To get this working in JavaScript, I had to do the following:

myString.split(/\s+/g)

"\\s+" should do the trick.

You might also have a Unicode non-breaking space (xA0) in there...

String[] elements = s.split("[\\s\\xA0]+"); // also split on the Unicode non-breaking space

String string = "Ram is going to school";
String[] arrayOfString = string.split("\\s+");

Apache Commons Lang has a method to split a string with whitespace characters as delimiters:

StringUtils.split("abc def")

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split(java.lang.String)

This might be easier to use than a regex pattern.

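I'm surprised that nobody has mentioned split() with no parameters, as in:

"abc def ghi".split()

Note, however, that this is Python's str.split(). java.lang.String has no zero-argument split(), so in Java you still need to pass a pattern such as "\\s+".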

All you need is to split using one of the predefined character classes of the Java regex engine, namely the whitespace character class:

  • \d Represents a digit: [0-9]
  • \D Represents a non-digit: [^0-9]
  • \s Represents a whitespace character: [ \t\n\x0B\f\r]
  • \S Represents a non-whitespace character: [^\s]
  • \v Represents a vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
  • \V Represents a non-vertical whitespace character: [^\v]
  • \w Represents a word character: [a-zA-Z_0-9]
  • \W Represents a non-word character: [^\w]

Here, the key point to remember is that the lowercase \s represents all types of whitespace, including a single space [ ], a tab character [\t], or anything similar.

So, if you try something like this:

String theString = "Java \tProgramming"; // "Java", a space, a tab, "Programming"
String[] allParts = theString.split("\\s+");

You will get the desired output.


Hope this helps!

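Since it is a regular expression, and I'm assuming you also don't want non-alphanumeric characters such as commas, dots, etc. that may be surrounded by whitespace (e.g. "one , two" should give [one][two]), it should be: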

myString.split(/[\s\W]+/)
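The line above is JavaScript. A rough Java equivalent might look like the sketch below (the class and method names are made up for illustration). Since \W already matches whitespace, [\s\W]+ behaves the same as \W+ here. Keep in mind that the underscore is a word character, so it is not treated as a delimiter.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class NonWordSplitDemo {
    // Splits on runs of whitespace and/or other non-word characters.
    static String[] splitOnNonWord(String input) {
        return input.split("[\\s\\W]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnNonWord("one , two")));
        // => [one, two]
        System.out.println(Arrays.toString(splitOnNonWord("one, two.three_four")));
        // => [one, two, three_four]
    }
}
```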

You can split a string by line break using the following statement:

String[] textStr = yourString.split("\\r?\\n");
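Here is a small sketch of the line-break split (the class and method names are made up for illustration). The pattern \r?\n handles Unix and Windows line endings but not a lone carriage return. On Java 8 or later, the \R linebreak matcher covers that case as well.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class LineSplitDemo {
    // Splits on "\n" or "\r\n".
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    // Splits on any linebreak sequence (assumes Java 8+, where \R is supported).
    static String[] splitAnyLinebreak(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLines("one\ntwo\r\nthree")));
        // => [one, two, three]

        // A lone "\r" is not matched by \r?\n, so this stays a single element.
        System.out.println(splitLines("one\rtwo").length);
        // => 1

        System.out.println(Arrays.toString(splitAnyLinebreak("one\rtwo\r\nthree")));
        // => [one, two, three]
    }
}
```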

You can split a string by whitespace using the following statement:

String[] textStr = yourString.split("\\s+");

String str = "Hello   World";
String[] res = str.split("\\s+");

To split a string on any Unicode whitespace, you need to use

s.split("(?U)\\s+")
         ^^^^

The (?U) inline embedded flag is the equivalent of Pattern.UNICODE_CHARACTER_CLASS. It makes the \s shorthand character class match any character with the Unicode White_Space property, not only the ASCII whitespace characters.
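If you prefer a precompiled pattern over the inline flag, the same effect can be sketched with Pattern.compile and the UNICODE_CHARACTER_CLASS flag. The class, constant, and method names below are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class UnicodeSplitDemo {
    // Same meaning as "(?U)\\s+", compiled once and reused.
    private static final Pattern UNICODE_WHITESPACE =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    static String[] split(String input) {
        return UNICODE_WHITESPACE.split(input);
    }

    public static void main(String[] args) {
        String s = "Hello\u00A0World"; // separated by a non-breaking space

        // Without the flag, \s does not match U+00A0, so nothing is split.
        System.out.println(s.split("\\s+").length);
        // => 1

        // With the flag, the non-breaking space is a delimiter.
        System.out.println(Arrays.toString(split(s)));
        // => [Hello, World]
    }
}
```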

If you want to split on whitespace and keep the whitespace runs in the resulting array, use

s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")

Java demo:

String s = "Hello\t World\u00A0»";
System.out.println(Arrays.toString(s.split("(?U)\\s+"))); // => [Hello, World, »]
System.out.println(Arrays.toString(s.split("(?U)(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")));
// => [Hello,    , World,  , »]

When you just want to split on the space character and NOT on, say, a tab, you can use:

String[] words = textline.split(" ");

Example:

textline: "igno\tre the tab in the first word"

words: [igno\tre, the, tab, in, the, first, word]
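One thing to watch with split(" "): every single space is its own delimiter, so consecutive spaces produce empty strings. A short sketch (the class and method names are made up for illustration):

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class SingleSpaceSplitDemo {
    // Splits on each individual space character only.
    static String[] splitOnSpace(String input) {
        return input.split(" ");
    }

    public static void main(String[] args) {
        // The tab is not a delimiter, so it stays inside the first word.
        System.out.println(splitOnSpace("igno\tre the tab").length);
        // => 3

        // Two consecutive spaces yield an empty string between them.
        System.out.println(Arrays.toString(splitOnSpace("a  b")));
        // => [a, , b]
    }
}
```

If runs of spaces should count as one delimiter while tabs are still left alone, " +" can be used as the pattern instead.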

Study this code. Good luck!

import java.util.Scanner;

class Demo {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Input String : ");
        String s1 = input.nextLine();
        // Split on runs of whitespace, also treating the non-breaking space (U+00A0) as a delimiter
        String[] tokens = s1.split("[\\s\\xA0]+");
        System.out.println(tokens.length); // number of tokens found
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}
