String split special regular expression

Question

I'm trying to tokenize a string input, but I can't get my head around how to do it. The idea is to split the string into alphabetical words and individual non-alphabetical symbols. For example, the string "Test, ( abc)" would be split into ["Test", ",", "(", "abc", ")"].

Right now I use this regular expression: "(?<=[a-zA-Z])(?=[^a-zA-Z])", but it doesn't do what I want.
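For reference, this is what that expression does on the example input. It only marks a split point where a letter is immediately followed by a non-letter, so nothing separates a non-letter from a following letter, and runs of spaces and punctuation stay glued together. A minimal demo (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class OriginalSplitDemo {
    // The asker's pattern: split only where a letter is followed by a non-letter.
    static String[] splitOriginal(String input) {
        return input.split("(?<=[a-zA-Z])(?=[^a-zA-Z])");
    }

    public static void main(String[] args) {
        // Prints [Test, , ( abc, )]: the middle piece is ", ( abc"
        System.out.println(Arrays.toString(splitOriginal("Test, ( abc)")));
    }
}
```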

Any ideas what else I could use?

Answer 1

I see that you want to group letters into words (like Test and abc) but not group the non-alphabetical characters. I also see that you do not want spaces to appear as separate tokens. For this I will use "(\\w+|\\W)" after removing all spaces from the string to match.

Sample code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Test, ( abc)";
str = str.replaceAll(" ", ""); // in case you do not want spaces as separate tokens
Pattern pattern = Pattern.compile("(\\w+|\\W)"); // a run of word chars, or one other char
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group());
}

Output

Test
,
(
abc
)

I hope this answers your question.
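One caveat with removing all spaces first: words separated only by a space are glued together, so "foo bar" would come out as the single token "foobar". A possible alternative, sketched here under the assumption that only ASCII letters count as word characters (as in the question's [a-zA-Z]), is to leave the spaces in and simply never match them. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizeKeepSpaces {
    // A run of ASCII letters, or any single character that is neither a letter
    // nor whitespace. Whitespace is never matched, so it only separates tokens.
    private static final Pattern TOKEN = Pattern.compile("[a-zA-Z]+|[^a-zA-Z\\s]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Test, ( abc)")); // [Test, ,, (, abc, )]
        System.out.println(tokenize("foo bar!"));     // [foo, bar, !]
    }
}
```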

Answer 2

Try this:

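import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "I want to walk my dog, and why not?";
Pattern pattern = Pattern.compile("(\\w+|\\W)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    String token = matcher.group();
    if (!token.trim().isEmpty()) { // \W also matches each space; skip those
        System.out.println(token);
    }
}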

Outputs:

I
want
to
walk
my
dog
,
and
why
not
?

\w matches a word character ([A-Za-z0-9_]) and \W matches any single character that is not one, so each run of word characters becomes one token while each punctuation mark becomes a token of its own.

(Taken from: here)
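Note that \w is broader than the question's [a-zA-Z]: digits and underscores also count as word characters. A small comparison sketch (the helper name matches is hypothetical) shows the difference on an input containing both:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCharDemo {
    // Collects every match of the given regex in the input, in order.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "x_1+y";
        System.out.println(matches("\\w+|\\W", s));             // [x_1, +, y]
        System.out.println(matches("[a-zA-Z]+|[^a-zA-Z]", s));  // [x, _, 1, +, y]
    }
}
```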

Answer 3

Try this:

import java.util.ArrayList;

public static ArrayList<String> res(String a) {
    String[] tokens = a.split("\\s+");           // split on whitespace first
    ArrayList<String> strs = new ArrayList<>();
    for (String token : tokens) {
        String[] alpha = token.split("\\W+");    // word-character pieces of this chunk
        String[] nonAlpha = token.split("\\w+"); // non-word pieces of this chunk
        for (String str : alpha) {
            if (!str.isEmpty()) strs.add(str);
        }
        for (String str : nonAlpha) {
            if (!str.isEmpty()) strs.add(str);
        }
    }
    return strs;
}
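This works for the example, but within each whitespace-separated chunk it adds all word pieces before all non-word pieces, so the original order is lost whenever a symbol comes before a word inside the same chunk. It also keeps adjacent symbols such as "((" together as one token. The sketch below repeats the same logic in a runnable class (the class name is made up) to show both effects:

```java
import java.util.ArrayList;
import java.util.List;

class SplitChunksDemo {
    // Same logic as res(...): split on whitespace, then add the word pieces
    // of each chunk followed by its non-word pieces.
    static List<String> res(String a) {
        List<String> strs = new ArrayList<>();
        for (String token : a.split("\\s+")) {
            for (String str : token.split("\\W+")) {
                if (!str.isEmpty()) strs.add(str);
            }
            for (String str : token.split("\\w+")) {
                if (!str.isEmpty()) strs.add(str);
            }
        }
        return strs;
    }

    public static void main(String[] args) {
        System.out.println(res("Test, ( abc)")); // [Test, ,, (, abc, )]
        System.out.println(res("((a)"));         // [a, ((, )]
    }
}
```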

Answer 4

I guess in the simplest form, split using

"(?<=[a-zA-Z])(?=[^\\sa-zA-Z])|(?<=[^\\sa-zA-Z])(?=[a-zA-Z])|\\s+"

Explained

    (?<= [a-zA-Z] )               # Letter behind
    (?= [^\sa-zA-Z] )             # not letter/wsp ahead
 |                              # or,
    (?<= [^\sa-zA-Z] )            # Not letter/wsp behind
    (?= [a-zA-Z] )                # letter ahead
 |                              # or,
    \s+                           # whitespaces (discarded)
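The pattern can be passed straight to String.split. A minimal runnable sketch (the class name is illustrative) follows; because it only splits at letter/non-letter boundaries and on whitespace, adjacent symbols with no space between them, such as ",(", come out as a single token.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    // The split pattern from this answer, as a Java string literal.
    static final String SPLIT_REGEX =
            "(?<=[a-zA-Z])(?=[^\\sa-zA-Z])|(?<=[^\\sa-zA-Z])(?=[a-zA-Z])|\\s+";

    static String[] tokenize(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("Test, ( abc)"))); // [Test, ,, (, abc, )]
        System.out.println(Arrays.toString(tokenize("a,(b")));          // [a, ,(, b]
    }
}
```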

4 answers

solution1 2 2017-12-26 15:44:44

solution2 0 2017-12-26 15:28:42

solution3 0 2017-12-26 15:31:53

solution4 0
