Java Regex: Split based on non-word characters except for apostrophe

I'm trying to split and include based on spaces and non-word characters, except for apostrophes.

I've been able to make it split and include based on spaces and non-word characters, but I can't seem to figure out how to exclude apostrophes from the non-word characters.

This is my current Regex...

str.split("\\s|(?=\\W)");

...which when run on this code sample:

program p;
begin
    write('x');
end.

...produces this result:

program
p
;
begin

write
(
'x   <!-- This is the problem.
'
)
;
end
.

Which is almost correct, but my goal is to skip the apostrophes so that this is the result:

program
p
;
begin

write
(
'x'   <!-- This is the wanted result.
)
;
end
.

UPDATE

As suggested, I've tried:

str.split("\\s|(?=\\W)(?<=\\W)");

Which almost works, but does not split all of the special characters correctly:

program
p;
begin
write(
'x'
)
;
end.

Have you tried...

[^\w']

This will match any character that is neither a word character nor an apostrophe. It may be simple enough to work, depending on your inputs.

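If you run a replace operation using ([^\\w']) as your regex (the parentheses create the capturing group that the replacement refers to) and \n$1\n as your replacement string (Java writes group references in replacements as $1, not \\1), it should get you close to where you'd like to be.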
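A minimal sketch of that replace approach, assuming the character class is wrapped in a capturing group and the replacement uses Java's $1 group syntax. The class and method names are made up for illustration.

```java
final class ReplaceDemo {
    // Surround every character that is neither a word character nor an
    // apostrophe with newlines, so each ends up on a line of its own.
    static String isolate(String source) {
        return source.replaceAll("([^\\w'])", "\n$1\n");
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        System.out.println(isolate("write('x');"));
    }
}
```

Whitespace also matches [^\w'], so spaces and line breaks in the input get wrapped as well. The output would still need its blank lines filtered out before it matches the wanted result.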

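You can split on this. Note that Java's String.split discards everything the pattern matches, so a quoted literal consumed by the middle alternative will not appear in the resulting array.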

\s|('[^']*')|(?=\W)

See demo.

https://regex101.com/r/mL7eL6/1

Treat the apostrophe separately, requiring a preceding non-word character:

str.split("\\s+|(?=[^\\w'])|(?<=\\W)(?=')");

See live demo.
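To check this against the sample from the question, here is a small self-contained sketch. The class and method names are hypothetical; only the regex comes from the answer.

```java
import java.util.Arrays;
import java.util.List;

final class ApostropheSplitDemo {
    // Split on runs of whitespace, before any non-word character other than
    // an apostrophe, and before an apostrophe that follows a non-word character.
    static final String DELIMITER = "\\s+|(?=[^\\w'])|(?<=\\W)(?=')";

    static List<String> tokenize(String source) {
        return Arrays.asList(source.split(DELIMITER));
    }

    public static void main(String[] args) {
        // The Pascal snippet from the question.
        String source = "program p;\nbegin\n    write('x');\nend.";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

On that input this should print program, p, ;, begin, write, (, 'x', ), ;, end and . on separate lines, with 'x' kept together. Because \s+ consumes whole runs of whitespace, the indentation before write does not produce empty tokens.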

As an alternative, you can scan the string for \\b[\\w']+\\b
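Scanning rather than splitting could look roughly like the sketch below. The token pattern here is an assumption chosen for the question's sample, not the pattern given above: it accepts a single-quoted literal, a run of word characters, or one non-word, non-space character. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenScanDemo {
    // Hypothetical token pattern: quoted literal | word | single symbol.
    // The quoted-literal alternative comes first so it wins over the
    // single-symbol alternative when an apostrophe opens a literal.
    static final Pattern TOKEN = Pattern.compile("'[^']*'|\\w+|[^\\w\\s]");

    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Pascal snippet from the question.
        scan("program p;\nbegin\n    write('x');\nend.").forEach(System.out::println);
    }
}
```

Matching tokens directly means whitespace is simply skipped, so no empty strings appear. A quoted literal containing spaces or symbols also stays in one piece, which a lookaround-based split does not guarantee.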
