
Regex + Java: split a text into words and remove punctuation only if it stands alone or at the end of a word

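I'm trying to split a string into words, but I want to keep "a.b.c" as a single word and remove punctuation only when it stands alone or appears at the end of a word, e.g.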

"a.b.c" --> "a.b.c"
"a.b."  --> "a.b"

For example:

String str1 = "abc a.b a. .  b, , test"; // should return "abc", "a.b", "a", "b", "test"

You can use:

String str1 = "abc a.b a. .  b, , test";
String[] toks = str1.split("\\p{Punct}*\\s+[\\s\\p{Punct}]*");
for (String tok: toks)
    System.out.printf(">>> [%s]%n", tok);

>>> [abc]
>>> [a.b]
>>> [a]
>>> [b]
>>> [test]
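
One caveat with the split above: the separator always requires at least one whitespace character, so punctuation at the very end of the input (for example a final "a.b.") is not removed, and input that starts with punctuation or whitespace produces an empty first token. Below is a minimal sketch of one possible workaround, not part of the original answer: it trims punctuation and whitespace from both ends of the string before applying the same split. The class name WordSplitter and the method splitWords are hypothetical names chosen for illustration. Note also that \p{Punct} matches only ASCII punctuation by default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WordSplitter {
    // Same separator as in the answer: optional punctuation, at least one
    // whitespace character, then any further run of whitespace or punctuation.
    private static final String SEPARATOR = "\\p{Punct}*\\s+[\\s\\p{Punct}]*";

    // Runs of whitespace/punctuation at the very start or very end of the input.
    private static final String EDGES = "^[\\s\\p{Punct}]+|[\\s\\p{Punct}]+$";

    // Hypothetical helper: trims both ends first, so split() returns no empty
    // leading token and the last word keeps no trailing punctuation.
    static List<String> splitWords(String text) {
        String trimmed = text.replaceAll(EDGES, "");
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split(SEPARATOR));
    }

    public static void main(String[] args) {
        System.out.println(splitWords("abc a.b a. .  b, , test")); // [abc, a.b, a, b, test]
        System.out.println(splitWords(", a.b.c and a.b."));        // [a.b.c, and, a.b]
    }
}
```

Punctuation inside a word, as in "a.b.c", is left alone because neither pattern can match without whitespace or a string boundary next to the punctuation.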

