String Tokenizer in Java for alphabets

I have a peculiar, regex-like requirement.

I have a string like:

Get Carter,Tigerland,Super, The,Wolf, The.

Here, "Super, The" and "Wolf, The" are each a single word.

I need to tokenize it as follows:

"Get Carter"
"Tigerland"
"Super, The"
"Wolf, The"

The only thing to note is that a comma inside a single word is followed by a space, while a comma between two different words is not.

Is there anything in StringTokenizer that can check for ",W", where W is any letter?
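For comparison, here is a minimal sketch of why java.util.StringTokenizer alone can't express this rule: its delimiter argument is a set of single characters, not a pattern, so every comma becomes a split point. The class and method names below are illustrative only, and the sample input assumes the trailing period in the question is sentence punctuation rather than part of the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerComparison {
    // StringTokenizer treats each character of the delimiter string as a delimiter;
    // it has no way to say "a comma that is not followed by a space".
    static List<String> tokenizeOnEveryComma(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "Get Carter,Tigerland,Super, The,Wolf, The";
        // Prints six quoted tokens: "Super, The" and "Wolf, The" are broken apart.
        for (String token : tokenizeOnEveryComma(input)) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

This yields "Get Carter", "Tigerland", "Super", " The", "Wolf", " The", which is why a regex-based split is needed instead.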

You want a negative lookahead: ",(?! )", i.e. a comma that is not followed by a space.
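A minimal sketch of applying that lookahead with String.split, assuming the stated convention always holds (commas inside a title are followed by a space, commas between titles are not). The lookahead is zero-width, so only the comma itself is consumed. A second, hypothetical variant uses a positive lookahead for a letter, which is closer to the ",W" idea in the question; note that \p{Alpha} in Java matches ASCII letters only by default.

```java
public class SplitOnBareComma {
    // Split on a comma only when it is NOT followed by a space (negative lookahead).
    static String[] tokenize(String input) {
        return input.split(",(?! )");
    }

    // Variant: split on a comma only when the next character is an ASCII letter
    // (positive lookahead), mirroring the ",W" check described in the question.
    static String[] tokenizeBeforeLetter(String input) {
        return input.split(",(?=\\p{Alpha})");
    }

    public static void main(String[] args) {
        String input = "Get Carter,Tigerland,Super, The,Wolf, The";
        // Prints "Get Carter", "Tigerland", "Super, The", "Wolf, The" on separate lines.
        for (String token : tokenize(input)) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

The two variants agree on this input; they differ only when a comma is followed by something other than a space or a letter (for example a digit), where the negative lookahead splits and the positive one does not.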
