
How to find the delimiter encountered in a string in Java

I have written a simple Java program that manipulates a given string.

The input string contains delimiters, which are non-alphabetic characters. I used StringTokenizer to read and manipulate the individual words in the string.

Now I need to reconstruct the manipulated string with the same delimiters in the same positions. I would appreciate it if anyone could suggest how to identify the delimiters.

In other words, this is the input:

Text1 Delimiter1 Text2 Delimiter2 Text3 Delimiter3 Text4 Delimiter4

This is what my code produces:

NewText1 NewText2 NewText3 NewText4

I used StringTokenizer to identify the next token in this manner:

StringTokenizer st = new StringTokenizer(str, ", 0123456789(*&^%$#@!-_)");

But now I would like to identify the delimiter that was encountered so that I can build my new string.

This is what I actually want:

NewText1 Delimiter1 NewText2 Delimiter2 NewText3 Delimiter3 NewText4 Delimiter4

You can proceed like this:

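String dels = "-, 0123456789(*&^%$#@!_)";   // delimiter characters, hyphen first
String specs = "[" + dels + "]+";            // one or more delimiter characters
String letts = "[^" + dels + "]+";           // one or more non-delimiter characters
String text = "one, two - three! four";
String[] words = text.split( specs );        // "one", "two", "three", "four"
String[] delim = text.split( letts );        // "", ", ", " - ", "! "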

Note that in dels the hyphen must come first, so that it is taken literally (a hyphen between two other characters inside a character class would be read as a range). If you ever add [, ], or ^, more care must be taken; check the Javadoc for java.util.regex.Pattern.
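One way to sidestep the ordering concern is to backslash-escape every delimiter that is neither a letter nor a digit before building the character class. The following is only a sketch under assumptions: the class name DelimiterClass and the helper escapeForClass are made up for illustration, and the delimiter set is assumed to be plain ASCII.

```java
import java.util.Arrays;

class DelimiterClass {
    // Hypothetical helper (not part of the original answer): returns the body
    // of a regex character class in which every character that is neither a
    // letter nor a digit is preceded by a backslash. java.util.regex.Pattern
    // allows a backslash before any non-alphabetic character; letters and
    // digits are left alone because "\d", "\1", etc. have special meanings.
    static String escapeForClass(String dels) {
        StringBuilder sb = new StringBuilder();
        for (char c : dels.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed delimiter set: the one from the answer plus '[' and ']'.
        String dels = ", 0123456789(*&^%$#@!-_)[]";
        String specs = "[" + escapeForClass(dels) + "]+";
        System.out.println(Arrays.toString("a[b]c-d^e".split(specs)));
    }
}
```

With the escaping in place, the hyphen no longer has to be the first character of dels, and [ , ] and ^ can be added without further changes.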

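Recomposing the string from these two arrays poses no particular problem: interleave the manipulated words with the delimiters. Keep in mind that split leaves an empty first element in delim when the text starts with a word, and in words when it starts with a delimiter.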
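The recomposition could be sketched as follows. This is a minimal illustration, not the answerer's code: the class name RebuildWithDelimiters is invented, and transform (upper-casing) merely stands in for whatever manipulation the real program applies to each word.

```java
import java.util.Locale;

class RebuildWithDelimiters {
    // Same delimiter set as in the answer; the hyphen stays first.
    static final String DELS = "-, 0123456789(*&^%$#@!_)";

    // Placeholder manipulation (an assumption): upper-case each word.
    static String transform(String word) {
        return word.toUpperCase(Locale.ROOT);
    }

    static String rebuild(String text) {
        String[] words = text.split("[" + DELS + "]+");
        String[] delim = text.split("[^" + DELS + "]+");

        // Skip the empty first element that split may produce.
        int w = (words.length > 0 && words[0].isEmpty()) ? 1 : 0;
        int d = (delim.length > 0 && delim[0].isEmpty()) ? 1 : 0;

        // The first character decides whether a word or a delimiter run leads.
        boolean wordNext = !text.isEmpty() && DELS.indexOf(text.charAt(0)) < 0;

        StringBuilder sb = new StringBuilder();
        while (w < words.length || d < delim.length) {
            if (wordNext && w < words.length) {
                sb.append(transform(words[w++]));
            } else if (!wordNext && d < delim.length) {
                sb.append(delim[d++]);
            }
            wordNext = !wordNext; // word runs and delimiter runs alternate
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rebuild("one, two - three! four")); // ONE, TWO - THREE! FOUR
    }
}
```

Because each delimiter run is kept as a single string, multi-character separators such as " - " are restored exactly as they appeared in the input.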

The drawback of using StringTokenizer with its third argument (returnDelims) set to true is that it returns each delimiter character as a separate token of length 1.
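For comparison, here is a sketch of the StringTokenizer route with returnDelims set to true. The class name TokenizerWithDelims and the upper-casing transform are illustrative assumptions; the delimiter string is the one from the question.

```java
import java.util.Locale;
import java.util.StringTokenizer;

class TokenizerWithDelims {
    // Delimiter set taken from the question.
    static final String DELS = ", 0123456789(*&^%$#@!-_)";

    // Placeholder manipulation (an assumption): upper-case each word.
    static String transform(String word) {
        return word.toUpperCase(Locale.ROOT);
    }

    static String rebuild(String str) {
        // The third argument makes the tokenizer hand back delimiters as tokens.
        StringTokenizer st = new StringTokenizer(str, DELS, true);
        StringBuilder sb = new StringBuilder();
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            // A delimiter always arrives as a one-character token.
            boolean isDelimiter = tok.length() == 1 && DELS.indexOf(tok.charAt(0)) >= 0;
            sb.append(isDelimiter ? tok : transform(tok));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rebuild("one, two - three! four")); // ONE, TWO - THREE! FOUR
    }
}
```

This works for rebuilding the string in a single pass, but a separator such as ", " arrives as two tokens ("," and " "), so any logic that needs the whole delimiter run has to collect consecutive delimiter tokens itself.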


 