StringTokenizer showing unwanted results

When I ran the following code I found something strange.

The output of the program below is token1=AAAAA token2=BBBBB|

However, from my understanding, it should be token1=AAAAA token2=BBBBB|DUMMY

import java.util.StringTokenizer;

public class TestToken {

    public static void main(final String[] args) {
        final String delim = "DELIM";
        String token1 = "AAAAA";
        String token2 = "BBBBB|DUMMY";
        final String input = token1 + delim + token2;
        final StringTokenizer tokenizer = new StringTokenizer(input, delim);
        final String text1 = tokenizer.nextToken();
        final String text2 = tokenizer.nextToken();
        System.out.println("token1=" + text1);
        System.out.println("token2=" + text2);
        System.out.println();
    }

}

Can someone explain why it behaves like this and how to fix it?

Excerpt from the constructor's documentation:

The characters in the delim argument are the delimiters for separating tokens.

That means that each character is a delimiter, not the whole string. In fact, you have 5 delimiters (the characters D, E, L, I, and M).

You can see the effect with the following code:

while (tokenizer.hasMoreTokens())
   System.out.println(tokenizer.nextToken());

which prints out:

AAAAA
BBBBB|
U
Y

No, your delimiters are D, E, L, I, and M.

See the javadocs: all characters in the delim argument are the delimiters for separating tokens.

delim - the delimiters.

Consider:

    final String delim = "DELIM";
    String token1 = "AAAAA";
    String token2 = "BBBBB|ZUMMY";
    final String input = token1 + delim + token2;
    final StringTokenizer tokenizer = new StringTokenizer(input, delim);
    final String text1 = tokenizer.nextToken();
    final String text2 = tokenizer.nextToken();
    System.out.println("token1=" + text1);
    System.out.println("token2=" + text2);
    System.out.println();

StringTokenizer takes a string where each character is a delimiter. Since D is one of your delimiters, the second token is cut off after the |.
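To see exactly which characters are consumed, one option is the three-argument constructor with returnDelims set to true, which hands the delimiter characters back as one-character tokens. This is only a small sketch: the class name ReturnDelimsSketch and the helper tokensWithDelims are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReturnDelimsSketch {

    // Hypothetical helper: collects every token, including the delimiter
    // characters themselves, because returnDelims is true.
    static List<String> tokensWithDelims(String input, String delims) {
        StringTokenizer tokenizer = new StringTokenizer(input, delims, true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [AAAAA, D, E, L, I, M, BBBBB|, D, U, M, M, Y]
        System.out.println(tokensWithDelims("AAAAADELIMBBBBB|DUMMY", "DELIM"));
    }
}
```

Every D, E, L, I, and M shows up as its own token, including the D and both M characters inside DUMMY, which is why the data after the | gets chopped up.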

If you want to use multi-character delimiters, you'll have to use a different technique, e.g., split:

String[] parts = input.split(delim);
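One caveat worth a sketch: String.split treats its argument as a regular expression. "DELIM" contains no regex metacharacters, so it works as-is, but a delimiter such as "|" would not be matched literally. Wrapping the delimiter in Pattern.quote is one way to force a literal match. The class name LiteralSplitSketch and the helper splitLiteral below are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplitSketch {

    // Hypothetical helper: splits on the delimiter taken literally,
    // even if it contains regex metacharacters such as | or .
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Prints [AAAAA, BBBBB|DUMMY]
        System.out.println(Arrays.toString(splitLiteral("AAAAADELIMBBBBB|DUMMY", "DELIM")));
        // Prints [AAAAA, BBBBB]
        System.out.println(Arrays.toString(splitLiteral("AAAAA|BBBBB", "|")));
    }
}
```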

There are multiple options to implement this functionality. As for why it is happening, the other posts explain it well: your delimiter is not "DELIM" but each of the characters "D", "E", "L", "I", "M".

Here is what you can use if you want to separate a string based on another string like DELIM.

Option 1: Use the String split method, which takes the delimiter string as its argument and returns the array of tokens

String statement = "AAAADELIMBBBB|DUMMY";
String tokens[] = statement.split("DELIM");

Option 2: Use Pattern.splitAsStream, where compile takes the regex delimiter as its argument and splitAsStream takes the statement to split

Pattern.compile("DELIM").splitAsStream("AAAADELIMBBBB|DUMMY").forEach(System.out::println);

Option 3: Use Stream.of with the result of split as its argument

Stream.of("AAAADELIMBBBB|DUMMY".split("DELIM")).forEach(System.out::println);

Apart from the ways to split shown above, if you are a die-hard fan of StringTokenizer and want to use only that, you can create a StringTokenizer with "D" as the delimiter and then, for each token after the first, check whether its first four characters are "ELIM". If they are, the token collected so far is complete and the remaining substring starts a new one. If not, the D was part of the data, so prepend "D" to the piece and append it to the current token.
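A rough sketch of that StringTokenizer-only approach might look like the following. The class name DelimSplitSketch and the method splitOnDelim are made up for illustration, and the sketch assumes the input does not start or end with D and contains no consecutive D characters, since StringTokenizer silently skips those. For anything beyond an experiment, split is the simpler choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimSplitSketch {

    // Hypothetical helper: splits on the literal word "DELIM" using only
    // StringTokenizer. Assumes no leading, trailing, or doubled 'D'.
    static List<String> splitOnDelim(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        StringTokenizer tokenizer = new StringTokenizer(input, "D");
        boolean first = true;
        while (tokenizer.hasMoreTokens()) {
            String piece = tokenizer.nextToken();
            if (first) {
                // Nothing precedes the first piece, so it starts the first token.
                current.append(piece);
                first = false;
            } else if (piece.startsWith("ELIM")) {
                // The removed 'D' plus "ELIM" formed a real delimiter:
                // close the current token and start a new one.
                result.add(current.toString());
                current = new StringBuilder(piece.substring(4));
            } else {
                // The removed 'D' was ordinary data, so put it back.
                current.append('D').append(piece);
            }
        }
        result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        // Prints [AAAAA, BBBBB|DUMMY]
        System.out.println(splitOnDelim("AAAAADELIMBBBBB|DUMMY"));
    }
}
```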

From the documentation of StringTokenizer:

Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.

This means that DELIM as a whole is not a delimiter; instead, every character in it is a delimiter (i.e., D, E, L, I, and M).

When you run the following code:

public static void main(final String[] args) {
    final String delim = "DELIM";
    String token1 = "AAAAA";
    String token2 = "BBBBB|DUMMY";
    final String input = token1 + delim + token2;
    final StringTokenizer tokenizer = new StringTokenizer(input, delim);
    while(tokenizer.hasMoreElements()){
        System.out.println("token =" + tokenizer.nextToken());
    }
}

It gives the following output:

token =AAAAA
token =BBBBB|
token =U
token =Y

As you can see, your input was also split on the D and M characters that were present in token2.

As the documentation explains, all characters in the delim argument are the delimiters for separating tokens.

What you need to do instead is use the split function.

public static void main(final String[] args) {
    final String delim = "DELIM";
    String token1 = "AAAAA";
    String token2 = "BBBBB|DUMMY";
    final String input = token1 + delim + token2;

    final String[] tokens = input.split(delim);
    for (String token:tokens) {
        System.out.println(token);
    }

}
