
StringTokenizer showing unwanted results

When I ran the following code I found something strange.

The output of the program below is token1=AAAAA token2=BBBBB|

However, from my understanding, it should be token1=AAAAA token2=BBBBB|DUMMY

import java.util.StringTokenizer;

public class TestToken {

    public static void main(final String[] args) {
        final String delim = "DELIM";
        String token1 = "AAAAA";
        String token2 = "BBBBB|DUMMY";
        final String input = token1 + delim + token2;
        final StringTokenizer tokenizer = new StringTokenizer(input, delim);
        final String text1 = tokenizer.nextToken();
        final String text2 = tokenizer.nextToken();
        System.out.println("token1=" + text1);
        System.out.println("token2=" + text2);
        System.out.println();
    }

}

Can someone explain why it behaves like this and how to fix it?

Excerpt from the constructor's documentation:

The characters in the delim argument are the delimiters for separating tokens.

That means that each character is a delimiter, not the whole string. In effect, you have 5 delimiters (the characters D, E, L, I, and M).

You can see the effect with the following code:

while (tokenizer.hasMoreTokens())
   System.out.println(tokenizer.nextToken());

which prints out:

AAAAA
BBBBB|
U
Y

No, your delimiters are D, E, L, I and M.

See the Javadoc: "All characters in the delim argument are the delimiters for separating tokens."

delim - the delimiters.

Consider:

    final String delim = "DELIM";
    String token1 = "AAAAA";
    String token2 = "BBBBB|ZUMMY";
    final String input = token1 + delim + token2;
    final StringTokenizer tokenizer = new StringTokenizer(input, delim);
    final String text1 = tokenizer.nextToken();
    final String text2 = tokenizer.nextToken();
    System.out.println("token1=" + text1);
    System.out.println("token2=" + text2);
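    System.out.println();

This prints token1=AAAAA and token2=BBBBB|ZU. With no D in the second part, the token is no longer cut at the |, but it still ends at the first M.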

StringTokenizer takes a string where each character is a delimiter. Since D is one of your delimiters, the second token is cut off after the |.

If you want to use multi-character delimiters, you'll have to use a different technique, e.g. split (note that split treats its argument as a regular expression):

String[] parts = input.split(delim);
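Because split interprets its argument as a regex, a delimiter that contains metacharacters such as | or . would not be matched literally. A minimal sketch of one way to guard against that, using Pattern.quote; the class name LiteralSplitDemo and the helper splitLiteral are made up for illustration and are not part of the question's code:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplitDemo {
    // Hypothetical helper: splits on the delimiter taken literally, not as a regex.
    static String[] splitLiteral(String input, String delim) {
        return input.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        // "DELIM" has no regex metacharacters, so quoting changes nothing here.
        System.out.println(Arrays.toString(splitLiteral("AAAAADELIMBBBBB|DUMMY", "DELIM")));
        // "|" is the regex alternation operator; quoting makes it an ordinary character.
        System.out.println(Arrays.toString(splitLiteral("AAAAA|BBBBB", "|")));
    }
}
```

For the delimiter in the question, plain input.split(delim) and the quoted version should behave the same; the quoting only matters once the delimiter can contain regex syntax.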

There are multiple ways to implement this. As for why it happens, the other answers explain it well: your delimiter is not the string "DELIM" but each of the characters "D", "E", "L", "I" and "M".

Here is what you can use if you want to separate a string based on another string like DELIM.

Option 1: Use the String split method, which takes the delimiter string (interpreted as a regex) as its argument and returns the array of tokens

String statement = "AAAADELIMBBBB|DUMMY";
String[] tokens = statement.split("DELIM");

Option 2: Use splitAsStream, where Pattern.compile takes the delimiter as a regex and splitAsStream takes the statement to split

Pattern.compile("DELIM").splitAsStream("AAAADELIMBBBB|DUMMY").forEach(System.out::println);

Option 3: Use Stream.of with the result of split as its argument

Stream.of("AAAADELIMBBBB|DUMMY".split("DELIM")).forEach(System.out::println);

Apart from the above super cool ways to split, if you are a die-hard fan of StringTokenizer and want to implement it using only that, you can tokenize with "D" as the delimiter and then check whether each token after the first starts with "ELIM". If it does, the remaining substring begins a new result token. If it does not, the D was part of the data, so prepend "D" to the current token and append it to the result token being built.
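The StringTokenizer-only approach described above can be sketched roughly as follows. This is one possible reading of it, not a definitive implementation: the class TokenizerWorkaround and the method splitOnDelim are hypothetical names, and the sketch uses the three-argument constructor with returnDelims set to true so that every D comes back as its own token. Without that flag, runs of consecutive D characters would be silently merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerWorkaround {
    // Hypothetical helper: splits input on the literal string "DELIM"
    // using only StringTokenizer with the single delimiter character 'D'.
    static List<String> splitOnDelim(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // returnDelims = true: each "D" is returned as a one-character token.
        StringTokenizer st = new StringTokenizer(input, "D", true);
        boolean pendingD = false; // a 'D' was seen and not yet classified
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals("D")) {
                if (pendingD) {
                    current.append('D'); // the earlier 'D' was plain data
                }
                pendingD = true;
            } else if (pendingD && tok.startsWith("ELIM")) {
                // "D" + "ELIM..." is the real delimiter: close the current token.
                result.add(current.toString());
                current.setLength(0);
                current.append(tok.substring(4));
                pendingD = false;
            } else {
                if (pendingD) {
                    current.append('D'); // 'D' belonged to the data
                }
                current.append(tok);
                pendingD = false;
            }
        }
        if (pendingD) {
            current.append('D'); // trailing 'D' with nothing after it
        }
        result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitOnDelim("AAAAADELIMBBBBB|DUMMY"));
    }
}
```

For the input in the question this yields the two tokens AAAAA and BBBBB|DUMMY. It is considerably more code than split, which is why the options above are usually the better choice.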

From the documentation of StringTokenizer:

Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.

This means that DELIM as a whole is not the delimiter; rather, every character in it is a delimiter (i.e. D, E, L, I, and M).

When you run the following code:

public static void main(final String[] args) {
    final String delim = "DELIM";
    String token1 = "AAAAA";
    String token2 = "BBBBB|DUMMY";
    final String input = token1 + delim + token2;
    final StringTokenizer tokenizer = new StringTokenizer(input, delim);
    while(tokenizer.hasMoreElements()){
        System.out.println("token =" + tokenizer.nextToken());
    }
}

It gives the following output:

token =AAAAA
token =BBBBB|
token =U
token =Y

As you can see, besides the DELIM run itself, your input was also split on the D and M characters in DUMMY.

As the documentation explains, all characters in the delim argument are delimiters for separating tokens.

What you need to do instead is use the split method.

public static void main(final String[] args) {
    final String delim = "DELIM";
    String token1 = "AAAAA";
    String token2 = "BBBBB|DUMMY";
    final String input = token1 + delim + token2;

    final String[] tokens = input.split("DELIM");
    for (String token:tokens) {
        System.out.println(token);
    }

}
