
Tokenizing a String in Java without using split()

I am trying to write a method that tokenizes a string into an array of its words. I have already tested my program with the split method and it works fine, but I am trying to write a tokenize method that does not use split instead. This is what I have tried so far:

public static String[] tokenize(String sentence) {
    int wordCount = countWords(sentence);
    String[] sentenceWords = new String[wordCount];
    int curWord = 0;
    char letter;

    for (int i = 0; i < sentence.length() - 1; i++) {
        letter = sentence.charAt(i);
        if (letter == ' ') {
            curWord++;
            continue;
        }
        System.out.println(sentenceWords[curWord]);
        sentenceWords[curWord] = String.format("%s%c", sentenceWords[curWord], letter);
        System.out.printf("%s\n", sentenceWords[curWord]);
    }
    return sentenceWords;
}

The output of this method was completely wrong: it was full of nulls, and each word was printed on a new line.

I also tried another variation, but did not get very far with it:

public static String[] tokenize(String sentence) {
    int wordCount = countWords(sentence);
    String[] sentenceWords = new String[wordCount];
    for (int i = 0; i < sentence.length() - 1; i++) {
        if (sentence.contains(" ")) {
            //Something.....
        }
    }
    return sentenceWords;
}

I'm not sure what the right approach would be.
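The second attempt was heading toward searching the sentence for spaces. `String.contains(" ")` only tells you whether a space exists somewhere; `String.indexOf(' ', fromIndex)` tells you where the next one is, which is what a loop needs. Below is a minimal sketch of that idea, not the asker's code. It assumes words are separated only by the space character, and it collects words in an `ArrayList` so the word count does not have to be known in advance. The class name `IndexOfTokenizer` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexOfTokenizer {
    // Finds each space with indexOf and copies the text before it with substring.
    public static String[] tokenize(String sentence) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < sentence.length()) {
            int space = sentence.indexOf(' ', start);           // -1 if no further space
            int end = (space == -1) ? sentence.length() : space;
            if (end > start) {                                  // skip empty pieces from repeated spaces
                words.add(sentence.substring(start, end));
            }
            start = end + 1;                                    // continue just past the space
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("Hello big world")));
    }
}
```

Because the loop only advances `start`, each character is examined at most once by `indexOf`, and the last word is picked up when `indexOf` returns -1.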

If what you're trying to do is split the sentence into words and store them in an array, this may help:

public static String[] tokenize(String sentence)
{
    int wordCount = countWords(sentence);
    String[] wordArr = new String[wordCount];
    int wordCounter = 0;

    for (int i = 0; i < sentence.length(); i++)
    {
        char c = sentence.charAt(i);
        if (c == ' ')
        {
            // A space ends the current word; move on to the next slot.
            wordCounter++;
        }
        else
        {
            // Start each word as "" so appending doesn't produce "null...".
            if (wordArr[wordCounter] == null)
            {
                wordArr[wordCounter] = "";
            }
            // Every non-space character, including the last one, is appended.
            wordArr[wordCounter] += c;
        }
    }

    return wordArr;
}

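This is similar to what you had, but it initializes each element of the array to an empty string before appending characters to it. Every element of a new String[] starts out as null, and appending to null (with String.format's %s or with +=) produces text like "nullH", which is why null was being printed. Note also that your loop condition i < sentence.length() - 1 skips the last character of the sentence.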
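A small sketch that reproduces the null behavior in isolation may make this clearer. It only uses `String.format` and `+=` on a fresh `String[]`; the class and method names are made up for illustration.

```java
public class NullConcatDemo {
    // Mimics the question's first attempt: the slot is still null when used.
    static String appendToNullSlot(char letter) {
        String[] words = new String[1];                  // every element of a new String[] is null
        return String.format("%s%c", words[0], letter);  // %s renders null as the text "null"
    }

    // Mimics the answer: initialize the slot to "" before appending.
    static String appendToEmptySlot(char letter) {
        String[] words = new String[1];
        words[0] = "";
        words[0] += letter;                              // "" + 'h' is "h"
        return words[0];
    }

    public static void main(String[] args) {
        System.out.println(appendToNullSlot('h'));   // prints nullh
        System.out.println(appendToEmptySlot('h'));  // prints h
    }
}
```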

This also doesn't keep the spaces, only the words, and it doesn't take punctuation into account. It also assumes the words are separated by exactly one space. Hope this helps!
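If the input may contain leading, trailing, or repeated spaces, one option is to build each word in a `StringBuilder` and collect finished words in an `ArrayList`, which also removes the need for `countWords`. This is a swapped-in technique rather than the answer's code, sketched under the assumption that any character for which `Character.isWhitespace` returns true counts as a separator. Like the answer, it leaves punctuation attached to words. `WhitespaceTokenizer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WhitespaceTokenizer {
    // Splits on runs of whitespace without String.split(); no word count needed up front.
    public static String[] tokenize(String sentence) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        for (int i = 0; i < sentence.length(); i++) {
            char c = sentence.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends a word only if one has been started,
                // so leading, trailing and repeated spaces are skipped.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // The last word has no space after it, so flush it here.
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = tokenize("  Hello   world, again ");
        System.out.println(Arrays.toString(words)); // prints [Hello, world,, again]
    }
}
```

Growing an `ArrayList` and converting it with `toArray` at the end avoids having to know the number of words before the loop starts.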

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
