
Find the nth word in user input without using .split() or StringTokenizer

I'm working on a program that asks the user to input a phrase and an integer. The integer identifies which word of the phrase will be returned. For example, if they enter 5, the program should return the fifth word in the sentence.

System.out.println("Your word is: " +combineString(phrase,numWord));

This is my work so far. The line above, from main, prints the result:

public static String combineString(String newPhrase, int newNum) {
  int countWords = 0;
  String word = "";

  // Count the spaces; the number of words is countWords + 1.
  for(int i=0; i< newPhrase.length(); i++) {
     if(newPhrase.charAt(i) == ' ') {
        countWords++;             
     }
  }  

  // Return the last word when newNum is at or past the word count.
  // Ex: with 15 words in the phrase, picking the 18th word returns the 15th.
  if (countWords + 1 <= newNum) {
     // substring(begin) runs to the end of the string, so the last character is kept
     word += newPhrase.substring(newPhrase.lastIndexOf(' ') + 1);
  }
  else if (newNum <= 0) { // return null if the user picks 0 or less
     return null;
  }
  return word;
}

I've thought a lot about how to handle the middle part. If the user picks numWord = 5, then to return the fifth word in the sentence I need something like newPhrase.substring(index of the 4th space + 1, index of the 5th space). This is where I'm stuck: I don't know how to start, or how to find the 4th space.
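
The "find the 4th space, then the 5th" idea can be sketched with String.indexOf(char, fromIndex), which searches from a given position onward. This is only a minimal sketch: the class name NthWordByIndexOf and the method name nthWord are made up for illustration, and it assumes words are separated by exactly one space with no leading or trailing spaces.

```java
public class NthWordByIndexOf {

    // Returns the n-th (1-based) space-separated word of phrase, or null if n <= 0.
    // If n exceeds the number of words, the last word is returned, as the question asks.
    // Assumption: single spaces between words, no leading or trailing spaces.
    public static String nthWord(String phrase, int n) {
        if (n <= 0) {
            return null;
        }
        int start = 0; // index where the current word begins
        for (int word = 1; word < n; word++) {
            int space = phrase.indexOf(' ', start);
            if (space == -1) {
                break; // fewer than n words: stay on the last one
            }
            start = space + 1; // the next word begins right after this space
        }
        int end = phrase.indexOf(' ', start); // the space that closes the word, if any
        return end == -1 ? phrase.substring(start) : phrase.substring(start, end);
    }

    public static void main(String[] args) {
        String phrase = "the quick brown fox jumps over the lazy dog";
        System.out.println(nthWord(phrase, 5));  // jumps
        System.out.println(nthWord(phrase, 18)); // dog
    }
}
```

Each pass of the loop skips one word, so after n - 1 passes start sits just past the (n - 1)-th space, which is exactly the "space 4th + 1" position for n = 5.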

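public static String combineString(String newPhrase, int newNum) {
    if (newNum <= 0) {
        return null; // no such word
    }
    String word = "";  // the word currently being built
    String last = "";  // the most recently completed word
    int wordsSeen = 0; // number of completed words so far

    for (int i = 0; i < newPhrase.length(); i++) {
        char c = newPhrase.charAt(i);
        if (c != ' ') {
            word = word + c; // collect letters only, so the space is never part of the word
        } else if (!word.isEmpty()) { // a space right after a word completes that word
            wordsSeen++;
            if (wordsSeen == newNum) {
                return word; // returns the specified word
            }
            last = word;
            word = "";
        }
    }
    // Fewer than newNum words: return the last word, even if the phrase ends with spaces.
    return word.isEmpty() ? last : word;
}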

This code should work for you. If it does, please accept the answer.

public static String combineString(String newPhrase, int newNum) {
    try {
        return newPhrase.split(" ")[newNum - 1];
    } catch (ArrayIndexOutOfBoundsException e) {
        return null;
    }
}

If you want to go really low level, you can go lower than substring and operate on single characters. That makes it easy to skip characters other than blanks, such as punctuation. It is also a step towards the way regular expressions are executed, by transforming them into finite-state automata.

import static org.junit.Assert.assertEquals; // assumes JUnit 4; adjust for other versions

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.junit.Test;

public class TokenizerTest {

    enum ScanState {WHITESPACE, WORD}

    // Characters treated as separators: the blank plus common punctuation.
    private static final Set<Character> whitespace = new HashSet<>(Arrays.asList('"', ',', '.', '?', '!', '-', ';', ' '));

    @Test
    public void testTokenize() {
        char[] text = "No, it's been \"yes?\", and not \"no!\" - hasn't it?".toCharArray();
        List<String> expected = Arrays.asList("No", "it's", "been", "yes", "and", "not", "no", "hasn't", "it");
        assertEquals(expected, tokenize(text));
    }

    private List<String> tokenize(char[] text) {
        List<String> result = new ArrayList<>();
        StringBuilder word = new StringBuilder(); // grows as needed, so long words cannot overflow
        ScanState prevState = ScanState.WHITESPACE;

        for (char currentChar : text) {
            ScanState currState = whitespace.contains(currentChar) ? ScanState.WHITESPACE : ScanState.WORD;

            if (prevState == ScanState.WORD && currState == ScanState.WORD) {
                word.append(currentChar); // still inside a word
            }
            if (prevState == ScanState.WORD && currState == ScanState.WHITESPACE) {
                result.add(word.toString()); // the word just ended
            }
            if (prevState == ScanState.WHITESPACE && currState == ScanState.WORD) {
                word.setLength(0); // a new word starts
                word.append(currentChar);
            }
            prevState = currState;
        }
        if (prevState == ScanState.WORD) {
            result.add(word.toString()); // flush a word that runs to the end of the text
        }
        return result;
    }
}
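
To connect the character-level scan with the original question, the same two-state idea can pick out the nth word directly instead of collecting every token. The following is a rough sketch under stated assumptions: the names NthWordScanner and nthWord are hypothetical, only the blank is treated as a separator, and substring is used just once at the end to extract the result.

```java
public class NthWordScanner {

    // Returns the n-th (1-based) word of phrase, scanning one character at a time.
    // Any run of spaces counts as a single separator.
    // Returns null if n <= 0 or the phrase has no words;
    // returns the last word if n is larger than the word count.
    public static String nthWord(String phrase, int n) {
        if (n <= 0) {
            return null;
        }
        boolean inWord = false; // current state: inside a word or in whitespace
        int wordsSeen = 0;      // how many words have started so far
        int start = -1;         // start index of the most recent word
        int end = -1;           // index just past its last character

        for (int i = 0; i < phrase.length(); i++) {
            boolean isSpace = phrase.charAt(i) == ' ';
            if (!isSpace) {
                if (!inWord) { // whitespace -> word: a new word begins
                    start = i;
                    wordsSeen++;
                    inWord = true;
                }
                end = i + 1; // extend the current word
            } else {
                if (inWord && wordsSeen == n) {
                    break; // word -> whitespace: the requested word is complete
                }
                inWord = false;
            }
        }
        return start == -1 ? null : phrase.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(nthWord("  a  bb c ", 2)); // bb
    }
}
```

Because the state only changes on a transition between space and non-space, repeated, leading, and trailing spaces need no special handling.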


 