Find word in User Content without using .split() or StringTokenizer

I'm working on a program that asks the user to input a phrase and an integer. The integer identifies which word of the phrase will be returned. For example, if they enter 5, the program should return the fifth word in the sentence.

System.out.println("Your word is: " +combineString(phrase,numWord));

This is my work so far. The line above sits in main and prints the result; the method it calls is below:

public static String combineString(String newPhrase, int newNum) {
  int countWords = 0;
  String word = "";

  // Count the spaces; a phrase with n single spaces has n + 1 words.
  for (int i = 0; i < newPhrase.length(); i++) {
     if (newPhrase.charAt(i) == ' ') {
        countWords++;
     }
  }

  // Return the last word when newNum is at or past the word count.
  // Ex: with 15 words in the phrase, picking the 18th word returns the 15th.
  if (countWords + 1 <= newNum) {
     // substring's end index is exclusive, so omit it to keep the last character
     word += newPhrase.substring(newPhrase.lastIndexOf(' ') + 1);
  }
  else if (newNum <= 0) { // return null if the user picks 0 or less
     return null;
  }
  // The middle case (1 <= newNum <= countWords) is not handled yet.
  return word;
}

I have thought a lot about how to handle the middle part. My idea is that if the user picks numWord = 5, then to return the fifth word of the sentence I need something like "newPhrase.substring(index of 4th space + 1, index of 5th space)". This is where I am stuck, because I don't know how to start or how to find the 4th space.
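The idea described here, locating the space before the wanted word and the space after it, can be sketched with `String.indexOf(int ch, int fromIndex)`, which searches for a character starting at a given position. This is only a minimal sketch: the class name `NthWordBySpaces` and the method name `nthWord` are made up for illustration, and it assumes words are separated by exactly one space.

```java
public class NthWordBySpaces {
    // Returns the n-th word (1-based) of phrase, assuming single-space separators.
    // Returns null for n <= 0 and the last word when n exceeds the word count,
    // mirroring the behavior described in the question.
    static String nthWord(String phrase, int n) {
        if (n <= 0) {
            return null;
        }
        int start = 0; // index where the current word begins
        for (int k = 1; k < n; k++) {
            int space = phrase.indexOf(' ', start);
            if (space < 0) {
                break; // fewer than n words: start already marks the last word
            }
            start = space + 1;
        }
        // The word ends at the next space, or at the end of the phrase.
        int end = phrase.indexOf(' ', start);
        return end < 0 ? phrase.substring(start) : phrase.substring(start, end);
    }

    public static void main(String[] args) {
        String phrase = "the quick brown fox jumps over the lazy dog";
        System.out.println(nthWord(phrase, 5));  // fifth word
        System.out.println(nthWord(phrase, 18)); // past the end, so the last word
    }
}
```

Each loop iteration skips one word, so after n - 1 iterations `start` sits just past the (n - 1)-th space, which is the "4th space + 1" position from the question when n is 5.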

public static String combineString(String newPhrase, int newNum) {
    if (newNum <= 0)
        return null;
    String word = "";
    int j = 0; // number of words completed so far

    for (int i = 0; i < newPhrase.length(); i++) {
        if (newPhrase.charAt(i) == ' ') {
            // A space ends the current word.
            if (j + 1 == newNum) {
                return word; // returns the specified word
            }
            j++;
            word = "";
        } else {
            // Only non-space characters become part of the word.
            word = word + newPhrase.charAt(i);
        }
    }
    return word; // returns the last word
}

This code should work for you. If it does, please accept the answer.

public static String combineString(String newPhrase, int newNum) {
    try {
        return newPhrase.split(" ")[newNum - 1];
    } catch (ArrayIndexOutOfBoundsException e) {
        return null;
    }
}

If you want to go really low-level, you can go lower than substring and operate on single characters. This makes it easy to skip separator characters other than the blank, such as punctuation. It is also a step towards how regular expressions are executed: by transforming them into finite-state automata.

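import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// The two states of the scanner: between words, or inside a word.
enum ScanState {WHITESPACE, WORD}

// Characters treated as separators: the space plus common punctuation.
private static final Set<Character> whitespace = new HashSet<>(Arrays.asList('"', ',', '.', '?', '!', '-', ';', ' '));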

@Test
public void testTokenize() {
    char[] text = "No, it's been \"yes?\", and not \"no!\" - hasn't it?".toCharArray();
    List<String> expected = Arrays.asList("No", "it's", "been", "yes", "and", "not", "no", "hasn't", "it");
    assertEquals(expected, tokenize(text));
}

private List<String> tokenize(char[] text) {
    List<String> result = new ArrayList<>();
    StringBuilder word = new StringBuilder(); // letters of the word being scanned
    ScanState prevState = ScanState.WHITESPACE;

    for (char currentChar : text) {
        ScanState currState = whitespace.contains(currentChar) ? ScanState.WHITESPACE : ScanState.WORD;

        if (currState == ScanState.WORD) {
            // WHITESPACE -> WORD starts a word, WORD -> WORD extends it
            word.append(currentChar);
        } else if (prevState == ScanState.WORD) {
            // WORD -> WHITESPACE: the word is complete
            result.add(word.toString());
            word.setLength(0);
        }
        prevState = currState;
    }
    // Flush a word that runs to the end of the text
    if (prevState == ScanState.WORD) {
        result.add(word.toString());
    }
    return result;
}
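To tie the two-state scanner back to the original question, the same transitions can count words as they complete and stop at the requested one, without building the whole list. The following is a sketch under assumptions: the class `NthWordScanner` and method `nthWord` are hypothetical names, the separator set is copied from the tokenizer above, and the fallback rules (null for n <= 0, last word when n is too large) come from the question rather than from this answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NthWordScanner {
    enum ScanState { WHITESPACE, WORD }

    // Same separator characters as the tokenizer above.
    private static final Set<Character> SEPARATORS =
            new HashSet<>(Arrays.asList('"', ',', '.', '?', '!', '-', ';', ' '));

    // Returns the n-th word (1-based), the last word if n is too large,
    // or null if n <= 0 or the text contains no words.
    static String nthWord(String text, int n) {
        if (n <= 0) {
            return null;
        }
        StringBuilder word = new StringBuilder();
        String last = null; // most recently completed word
        int count = 0;      // number of completed words
        ScanState prev = ScanState.WHITESPACE;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            ScanState curr = SEPARATORS.contains(c) ? ScanState.WHITESPACE : ScanState.WORD;
            if (curr == ScanState.WORD) {
                word.append(c); // start or extend the current word
            } else if (prev == ScanState.WORD) {
                // WORD -> WHITESPACE: a word just ended
                last = word.toString();
                word.setLength(0);
                if (++count == n) {
                    return last;
                }
            }
            prev = curr;
        }
        // A word that runs to the end of the text is still the last word.
        if (prev == ScanState.WORD) {
            last = word.toString();
        }
        return last;
    }

    public static void main(String[] args) {
        String text = "No, it's been \"yes?\", and not \"no!\" - hasn't it?";
        System.out.println(nthWord(text, 4));
        System.out.println(nthWord(text, 20));
    }
}
```

Returning as soon as the counter reaches n means the scan stops early for small n, which is the main difference from tokenizing everything and indexing into the list.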
