
Java: how to parse a string for a specific word

How would I parse for the word "hi" in the sentence "hi, how are you?", or for the word "how" in "how are you?"?

Example of what I want in code:

String word = "hi";
String word2 = "how";
Scanner scan = new Scanner(System.in).useDelimiter("\n");
String s = scan.nextLine();
if (s.equals(word)) {
    System.out.println("Hey");
}
if (s.equals(word2)) {
    System.out.println("Hey");
}

To just find the substring, you can use contains, indexOf, or any other variant:

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html

if( s.contains( word ) ) {
   // ...
}

if( s.indexOf( word2 ) >=0 ) {
   // ...
}
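As a minimal sketch of the substring approach (the class name SubstringDemo and the helper containsIgnoreCase are made up for illustration, not part of the Java API), note that both methods are case-sensitive and match anywhere inside a longer word:

```java
import java.util.Locale;

class SubstringDemo {
    // Hypothetical helper: case-insensitive substring test built on String.contains.
    static boolean containsIgnoreCase(String text, String word) {
        return text.toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String s = "Hi, how are you?";
        System.out.println(s.contains("hi"));             // false: contains is case-sensitive
        System.out.println(s.indexOf("how"));             // 4: index of the first match, -1 if absent
        System.out.println(containsIgnoreCase(s, "hi"));  // true
        // Substring matching does not respect word boundaries:
        System.out.println(containsIgnoreCase("hire-purchase", "hi")); // true
    }
}
```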

If you care about word boundaries, then StringTokenizer is probably a good approach.

https://docs.oracle.com/javase/1.5.0/docs/api/java/util/StringTokenizer.html

You can then perform a case-insensitive check (equalsIgnoreCase) on each word.

Looks like a job for regular expressions. contains would give a false positive on, say, "hire-purchase".

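// find() looks for the pattern anywhere in the input; Pattern.matches() would require the whole string to match
if (Pattern.compile("\\bhi\\b").matcher(stringToMatch).find()) { //...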
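A slightly fuller sketch of the word-boundary idea follows. The class and method names are hypothetical, and it assumes the search word consists only of word characters (letters, digits, underscore), since \b behaves differently next to punctuation:

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Returns true if `word` occurs in `text` as a whole word, ignoring case.
    static boolean containsWord(String text, String word) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        // find() succeeds on a match anywhere in the text.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("hi, how are you?", "hi")); // true
        System.out.println(containsWord("hire-purchase", "hi"));    // false
    }
}
```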

I'd go for the java.util.StringTokenizer: https://docs.oracle.com/javase/1.5.0/docs/api/java/util/StringTokenizer.html

StringTokenizer st = new StringTokenizer(
    "Hi, how are you?",
    ",.:?! \t\n\r"       // whitespace and punctuation as delimiters
);
while (st.hasMoreTokens()) {
    if (st.nextToken().equals("Hi")) {
        // matches "Hi"
    }
}

Alternatively, take a look at java.util.regex and use regular expressions.

I'd go for a tokenizer instead. Set spaces and other elements like commas, full stops, etc. as delimiters. And remember to compare in case-insensitive mode.

This way you can find "hi" in "Hi, how is his test going" without getting a false positive on "his" and a false negative on "Hi" (starts with an uppercase H).
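One way this could look, as a sketch only: the class name, the helper containsWord, and the particular delimiter set are assumptions made for illustration.

```java
import java.util.StringTokenizer;

class TokenizerDemo {
    // Assumed delimiter set: whitespace plus common punctuation.
    static final String DELIMITERS = " ,.:;?!\t\n\r";

    // Returns true if any token of `text` equals `word`, ignoring case.
    static boolean containsWord(String text, String word) {
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            if (st.nextToken().equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Hi, how is his test going", "hi")); // true ("Hi" matches)
        System.out.println(containsWord("how is his test going", "hi"));     // false ("his" is not "hi")
    }
}
```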

You can pass a regular expression to the next() method of Scanner. So you can iterate through each word in the input (Scanner delimits on whitespace by default) and perform the appropriate processing if you get a match.
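A rough sketch of the Scanner approach is below. It uses hasNext(pattern) as the test rather than next(pattern), because next(pattern) throws an InputMismatchException when the next token does not match. The class name, the countWord helper, and the custom delimiter (added so that punctuation such as the comma in "hi," is not part of the token) are assumptions for illustration:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerDemo {
    // Counts the tokens of `text` that equal `word`, ignoring case.
    static int countWord(String text, String word) {
        int count = 0;
        try (Scanner scan = new Scanner(text)) {
            // Split on runs of whitespace and punctuation instead of whitespace only.
            scan.useDelimiter("[\\s\\p{Punct}]+");
            String pattern = "(?i)" + Pattern.quote(word);
            while (scan.hasNext()) {
                // hasNext(pattern) is true only if the whole next token matches.
                if (scan.hasNext(pattern)) {
                    count++;
                }
                scan.next(); // consume the token either way
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWord("Hi, how is his test going", "hi")); // 1
        System.out.println(countWord("hi, how are you?", "how"));         // 1
    }
}
```

The same loop works on System.in by constructing the Scanner from it instead of from a string.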
