
Get words around a position in a string

I would like to get the words around a certain position in a string, for example the two words before it and the two words after it.

For example, consider the string:

String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";

for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1))
{
    System.out.println(index);
}

This prints the index of each occurrence of "I". However, I want to be able to get a substring made up of the words around these positions.

I want to be able to print out "John and I like to" and "and hiking I have two".

Multi-word search strings should work as well, not only single words: searching for "John and" should return "name is John and I like".

Is there a neat, smart way of doing this?

Single word:

You can achieve that using String's split() method. This solution is O(n).

public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and "+
                         "hiking I have two sisters and one brother.";
    String find = "I";

    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so a match in the first two words is not skipped
        if (sp[i].equals(find)) {
            // guard each neighbour to avoid an ArrayIndexOutOfBoundsException
            String surr = (i-2 >= 0 ? sp[i-2]+" " : "") +
                          (i-1 >= 0 ? sp[i-1]+" " : "") +
                          sp[i] +
                          (i+1 < sp.length ? " "+sp[i+1] : "") +
                          (i+2 < sp.length ? " "+sp[i+2] : "");
            System.out.println(surr);
        }
    }
}

Output:

John and I like to
and hiking I have two

Multi-word:

Regex is a clean solution for the case where find consists of several words. Due to its nature, though, it misses the cases where the surrounding words also match find (see the example below).

The algorithm below takes care of all cases (the whole solution space). Bear in mind that, due to the nature of the problem, this solution is O(n*m) in the worst case (with n being the length of str and m being the length of find).

public static void main(String[] args) {
    String str = "Hello my name is John and John and I like to go...";
    String find = "John and";

    String[] sp = str.split(" +"); // "+" for multiple spaces

    String[] spMulti = find.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so a match at the very beginning is not skipped
        int j = 0;
        while (j < spMulti.length && i+j < sp.length
                                  && sp[i+j].equals(spMulti[j])) {
            j++;
        }
        if (j == spMulti.length) { // found spMulti entirely
            StringBuilder surr = new StringBuilder();
            // >= 0 so that the word at index 0 can be part of the leading context
            if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); }
            if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); }
            for (int k = 0; k < spMulti.length; k++) {
                if (k > 0){ surr.append(" "); }
                surr.append(sp[i+k]);
            }
            if (i+spMulti.length < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length]);
            }
            if (i+spMulti.length+1 < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length+1]);
            }
            System.out.println(surr.toString());
        }
    }
}

Output:

name is John and John and
John and John and I like

Here is another way I found, using regex:

        String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";

        String find = "I";

        Pattern pattern = Pattern.compile("([^\\s]+\\s+[^\\s]+)\\s+"+find+"\\s+([^\\s]+\\s[^\\s]+\\s+)");
        Matcher matcher = pattern.matcher(str);

        while (matcher.find())
        {
            System.out.println(matcher.group(1));
            System.out.println(matcher.group(2));
        }

Output:

John and
like to 
and hiking
have two 
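As a possible variation on the regex idea, the pattern can match only the search phrase itself, and the surrounding words can then be read from the text before matcher.start() and after matcher.end(). Because the context words are not consumed by the match, an occurrence that also sits in the context of a previous one is still found. This is only a sketch: the class PhraseContext and the helpers around and words are made-up names, and it assumes that words are separated by whitespace and that find is not empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseContext {
    // Hypothetical helper: each occurrence of `find` with up to n words on either side.
    static List<String> around(String str, String find, int n) {
        // Quote every word of the phrase and allow any run of whitespace between them.
        StringBuilder phrase = new StringBuilder();
        for (String w : find.trim().split("\\s+")) {
            if (phrase.length() > 0) phrase.append("\\s+");
            phrase.append(Pattern.quote(w));
        }
        // (?<!\S) and (?!\S): the phrase must be delimited by whitespace or the string ends.
        Matcher m = Pattern.compile("(?<!\\S)" + phrase + "(?!\\S)").matcher(str);

        List<String> results = new ArrayList<>();
        while (m.find()) {
            String[] before = words(str.substring(0, m.start()));
            String[] after = words(str.substring(m.end()));
            List<String> window = new ArrayList<>();
            // Last n words before the match (fewer near the start of the string).
            window.addAll(Arrays.asList(before).subList(Math.max(0, before.length - n), before.length));
            // The matched phrase itself, with whitespace normalised to single spaces.
            window.add(String.join(" ", words(m.group())));
            // First n words after the match (fewer near the end of the string).
            window.addAll(Arrays.asList(after).subList(0, Math.min(n, after.length)));
            results.add(String.join(" ", window));
        }
        return results;
    }

    // Splits on whitespace; returns an empty array for blank input.
    static String[] words(String s) {
        String t = s.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        around(str, "John and", 2).forEach(System.out::println);
    }
}
```

For the "John and John and" string this prints "name is John and John and" followed by "John and John and I like". Re-splitting the prefix and suffix for every match keeps the sketch short but does extra work on long texts.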

Use String.split() to split the text into words. Then search for "I" and concatenate the words back together:

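String[] parts = str.split(" ");

for (int i = 0; i < parts.length; i++) {
    // only use the match when i-2 .. i+2 are all valid indices
    if (parts[i].equals("I") && i - 2 >= 0 && i + 2 < parts.length) {
        String out = parts[i-2] + " " + parts[i-1] + " " + parts[i] + " " + parts[i+1] + " " + parts[i+2];
        System.out.println(out);
    }
}

Of course, you need to check that i-2 and i+2 are valid indices, as the condition above does. If you have a lot of data, building the result with a StringBuffer (or a StringBuilder in single-threaded code) would help performance.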

// Convert the sentence to a List of words
String[] stringArray = sentence.split(" ");
List<String> stringList = Arrays.asList(stringArray);

// Which word should be matched?
String toMatch = "I";

// How many words before and after do you want?
int before = 2;
int after = 2;

for (int i = 0; i < stringList.size(); ++i) {
    if (toMatch.equals(stringList.get(i))) {
        int index = i;
        if (0 <= index - before && index + after < stringList.size()) {
            StringBuilder sb = new StringBuilder();

            // use j here: i is already the outer loop variable
            for (int j = index - before; j <= index + after; ++j) {
                sb.append(stringList.get(j));
                sb.append(" ");
            }
            String result = sb.toString().trim();
            //Do something with result
        }
    }
}

This extracts exactly two words before and after the match. It could be extended to print at most two words before and after, rather than exactly two.
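One way the "at most" extension might look is to clamp the window to the bounds of the list instead of skipping matches near the edges. The sketch below assumes whitespace-separated words; the class ClampedWindow and the method contexts are illustrative names, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClampedWindow {
    // Hypothetical helper: up to `before` words ahead of each match and up to `after` words behind it.
    static List<String> contexts(String sentence, String toMatch, int before, int after) {
        List<String> words = Arrays.asList(sentence.trim().split("\\s+"));
        List<String> results = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (toMatch.equals(words.get(i))) {
                // Clamp both ends so the window never leaves the list.
                int from = Math.max(0, i - before);
                int to = Math.min(words.size(), i + after + 1); // exclusive upper bound
                results.add(String.join(" ", words.subList(from, to)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Matches at the very start and very end still produce a (shorter) window.
        contexts("I like to go fishing and hiking and so do I", "I", 2, 2)
                .forEach(System.out::println);
    }
}
```

With the sample sentence in main, the two matches yield "I like to" and "so do I", since there are no words before the first match or after the last one.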

EDIT: Damn.. way too slow, and no fancy ternary operators :/

public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";
    String find = "I";
    int countWords = 3;
    List<String> strings = countWordsBeforeAndAfter(str, find, countWords);
    strings.stream().forEach(System.out::println);
}

public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWordsBeforeAndAfter){
    List<String> searchList = new ArrayList<>();
    String str = paragraph;
    String find = search;
    int countWords = countWordsBeforeAndAfter;
    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(find)) {

            String before = "";
            for (int j = countWords; j > 0; j--) {
                if(i-j >= 0) before += sp[i-j]+" ";
            }

            String after = "";
            for (int j = 1; j <= countWords; j++) {
                if(i+j < sp.length) after += " " + sp[i+j];
            }
            String searchResult = before + find + after;
            searchList.add(searchResult);
        }
    }
    return searchList;
}
