How to extract sentences from a paragraph using a regular expression in Java

I have a paragraph of text. I want to use a regular expression in Java to extract the two or three sentences around, and including, the one that contains a keyword.

Example paragraph: ....My name is Tom. I live with my family in the countryside. I love the animal. So I have a dog and a cat. However, we eat a lot......

Keyword: a dog and a cat

Desired result: I love the animal. So I have a dog and a cat. However, we eat a lot

Note: I am using regular expressions in Java. This is what I have so far:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String line = ".My name is Tom. I live with my family in the countryside. I love the animal. So I have a dog and a cat. However, we eat a lot......  ";
    String pattern = "a dog and a cat";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(line);
    if (m.find()) {
        // This only matches the keyword itself, not the surrounding sentences.
        System.out.println(m.toMatchResult());
        System.out.println(m.groupCount());
        System.out.println(m.group());
    } else {
        System.out.println("False");
    }

Here's the pattern you want:

\.([^.]+\.[^.]*a dog and a cat[^.]*\.[^.]+)

Since you're in Java, remember to double up the backslashes when encoding it as a string.

Basically, what it'll do is match a literal dot, then any string of characters that isn't a dot (the first sentence), another literal dot, the middle sentence containing your literal, a third literal dot, then another sequence of characters that isn't a dot (the third sentence). Everything after the leading dot is captured in group 1.
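To make the escaping concrete, here is a minimal sketch of that pattern written as a Java string. The class name `KeywordSentences` and the helper `extract` are hypothetical names made up for this illustration. Wrapping the keyword in `Pattern.quote` and trimming the result are additions of mine, not part of the pattern itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordSentences {

    // Hypothetical helper: returns the sentence containing the keyword plus one
    // neighbouring sentence on each side, or null if no such window is found.
    static String extract(String text, String keyword) {
        // Every regex backslash is doubled inside the Java string literal.
        // Pattern.quote keeps characters such as '.' or '?' in the keyword literal.
        String regex = "\\.([^.]+\\.[^.]*" + Pattern.quote(keyword) + "[^.]*\\.[^.]+)";
        Matcher m = Pattern.compile(regex).matcher(text);
        // Group 1 holds the three sentences without the leading dot.
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String line = ".My name is Tom. I live with my family in the countryside. "
                + "I love the animal. So I have a dog and a cat. However, we eat a lot......  ";
        System.out.println(extract(line, "a dog and a cat"));
    }
}
```

With the sample paragraph this prints `I love the animal. So I have a dog and a cat. However, we eat a lot`. Note that the pattern needs a dot before the first of the three sentences and treats every dot as a sentence end, so it will not match when the keyword sits in the first or last sentence, and abbreviations such as "Mr." will throw it off.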

Demo on Regex101

I made this class for one of my projects. Hope it helps.

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExtractSentences {

    private String paragraph;
    private BreakIterator iterator;
    private List<String> sentences;


    public ExtractSentences(String paragraph) {
        this.paragraph = paragraph;
        sentences = new ArrayList<>();
        this.extractSentences();
    }

    // Splits the paragraph into sentences and appends them to the list.
    public void extractSentences() {
        iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(paragraph);

        // Each pair of consecutive boundaries delimits one sentence.
        int lastIndex = iterator.first();
        while (lastIndex != BreakIterator.DONE) {
            int firstIndex = lastIndex;
            lastIndex = iterator.next();

            if (lastIndex != BreakIterator.DONE) {
                String sentence = paragraph.substring(firstIndex, lastIndex);
                sentences.add(sentence);
            }
        }
    }

    public String getParagraph() {
        return paragraph;
    }

    public void setParagraph(String paragraph) {
        this.paragraph = paragraph;
    }

    public void setSentences(List<String> sentences) {
        this.sentences = sentences;
    }

    public List<String> getSentences() {
        return sentences;

    }
}
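The class above only splits the text. To get back to the original question, a rough sketch of how the same `BreakIterator` approach could pick out the keyword sentence together with its neighbours might look like this. `SentenceWindow`, `split` and `around` are hypothetical names for this illustration, and taking exactly one sentence on each side is an assumption based on the desired result in the question.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceWindow {

    // Splits text into trimmed, non-empty sentences using BreakIterator.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Returns the first sentence containing the keyword, joined with the sentence
    // before and after it when they exist, or an empty string if nothing matches.
    static String around(String text, String keyword) {
        List<String> s = split(text);
        for (int i = 0; i < s.size(); i++) {
            if (s.get(i).contains(keyword)) {
                int from = Math.max(0, i - 1);
                int to = Math.min(s.size(), i + 2); // subList's upper bound is exclusive
                return String.join(" ", s.subList(from, to));
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String text = "My name is Tom. I live with my family in the countryside. "
                + "I love the animal. So I have a dog and a cat. However, we eat a lot.";
        System.out.println(around(text, "a dog and a cat"));
    }
}
```

Unlike the regex, this also returns something when the keyword is in the first or last sentence; it just includes whichever neighbours exist.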
