
Stanford CoreNLP: input with one sentence per line

I'm using the Stanford NLP tools for coursework. The parser ends a sentence at every period, but I also need it to end a sentence at the end of each line, that is, at every '\n' character. On the command line you can use the option "-sentences", but so far I have not found an equivalent for use from code.

Passing the option through setOptionFlags on LexicalizedParser did not work either.

Here is some sample code to elaborate on Gabor's answer:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class ParserExample {

    public static void main(String[] args) throws IOException {
        // Read the whole input file (one sentence per line) as UTF-8 text.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Annotation document = new Annotation(text);

        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        // Treat every newline as a sentence boundary, in addition to the usual punctuation.
        props.setProperty("ssplit.newlineIsSentenceBreak", "always");

        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);

        // Print the parse tree of each sentence in Penn Treebank format.
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            Tree tree = sentence.get(TreeAnnotation.class);
            tree.pennPrint();
        }
    }
}

args[0] should be the path to your input file, which contains one sentence per line.

You will need to download Stanford CoreNLP 3.5.2 from http://nlp.stanford.edu/software/corenlp.shtml and put the jars from the download on your classpath.

You can set other options for the parser the same way, with props.setProperty().

If you have a file with one sentence per line, you can use

props.setProperty("ssplit.eolonly", "true");

if you only want to split on newlines.

The option you're looking for is ssplit.newlineIsSentenceBreak = always (or, on the command line, -ssplit.newlineIsSentenceBreak always). This will always split sentences on a newline, in addition to splitting on the usual punctuation. See http://nlp.stanford.edu/software/corenlp.shtml
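To make the difference between ssplit.eolonly and ssplit.newlineIsSentenceBreak = always concrete, here is a toy model in plain Java (no CoreNLP involved) of how the two settings treat the same input. It is only an approximation under stated assumptions: CoreNLP's real splitter works on tokens and handles abbreviations, quotes and similar cases, while this sketch just splits strings with a regular expression. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; NOT CoreNLP's actual sentence splitter.
public class SplitModesDemo {

    // Toy model of ssplit.eolonly=true: every non-empty line is one sentence,
    // and punctuation inside the line is ignored.
    public static List<String> eolOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\n")) {
            String s = line.trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Toy model of ssplit.newlineIsSentenceBreak=always: break at every newline,
    // then also break after '.', '!' or '?' when followed by whitespace.
    public static List<String> newlineAlways(String text) {
        List<String> out = new ArrayList<>();
        for (String line : eolOnly(text)) {
            for (String part : line.split("(?<=[.!?])\\s+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "First line without a period\nSecond line. It has two sentences.\n";
        // prints [First line without a period, Second line. It has two sentences.]
        System.out.println(eolOnly(text));
        // prints [First line without a period, Second line., It has two sentences.]
        System.out.println(newlineAlways(text));
    }
}
```

In short: with eolonly a line is never broken further, while with newlineIsSentenceBreak = always a line that contains several sentences is still split at its punctuation.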

In the properties file, add:

ssplit.newlineIsSentenceBreak = always
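Assuming the properties file is read in the standard java.util.Properties format (which the key = value syntax suggests), the spaces around = do not end up in the value. The plain-Java sketch below, with a hypothetical class name and file contents, shows that parsing; the resulting Properties object is the same kind of object the code above builds by hand with setProperty and passes to new StanfordCoreNLP(props).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class: parses properties-file text the way java.util.Properties reads a file.
public class PropsFileDemo {

    public static Properties parse(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected to happen.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical contents of a CoreNLP properties file.
        String contents =
                "annotators = tokenize, ssplit, pos, lemma, ner, parse\n"
                + "ssplit.newlineIsSentenceBreak = always\n";
        Properties props = parse(contents);
        // Whitespace around '=' is skipped by Properties.load, so this prints "always".
        System.out.println(props.getProperty("ssplit.newlineIsSentenceBreak"));
    }
}
```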
