Hello, I am trying to update the following option in Stanford CoreNLP:
I am running Spark in Scala with the following versions:
| Software | Version |
|---|---|
| Spark | 2.3.0 |
| Scala | 2.11.8 |
| Java | 8 (1.8.0_73) |
| spark-corenlp | 0.3.1 |
| stanford-corenlp | 3.9.1 |
I have found what I believe is the definition of the newlineIsSentenceBreak option, but when I try to set it I keep getting error messages.
Here is a working code snippet:
import edu.stanford.nlp.process.WordToSentenceProcessor
WordToSentenceProcessor.NewlineIsSentenceBreak.values
WordToSentenceProcessor.NewlineIsSentenceBreak.valueOf("ALWAYS")
But when I try to set the option I get an error. Specifically, I am trying to run something similar to:
WordToSentenceProcessor.NewlineIsSentenceBreak.stringToNewlineIsSentenceBreak("ALWAYS")
but I get this error:
error: value stringToNewlineIsSentenceBreak is not a member of object edu.stanford.nlp.process.WordToSentenceProcessor.NewlineIsSentenceBreak
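For reference, `NewlineIsSentenceBreak` is a plain Java enum, so only the compiler-generated `values()` and `valueOf(String)` methods exist on it, which is why the first two calls work and the third does not. The same string-to-value lookup can be sketched in pure Scala with a hypothetical stand-in enumeration (illustration only, not the CoreNLP type):

```scala
// Hypothetical stand-in for the CoreNLP enum, for illustration only
object NewlineIsSentenceBreak extends Enumeration {
  val ALWAYS, NEVER, TWO_CONSECUTIVE = Value
}

// withName is the Scala Enumeration analogue of Java's Enum.valueOf:
// it looks a value up by its string name, throwing if no match exists
val mode = NewlineIsSentenceBreak.withName("ALWAYS")
println(mode)  // prints: ALWAYS
```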
Any help is appreciated!
Thank you stackoverflow for being my rubber duck! https://en.wikipedia.org/wiki/Rubber_duck_debugging
To set the parameters in Scala (not using the Spark wrapper functions), assign them to a Properties object like this:
val props: Properties = new Properties()
props.put("annotators", "tokenize,ssplit,pos,lemma,ner")
props.put("ssplit.newlineIsSentenceBreak", "always")
props.put("ner.applyFineGrained", "false")
before creating the Stanford CoreNLP pipeline from those properties:
val pipeline: StanfordCoreNLP = new StanfordCoreNLP(props)
Because the Spark wrapper functions use the Simple CoreNLP implementation, I don't think they can be configured this way. Please post an answer if you are aware of how to do that!
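That said, the Simple CoreNLP API itself can take a Properties object, so one possible workaround is to call it directly instead of going through spark-corenlp. This is an untested sketch that assumes the `edu.stanford.nlp.simple.Document(Properties, String)` constructor available in the 3.9.x line:

```scala
import java.util.Properties
import edu.stanford.nlp.simple.Document
import scala.collection.JavaConverters._

val props = new Properties()
props.setProperty("ssplit.newlineIsSentenceBreak", "always")

// Pass the properties straight to the simple-API Document,
// bypassing the spark-corenlp wrapper entirely
val doc = new Document(props, "First line\nSecond line")
doc.sentences().asScala.foreach(s => println(s.text()))
```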
Here is a full example:
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations.{SentencesAnnotation, TextAnnotation, TokensAnnotation}
import edu.stanford.nlp.ling.CoreLabel
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.util.CoreMap
import scala.collection.JavaConverters._
val props: Properties = new Properties()
props.put("annotators", "tokenize,ssplit,pos,lemma,ner")
props.put("ssplit.newlineIsSentenceBreak", "always")
props.put("ner.applyFineGrained", "false")
val pipeline: StanfordCoreNLP = new StanfordCoreNLP(props)
val text = "Quick brown fox jumps over the lazy dog. This is Harshal, he lives in Chicago. I added \nthis sentence"
// create a blank Annotation from the raw text
val document: Annotation = new Annotation(text)
// run all annotators (tokenize, ssplit, pos, lemma, ner) on this text
pipeline.annotate(document)
val sentences: List[CoreMap] = document.get(classOf[SentencesAnnotation]).asScala.toList
(for {
  sentence: CoreMap <- sentences
  token: CoreLabel <- sentence.get(classOf[TokensAnnotation]).asScala.toList
  lemma: String = token.lemma()
  ner: String = token.ner()
} yield (sentence, lemma, ner)) foreach (t => println("sentence: " + t._1 + " | lemma: " + t._2 + " | ner: " + t._3))