
Stanford CoreNLP Options in Scala

Hello, I am trying to update the newlineIsSentenceBreak option in Stanford CoreNLP.

I am running Spark in Scala with the following versions:

Software Version
Spark 2.3.0
Scala 2.11.8
Java 8 (1.8.0_73)
spark-corenlp 0.3.1
stanford-corenlp 3.9.1

I have found what I believe is the definition of where the newlineIsSentenceBreak option is set, but when I try to implement it I keep getting error messages.

Here is a working code snippet:

import edu.stanford.nlp.process.WordToSentenceProcessor

WordToSentenceProcessor.NewlineIsSentenceBreak.values
WordToSentenceProcessor.NewlineIsSentenceBreak.valueOf("ALWAYS")
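I believe these calls work because NewlineIsSentenceBreak is a plain Java enum nested in WordToSentenceProcessor, so only the standard Java enum methods (values, valueOf) are available on it. The same valueOf pattern applies to any Java enum called from Scala, e.g. with a JDK enum:

```scala
import java.util.concurrent.TimeUnit

// valueOf maps a constant name to the matching enum value
// (and throws IllegalArgumentException if the name does not match)
val unit: TimeUnit = TimeUnit.valueOf("SECONDS")
println(unit) // SECONDS
```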

But when I try to set the option I get an error. Specifically, I am trying to run something like:

WordToSentenceProcessor.NewlineIsSentenceBreak.stringToNewlineIsSentenceBreak("ALWAYS")

but I get this error:

error: value stringToNewlineIsSentenceBreak is not a member of object edu.stanford.nlp.process.WordToSentenceProcessor.NewlineIsSentenceBreak

Any help is appreciated!

Thank you Stack Overflow for being my rubber duck! https://en.wikipedia.org/wiki/Rubber_duck_debugging

To set the parameters in Scala (without using the Spark wrapper functions), you can set them on the Properties object that configures the pipeline, like this:

val props: Properties = new Properties()
props.put("annotators", "tokenize,ssplit,pos,lemma,ner")
props.put("ssplit.newlineIsSentenceBreak", "always")
props.put("ner.applyFineGrained", "false")

Then create the StanfordCoreNLP pipeline from those properties:

val pipeline: StanfordCoreNLP = new StanfordCoreNLP(props)

Because the Spark wrapper functions use the Simple CoreNLP API internally, I don't think their options can be modified. Please post an answer if you know how to do that!
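As an aside on the original compile error: skimming the CoreNLP source, stringToNewlineIsSentenceBreak appears to be a static helper on WordToSentenceProcessor itself rather than on the nested enum, so (untested, and not needed given the property-based approach above) a call like this might work instead:

```scala
import edu.stanford.nlp.process.WordToSentenceProcessor

// Hypothetical fix for the original error: call the helper on the outer class,
// not on the nested enum. The enum's own valueOf("ALWAYS") is the safe alternative.
val break = WordToSentenceProcessor.stringToNewlineIsSentenceBreak("ALWAYS")
```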

Here is a full example:

import java.util.Properties

import edu.stanford.nlp.ling.CoreAnnotations.{SentencesAnnotation, TextAnnotation, TokensAnnotation}
import edu.stanford.nlp.ling.CoreLabel
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.util.CoreMap

import scala.collection.JavaConverters._

val props: Properties = new Properties()
props.put("annotators", "tokenize,ssplit,pos,lemma,ner")
props.put("ssplit.newlineIsSentenceBreak", "always")
props.put("ner.applyFineGrained", "false")
val pipeline: StanfordCoreNLP = new StanfordCoreNLP(props)
val text = "Quick brown fox jumps over the lazy dog. This is Harshal, he lives in Chicago.  I added \nthis sentence"

// create an empty Annotation with just the raw text
val document: Annotation = new Annotation(text)

// run all the annotators on this text
pipeline.annotate(document)

val sentences: List[CoreMap] = document.get(classOf[SentencesAnnotation]).asScala.toList

(for {
    sentence: CoreMap <- sentences
    token: CoreLabel <- sentence.get(classOf[TokensAnnotation]).asScala.toList
    lemma: String = token.lemma() // token.word() would give the surface form instead
    ner: String = token.ner()
} yield (sentence, lemma, ner)) foreach (t => println("sentence: " + t._1 + " | lemma: " + t._2 + " | ner: " + t._3))
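As a partial answer to my own follow-up about the Spark wrapper functions: one way around them is to bypass spark-corenlp entirely and wrap a pipeline configured with your own Properties in a Spark UDF. This is an untested sketch (ConfiguredNLP, splitSentences, and the column names are mine):

```scala
import java.util.Properties

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import org.apache.spark.sql.functions.udf

import scala.collection.JavaConverters._

// Hold the pipeline in an object with a transient lazy val so each executor
// builds its own instance instead of trying to serialize it with the closure.
object ConfiguredNLP {
  @transient lazy val pipeline: StanfordCoreNLP = {
    val props = new Properties()
    props.put("annotators", "tokenize,ssplit")
    props.put("ssplit.newlineIsSentenceBreak", "always")
    new StanfordCoreNLP(props)
  }
}

// UDF that splits a text column into sentences using the configured pipeline.
val splitSentences = udf { (text: String) =>
  val document = new Annotation(text)
  ConfiguredNLP.pipeline.annotate(document)
  document.get(classOf[SentencesAnnotation]).asScala.map(_.toString).toSeq
}

// Usage: df.withColumn("sentences", splitSentences($"text"))
```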
