I have a Word document with numbered items such as "1.", "2.", and so on, and I want to extract sentences from it. I am using Stanford CoreNLP 4.0.0 with stanford-corenlp-models-current.jar. Default sentence extraction returns the numbers as separate sentences. Suppose the document contains the lines "1. Abcd efgh" and "2. Ijkl mnop".
Sentence extraction returns "1" as one sentence and "Abcd efgh" as another.
Similarly, it returns "2" as one sentence and "Ijkl mnop" as another.
I tried the boundariesToDiscard property with different patterns, but I got the same result, and the entity mentions were also wrong in this case.
Please help me resolve this issue.
Thanks in advance.
I solved the problem by setting the following property:
props.setProperty("ssplit.eolonly", "true");
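For context, here is a minimal sketch of where that property might sit in a pipeline configuration. It assumes each numbered item is on its own line in the extracted text, since ssplit.eolonly makes the splitter break only at line ends. The class name EolOnlySplitConfig and the annotator list "tokenize,ssplit" are my own choices for illustration, not part of the original setup; entity mentions would need additional annotators.

```java
import java.util.Properties;

// Hypothetical helper class; the name is for illustration only.
class EolOnlySplitConfig {

    // Builds pipeline properties that split sentences only at line breaks.
    static Properties buildProps() {
        Properties props = new Properties();
        // Assumed annotator list: tokenization and sentence splitting only.
        props.setProperty("annotators", "tokenize,ssplit");
        // Treat each input line as one sentence, so "1." does not end a sentence.
        props.setProperty("ssplit.eolonly", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        // With the CoreNLP jars on the classpath, the properties would be used like:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        //   CoreDocument doc = new CoreDocument("1. Abcd efgh\n2. Ijkl mnop");
        //   pipeline.annotate(doc);
        //   // doc.sentences() should then hold one sentence per input line.
        System.out.println(props.getProperty("ssplit.eolonly"));
    }
}
```

The pipeline calls are left as comments so the snippet compiles without the CoreNLP jars. Note that with this setting a line containing several real sentences is also kept as a single sentence, which may or may not suit the document.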