
Stanford CoreNLP can not detect sentence with numbering

I have a Word document with numbered items (1., 2., etc.) and I want to extract its sentences. I use Stanford CoreNLP 4.0.0 with stanford-corenlp-models-current.jar. With the default sentence splitting, the numbers are returned as separate sentences. Suppose the document contains:

  1. Abcd efgh....
  2. Ijkl mnop....

Sentence extraction returns "1" as one sentence and "Abcd efgh" as another.

Similarly, "2" is returned as one sentence and "Ijkl mnop" as another.

I tried the boundariesToDiscard property with different patterns but got the same result, and the entity mentions are also wrong in this case.

Please help me resolve this issue.

Thanks in advance.

I solved the problem by setting the following property, which makes the sentence splitter break sentences only at line breaks:

props.setProperty("ssplit.eolonly", "true");
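For context, here is a minimal end-to-end sketch of that fix. It assumes a plain tokenize,ssplit pipeline and input text that already has one numbered item per line (how the text is extracted from the Word file is not covered here). The class name NumberedSentences and the helper splitSentences are hypothetical; the CoreNLP classes used (StanfordCoreNLP, CoreDocument, CoreSentence) are part of the regular 4.x Java API.

```java
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class illustrating ssplit.eolonly; not part of CoreNLP.
public class NumberedSentences {

    // Build the pipeline once, because loading annotators is expensive.
    private static final StanfordCoreNLP PIPELINE = buildPipeline();

    private static StanfordCoreNLP buildPipeline() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        // Split sentences only at line breaks, so "1. Abcd efgh...." stays one sentence.
        props.setProperty("ssplit.eolonly", "true");
        return new StanfordCoreNLP(props);
    }

    // Hypothetical helper: returns the text of each sentence CoreNLP finds.
    public static List<String> splitSentences(String text) {
        CoreDocument doc = new CoreDocument(text);
        PIPELINE.annotate(doc);
        List<String> result = new ArrayList<>();
        for (CoreSentence sentence : doc.sentences()) {
            result.add(sentence.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: one numbered item per line.
        String text = "1. Abcd efgh.\n2. Ijkl mnop.";
        for (String s : splitSentences(text)) {
            System.out.println(s);
        }
    }
}
```

The trade-off is that with ssplit.eolonly every line becomes exactly one sentence, so a line that contains several sentences is no longer split. Further annotators such as ner can be appended to the annotators list; they then run on these line-based sentences.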
