How to read data from a text file in Java for extraction with StanfordNLP, instead of reading text from a simple string

Question

I tried using Annotation document = new Annotation("this is a simple string"); and I also tried CoreDocument coreDocument = new CoreDocument(text); stanfordCoreNLP.annotate(coreDocument); but I could not work out how to make it read from a text file.

Answer 1

Use it as follows (see the example given here):

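import com.google.common.io.Files;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.io.File;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Properties;

...
// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution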
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// read the text from the file (Files here is Guava's com.google.common.io.Files)
File inputFile = new File("src/test/resources/sample-content.txt");
String text = Files.asCharSource(inputFile, Charset.forName("UTF-8")).read();

// create an empty Annotation just with the given text
Annotation document = new Annotation(text);

// run all Annotators on this text
pipeline.annotate(document);

// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);

for(CoreMap sentence: sentences) {
  // traversing the words in the current sentence
  // a CoreLabel is a CoreMap with additional token-specific methods
  for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
    // this is the text of the token
    String word = token.get(TextAnnotation.class);
    // this is the POS tag of the token
    String pos = token.get(PartOfSpeechAnnotation.class);
    // this is the NER label of the token
    String ne = token.get(NamedEntityTagAnnotation.class);
    
    System.out.println("word: " + word + " pos: " + pos + " ne:" + ne);
  }
}

Update

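Alternatively, you can read the file contents with the approach below, which uses only Java's built-in packages (Files.readString requires Java 11 or later), so no external library is needed. Choose a Charset that matches the characters in your text file. As described here, "ISO-8859-1 is an all-inclusive charset, in the sense that it is guaranteed not to throw MalformedInputException". The code below uses that Charset. Note that if the file is actually UTF-8, any non-ASCII characters will be decoded incorrectly under ISO-8859-1, so prefer StandardCharsets.UTF_8 when you know the encoding.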

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

...
        Path path = Paths.get("sample-content.txt");
        String text = "";
        try {
            text = Files.readString(path, StandardCharsets.ISO_8859_1); // or StandardCharsets.UTF_8 if the file is UTF-8
        } catch (IOException e) {
            e.printStackTrace();
        }
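
If you would rather not commit to ISO-8859-1 up front, one possible variation is to try UTF-8 first and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. The sketch below is not part of the original answer or of Stanford CoreNLP: the class name TextFileReader and the method readText are hypothetical, and it assumes Java 11 or later for Files.readString and Path.of.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Stanford CoreNLP or the original answer.
final class TextFileReader {

    private TextFileReader() {}

    /**
     * Reads the whole file as UTF-8; if the bytes are not valid UTF-8,
     * reads it again as ISO-8859-1, which maps every byte value to a character.
     */
    static String readText(Path path) throws IOException {
        try {
            return Files.readString(path, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // Malformed UTF-8 input: fall back to a charset that always decodes.
            return Files.readString(path, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name, borrowed from the snippet above.
        Path path = Path.of(args.length > 0 ? args[0] : "sample-content.txt");
        String text = readText(path);
        System.out.println(text.length() + " characters read");
    }
}
```

The returned string can then be passed to new Annotation(text) or new CoreDocument(text) exactly as in the pipeline code earlier in this answer. The fallback only avoids the exception; it cannot tell you what the file's real encoding is, so the decoded text may still need checking.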
