
Stanford Java NLP Constituency labels abbreviations

Using the Stanford Java CoreNLP library, I have this:

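    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.trees.Tree;

    // Assumes `pipeline` is a StanfordCoreNLP instance that includes the "parse" annotator.
    String text = "My name is Anthony";
    CoreDocument doc = new CoreDocument(text);
    pipeline.annotate(doc);
    // Iterating over a Tree visits every subtree, from the root down to the leaves.
    for (Tree t : doc.sentences().get(0).constituencyParse()) {
        String tmp = "";
        for (Word w : t.yieldWords()) {
            tmp = tmp + " " + w.word();
        }
        System.out.println(t.label().toString() + " - " + WordParts.getValue(t.label().toString()) + " - " + tmp);
    }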

Right now, the program outputs the following:

ROOT - INVALID -  My name is Anthony
S - INVALID -  My name is Anthony
NP - INVALID -  My name
PRP$ - Possessive pronoun -  My
My-1 - INVALID -  My
NN - Singular noun -  name
name-2 - INVALID -  name
VP - INVALID -  is Anthony
VBZ - 3rd person singular present verb -  is
Subject:  Anthony
is-3 - INVALID -  is
NP - INVALID -  Anthony
NNP - Proper singular noun -  Anthony
Anthony-4 - INVALID -  Anthony

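The abbreviations in WordParts.java come from this post ( Java Stanford NLP: Part of Speech labels? ), and the class file can be found here: ( https://github.com/AJ4real/References/blob/master/WordParts.java ). I know the labels are not all Parts of Speech, because some values return INVALID. So how can I find the full terms for the abbreviations that come from t.label().toString()?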

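The rest, such as S, NP and VP, are Penn Treebank phrase categories: they label clauses and phrases rather than individual words. Labels like My-1 are the leaf tokens themselves (the word plus its position in the sentence), not tags. For example, see here: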

https://gist.github.com/nlothian/9240750
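
As a rough sketch of how the phrase-level labels could be resolved alongside the part-of-speech ones, the lookup below maps Penn Treebank phrase categories to their full names. The class name PhraseCategories and the method describe are hypothetical (they are not part of CoreNLP or of WordParts.java), the descriptions are paraphrased from the Penn Treebank bracketing conventions, and the "INVALID" fallback only mirrors the output shown in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP: maps phrase-level labels to full names.
class PhraseCategories {
    private static final Map<String, String> NAMES = new HashMap<>();

    static {
        // ROOT is the artificial top node added by the parser, not a Penn Treebank tag.
        NAMES.put("ROOT", "Root of the parse tree");
        // Clause-level categories
        NAMES.put("S", "Simple declarative clause");
        NAMES.put("SBAR", "Subordinate clause");
        NAMES.put("SBARQ", "Direct question introduced by a wh-word or wh-phrase");
        NAMES.put("SINV", "Inverted declarative sentence");
        NAMES.put("SQ", "Inverted yes/no question, or main clause of a wh-question");
        // Phrase-level categories
        NAMES.put("ADJP", "Adjective phrase");
        NAMES.put("ADVP", "Adverb phrase");
        NAMES.put("CONJP", "Conjunction phrase");
        NAMES.put("FRAG", "Fragment");
        NAMES.put("INTJ", "Interjection");
        NAMES.put("LST", "List marker");
        NAMES.put("NAC", "Not a constituent");
        NAMES.put("NP", "Noun phrase");
        NAMES.put("NX", "Head marker inside a complex noun phrase");
        NAMES.put("PP", "Prepositional phrase");
        NAMES.put("PRN", "Parenthetical");
        NAMES.put("PRT", "Particle");
        NAMES.put("QP", "Quantifier phrase");
        NAMES.put("RRC", "Reduced relative clause");
        NAMES.put("UCP", "Unlike coordinated phrase");
        NAMES.put("VP", "Verb phrase");
        NAMES.put("WHADJP", "Wh-adjective phrase");
        NAMES.put("WHADVP", "Wh-adverb phrase");
        NAMES.put("WHNP", "Wh-noun phrase");
        NAMES.put("WHPP", "Wh-prepositional phrase");
        NAMES.put("X", "Unknown, uncertain, or unbracketable");
    }

    // Returns the full name for a phrase label, or "INVALID" when the label is unknown.
    static String describe(String label) {
        return NAMES.getOrDefault(label, "INVALID");
    }

    public static void main(String[] args) {
        // Labels taken from the output in the question; "My-1" is a leaf token, so it stays INVALID.
        for (String label : new String[] {"ROOT", "S", "NP", "VP", "My-1"}) {
            System.out.println(label + " - " + describe(label));
        }
    }
}
```

In the loop from the question, one possible arrangement is to skip nodes where t.isLeaf() is true, call WordParts.getValue(...) when t.isPreTerminal() is true, and fall back to a lookup like this one for the remaining phrasal nodes, passing t.label().value() as the key.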
