How to import a lexicon in XML-LMF format for sentiment analysis in R

Question

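I am trying to import the following lexicon into R, either to use it with text-mining packages such as quanteda or to export it as a list or a data frame: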

https://github.com/opener-project/VU-sentiment-lexicon/tree/master/VUSentimentLexicon/IT-lexicon

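The format is XML-LMF. I could not find any way to parse this format in R.

(See https://en.wikipedia.org/wiki/Lexical_Markup_Framework)

As a workaround, I tried the XML package, but the structure differs from ordinary XML and I did not manage to parse all the nodes.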

Answer 1

I managed to get it working with the xml2 package. Here is my code:

library(xml2)
library(quanteda)

# Read file and find the nodes
opeNER_xml <- read_xml("it-sentiment_lexicon.lmf.xml")
entries <- xml_find_all(opeNER_xml, ".//LexicalEntry")
lemmas <- xml_find_all(opeNER_xml, ".//Lemma")
confidence <- xml_find_all(opeNER_xml, ".//Confidence")
sentiment <- xml_find_all(opeNER_xml, ".//Sentiment")

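# Parse and put in a data frame (this relies on every LexicalEntry having
# exactly one Lemma, Confidence and Sentiment node, so the columns line up)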
opeNER_df <- data.frame(
  id = xml_attr(entries, "id"),
  lemma = xml_attr(lemmas, "writtenForm"),
  partOfSpeech = xml_attr(entries, "partOfSpeech"),
  confidenceScore = as.numeric(xml_attr(confidence, "score")),
  method = xml_attr(confidence, "method"),
  polarity = xml_attr(sentiment, "polarity"),
  stringsAsFactors = FALSE
)
# Fix a typo in the lexicon data: "nneutral" should be "neutral"
opeNER_df$polarity <- ifelse(opeNER_df$polarity == "nneutral", 
                             "neutral", opeNER_df$polarity)

# Make quanteda dictionary
opeNER_dict <- quanteda::dictionary(with(opeNER_df, split(lemma, polarity)))
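
The code above collects the Lemma, Confidence and Sentiment nodes in separate document-wide searches, so the columns only align if every entry contains exactly one of each. If that cannot be guaranteed, a per-entry lookup with xml_find_first() may be safer. The following is only a sketch under assumptions: the inline XML is an invented miniature lexicon that reuses the element and attribute names from the code above (the Sense wrapper and the three entries are hypothetical), and the names sample_xml, lexicon_df and lexicon_list are made up for illustration.

```r
library(xml2)

# Hypothetical miniature lexicon. Only the element and attribute names
# queried in the answer are reused; the entries themselves are invented.
# The third entry deliberately has no Sentiment node.
sample_xml <- '<LexicalResource>
  <Lexicon>
    <LexicalEntry id="id_1" partOfSpeech="adj">
      <Lemma writtenForm="bello"/>
      <Sense>
        <Confidence score="1.0" method="manual"/>
        <Sentiment polarity="positive"/>
      </Sense>
    </LexicalEntry>
    <LexicalEntry id="id_2" partOfSpeech="adj">
      <Lemma writtenForm="brutto"/>
      <Sense>
        <Confidence score="0.8" method="automatic"/>
        <Sentiment polarity="negative"/>
      </Sense>
    </LexicalEntry>
    <LexicalEntry id="id_3" partOfSpeech="noun">
      <Lemma writtenForm="tavolo"/>
      <Sense>
        <Confidence score="0.5" method="automatic"/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>'

doc <- read_xml(sample_xml)
entries <- xml_find_all(doc, ".//LexicalEntry")

# Applied to a nodeset, xml_find_first() returns one result per input node,
# so each vector below has exactly one element per LexicalEntry.
lemma_nodes <- xml_find_first(entries, ".//Lemma")
conf_nodes  <- xml_find_first(entries, ".//Confidence")
sent_nodes  <- xml_find_first(entries, ".//Sentiment")

# xml_attr() yields NA where the child node is missing.
lexicon_df <- data.frame(
  id = xml_attr(entries, "id"),
  lemma = xml_attr(lemma_nodes, "writtenForm"),
  partOfSpeech = xml_attr(entries, "partOfSpeech"),
  confidenceScore = as.numeric(xml_attr(conf_nodes, "score")),
  method = xml_attr(conf_nodes, "method"),
  polarity = xml_attr(sent_nodes, "polarity"),
  stringsAsFactors = FALSE
)

# Plain named list of lemmas per polarity; split() drops rows whose
# polarity is NA.
lexicon_list <- with(lexicon_df, split(lemma, polarity))
```

With this approach an entry that lacks a Sentiment node shows up as NA in the data frame instead of shifting the rows below it. The resulting list can be passed to quanteda::dictionary() in the same way as in the answer.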

Solution 1
0 Accepted 2019-01-25 13:32:36
