
Add new words to the lexicon for the R sentiment package

Original post: 2012-11-18 08:24:27 | Tags: r, sentiment-analysis

I'm currently doing sentiment and emotion analysis of Twitter data using the R sentiment package, and I need to add new words to the subjectivity and emotion lexicons used by the package, because some words carry a specific sentiment or emotion in the topic I am analyzing.

Does anyone know how to add words to the lexicon, using either the R sentiment package itself or any other R command? I have searched the documentation but cannot find any way to do so.

1 answer

Both the subjectivity and emotion lexicons, when read in (say, as CSV), give you a data frame. Entries can be added to a data frame using the rbind() function.

> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> patientID <- c(10, 20, 30, 40)
> age <- c(50, 68, 56, 104)
> diabetes <- c("Type4", "Type5", "Type6", "Type7")
> status <- c("Poorish", "Improving", "Excellento", "Poorish")
> patientdata1 <- data.frame(patientID, age, diabetes, status)
> patientdata1
  patientID age diabetes     status
1        10  50    Type4    Poorish
2        20  68    Type5  Improving
3        30  56    Type6 Excellento
4        40 104    Type7    Poorish
> concatPD <- rbind(patientdata, patientdata1)
> concatPD
  patientID age diabetes     status
1         1  25    Type1       Poor
2         2  34    Type2   Improved
3         3  28    Type1  Excellent
4         4  52    Type1       Poor
5        10  50    Type4    Poorish
6        20  68    Type5  Improving
7        30  56    Type6 Excellento
8        40 104    Type7    Poorish

I simply added the weird types of diabetes so that the two data frames can be told apart. :-) In other words, you can create your own CSV, read it (thereby creating another data frame), and rbind the two. Ensure that the columns of both data frames are in sync, that is, that they have the same column names and compatible types.
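
Applied to a lexicon, the same idea might look like the sketch below. It is only an illustration under stated assumptions: the lexicon is taken to be a headerless two-column CSV (word, emotion), a small temporary file stands in for the file shipped with the package, and the column names and the added words are made up for the example. Check the actual lexicon file before relying on this layout.

```r
# Stand-in for the package's emotion lexicon: a headerless two-column CSV
# (word, emotion). The real file's layout is an assumption in this sketch.
lexicon_path <- tempfile(fileext = ".csv")
writeLines(c("happy,joy", "afraid,fear", "furious,anger"), lexicon_path)

# Hypothetical column names, chosen for this example only.
cols <- c("word", "emotion")

lexicon <- read.csv(lexicon_path, header = FALSE, col.names = cols,
                    stringsAsFactors = FALSE)

# Topic-specific words to add (made-up examples).
custom <- data.frame(word = c("outage", "refund"),
                     emotion = c("anger", "joy"),
                     stringsAsFactors = FALSE)

# rbind() matches data frame columns by name, so check they are in sync first.
stopifnot(identical(names(lexicon), names(custom)))

extended <- rbind(lexicon, custom)

# If a word appears more than once, keep only its first occurrence.
extended <- extended[!duplicated(extended$word), ]

# Write the extended lexicon back out in the same headerless layout.
extended_path <- tempfile(fileext = ".csv")
write.table(extended, extended_path, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)
```

Setting stringsAsFactors = FALSE keeps the words as plain character strings in both data frames, which makes the duplicate check straightforward. How the extended file is then handed back to the sentiment package is not covered here, since the question notes that the documentation describes no mechanism for it.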