In R, how to apply a function on each dataframe row that uses a column value?
Let's say I have a dataframe:
Author | Lyrics
Name1 | Text (characters)
Name2 | Text (characters)
I want to create another column by applying a function that, for each row, takes the text from the Lyrics column, splits it on whitespace, and then checks each token to see whether it is in another vector I made (so I can work out the percentage of tokens in the text that are in that other vector).
The function I have written so far is below:
ReturnPercentPosWord = function(textLyrics){
WhitespaceSplitText = strsplit(textLyrics, " ")
LengthSplitText = length(WhitespaceSplitText)
CountInPosList = 0
for (i in WhitespaceSplitText) {
if (i %in% PositiveWords$word) {
CountInPosList = CountInPosList+1
}
}
if (CountInPosList == 0) {
return(0)
}
PercentInPos = (CountInPosList/LengthSplitText)*100
return(PercentInPos)}
I want to apply this function to each row now. I have tried
TestPOSwordsDF$PercentPositiveWords = ReturnPercentPosWord(TestPOSwordsDF$Lyrics)
and
TestPOSwordsDF$PercentPositiveWords = apply(TestPOSwordsDF[, c('Lyrics'),drop=F], 1, ReturnPercentPosWord)
But I get a warning saying:
the condition has length > 1 and only the first element will be used
I would really appreciate any help with this. Thank you!
Try using this:
TestPOSwordsDF$PercentPositiveWords <- sapply(
strsplit(TestPOSwordsDF$Lyrics, " "), function(x)
mean(x %in% PositiveWords$word) * 100)
Here we split Lyrics on spaces and, for each row, compute the percentage of words that are present in PositiveWords$word (the mean of a logical vector is the proportion of TRUE values).
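To see why this works where the original function did not: strsplit() returns a list with one character vector per input string, so in the original loop `i` is a whole vector of tokens (which makes the `if` condition longer than 1), and length() of that list counts strings rather than tokens. Below is a minimal, self-contained sketch of the sapply() approach. The contents of PositiveWords and TestPOSwordsDF here are made-up toy data, assumed only for illustration; they are not the asker's actual data.

```r
# Hypothetical toy data standing in for the asker's objects (assumed values).
PositiveWords <- data.frame(
  word = c("love", "happy", "sun"),
  stringsAsFactors = FALSE
)
TestPOSwordsDF <- data.frame(
  Author = c("Name1", "Name2"),
  Lyrics = c("love is the answer", "rain rain go away"),
  stringsAsFactors = FALSE
)

# strsplit() on a character vector returns a list:
# one character vector of tokens per row.
tokens <- strsplit(TestPOSwordsDF$Lyrics, " ")

# For each row, x %in% PositiveWords$word is a logical vector over the tokens;
# its mean is the share of tokens found in the list, scaled to a percentage.
TestPOSwordsDF$PercentPositiveWords <- sapply(
  tokens,
  function(x) mean(x %in% PositiveWords$word) * 100
)

print(TestPOSwordsDF)
```

With this toy data, the first row has one matching token out of four and the second row has none, so the new column holds 25 and 0.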