
In R, how do I apply a function to each data frame row using a column value?

Let's say I have a dataframe

Author | Lyrics
Name1  | Text (characters)
Name2  | Text (characters)
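To experiment with this, a data frame in the same Author | Lyrics layout can be built as below. The author names follow the table; the lyric strings and the PositiveWords lexicon are invented placeholders (the question only says PositiveWords has a word column).

```r
# Hypothetical toy data in the Author | Lyrics layout; lyrics are invented.
TestPOSwordsDF <- data.frame(
  Author = c("Name1", "Name2"),
  Lyrics = c("happy days and sunny skies", "cold rain and grey clouds"),
  stringsAsFactors = FALSE  # keep Lyrics as character (the default since R 4.0)
)

# Hypothetical lexicon; only the `word` column name comes from the question.
PositiveWords <- data.frame(
  word = c("happy", "sunny", "love"),
  stringsAsFactors = FALSE
)

str(TestPOSwordsDF)
```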

I want to create another column by applying a function that, for each row, takes the text from the Lyrics column, splits it on whitespace, and then checks each token against another vector I made, so that I can work out the percentage of tokens in the text that appear in that vector.
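For a single string, the computation described above can be sketched with base R's strsplit() and %in%. The lyric and the positive-word vector here are hypothetical.

```r
# Hypothetical positive-word vector and one lyric string (illustration only).
positive <- c("happy", "sunny", "love")
lyric <- "happy days and sunny skies"

# strsplit() returns a list with one element per input string;
# [[1]] extracts the character vector of tokens for this one string.
tokens <- strsplit(lyric, " ")[[1]]

# %in% gives one TRUE/FALSE per token: is that token in `positive`?
hits <- tokens %in% positive

# Percentage of tokens that are positive words (2 of 5 here).
percent <- sum(hits) / length(tokens) * 100
print(percent)
```

Because %in% is already vectorised over the tokens, no explicit loop is needed for the counting step.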

Here is the function I have written so far:

ReturnPercentPosWord = function(textLyrics){
  WhitespaceSplitText = strsplit(textLyrics, " ")
  LengthSplitText = length(WhitespaceSplitText)
  CountInPosList = 0
  for (i in WhitespaceSplitText) {
    if (i %in% PositiveWords$word) {
      CountInPosList = CountInPosList+1
    }
  }
  if (CountInPosList == 0) {
    return(0)
  }
  PercentInPos = (CountInPosList/LengthSplitText)*100
  return(PercentInPos)
}
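The function treats the result of strsplit() as a vector of words, but strsplit() always returns a list with one element per input string. A quick check on a hypothetical lyric string shows what the loop actually iterates over:

```r
# Hypothetical single lyric string, for illustration only.
lyric <- "happy days and sunny skies"

split_result <- strsplit(lyric, " ")
print(class(split_result))   # a list, not a character vector
print(length(split_result))  # one element per input string, not per word

# Looping over the list runs once, and i is the whole token vector.
n_iterations <- 0
for (i in split_result) {
  n_iterations <- n_iterations + 1
  print(i)                   # all five tokens at once
}

# So a test such as i %in% PositiveWords$word yields one TRUE/FALSE per
# token, and if() receives a condition with more than one element.
print(length(i))
```

This also means LengthSplitText is the number of input strings (1), not the number of words.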

Now I want to apply this function to each row. I have tried

TestPOSwordsDF$PercentPositiveWords = ReturnPercentPosWord(TestPOSwordsDF$Lyrics)

and

TestPOSwordsDF$PercentPositiveWords = apply(TestPOSwordsDF[, c('Lyrics'),drop=F], 1, ReturnPercentPosWord)

but I get the warning "the condition has length > 1 and only the first element will be used".
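That message comes from if(), whose condition must be a single TRUE/FALSE; in the original function i is a whole vector of tokens because strsplit() returns a list (since R 4.2.0 this is an error rather than a warning). A minimal sketch of one fix, assuming the function should handle one lyric string per call: take [[1]] of the strsplit() result so the loop sees individual words, then call the function once per Lyrics value with vapply(). The data frame contents and the PositiveWords lexicon are hypothetical toy data.

```r
# Hypothetical toy data in the question's Author | Lyrics layout.
TestPOSwordsDF <- data.frame(
  Author = c("Name1", "Name2"),
  Lyrics = c("happy days and sunny skies", "cold rain and grey clouds"),
  stringsAsFactors = FALSE
)
# Hypothetical lexicon with the `word` column the question refers to.
PositiveWords <- data.frame(word = c("happy", "sunny", "love"),
                            stringsAsFactors = FALSE)

ReturnPercentPosWord <- function(textLyrics) {
  # [[1]] extracts the token vector for this single string.
  WhitespaceSplitText <- strsplit(textLyrics, " ")[[1]]
  LengthSplitText <- length(WhitespaceSplitText)

  CountInPosList <- 0
  for (i in WhitespaceSplitText) {
    # i is now one token, so the condition is a single TRUE/FALSE.
    if (i %in% PositiveWords$word) {
      CountInPosList <- CountInPosList + 1
    }
  }

  if (CountInPosList == 0) {
    return(0)
  }
  (CountInPosList / LengthSplitText) * 100
}

# Call the function once per Lyrics string; numeric(1) checks that each
# call returns exactly one number.
TestPOSwordsDF$PercentPositiveWords <- vapply(
  TestPOSwordsDF$Lyrics, ReturnPercentPosWord, numeric(1), USE.NAMES = FALSE
)
print(TestPOSwordsDF)
```

A data frame column is already a vector, so iterating over TestPOSwordsDF$Lyrics directly avoids apply(), which first converts the data frame to a matrix.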

I would really appreciate any help with this. Thank you!

Try using this:

TestPOSwordsDF$PercentPositiveWords <- sapply(
                   strsplit(TestPOSwordsDF$Lyrics, " "), function(x) 
                   mean(x %in% PositiveWords$word) * 100)

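This splits each Lyrics string on single spaces, giving a list with one token vector per row. sapply() then computes, for each vector x, mean(x %in% PositiveWords$word), which is the proportion of tokens found in PositiveWords$word, and multiplies it by 100 to get a percentage.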
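The one-liner can be checked on hypothetical toy data as below. One caveat: splitting on a single space turns consecutive spaces into empty-string tokens, which count as non-matches and lower the percentage; splitting on the regular expression "\\s+" avoids that. The second lyric deliberately contains a double space to show the difference.

```r
# Hypothetical toy data; row 2 contains a double space on purpose.
TestPOSwordsDF <- data.frame(
  Author = c("Name1", "Name2"),
  Lyrics = c("happy days and sunny skies", "love  love me do"),
  stringsAsFactors = FALSE
)
PositiveWords <- data.frame(word = c("happy", "sunny", "love"),
                            stringsAsFactors = FALSE)

# Split on a single space, as in the answer: the double space in row 2
# produces an empty token "" that counts as a non-match (2 of 5 tokens).
TestPOSwordsDF$PctSingleSpace <- sapply(
  strsplit(TestPOSwordsDF$Lyrics, " "),
  function(x) mean(x %in% PositiveWords$word) * 100
)

# Split on runs of whitespace instead, so no empty tokens appear (2 of 4).
TestPOSwordsDF$PctWhitespace <- sapply(
  strsplit(TestPOSwordsDF$Lyrics, "\\s+"),
  function(x) mean(x %in% PositiveWords$word) * 100
)

print(TestPOSwordsDF)
```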
