
How can I write R code to create a new column containing the most frequent item of a list column for every row of a data frame?

My current code is shown below. I am trying to return the most common element of a list (column 12 in df) in a new column (df$newCol) for every row of the data frame. Column 12 of df, titled df$status_combined, is of type list and has values that look like this: c("high", " medium", " medium")

for (index in 1:nrow(df)) {
  row = df[index, ]
  df$newCol <- names(sort(list.table(as.vector(df[row,12])), decreasing = TRUE))[1]
}

Error in xj[i] : invalid subscript type 'list'

I'm assuming from your description that df$status_combined is a list like "L" created below:

set.seed(1)
L <- replicate(5, sample(c("high", "medium", "low"), 10, TRUE), FALSE)

You're not far off in terms of your approach. I'd suggest a simple function like the following:

f <- function(x) names(sort(table(x), decreasing = TRUE))[1]

You can then get your result by simply doing:

sapply(L, f)
# [1] "medium" "low"    "high"   "low"    "low"  
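
To write the result back into the data frame, the same idea can be applied to the list column directly. The error in the question appears to come from df[row, 12]: row is itself a one-row data frame, so a list ends up being used as the row index. The sketch below avoids row indexing altogether. The data frame here is a made-up stand-in for the real one, and the trimws() call is an assumption based on the leading spaces in values such as " medium".

```r
# Hypothetical data frame mirroring the question's list column
df <- data.frame(id = 1:3)
df$status_combined <- list(
  c("high", " medium", " medium"),
  c("low", " low", " high"),
  c("high")
)

# Most frequent value, after trimming stray whitespace such as " medium"
f <- function(x) names(sort(table(trimws(x)), decreasing = TRUE))[1]

# vapply() checks that each row yields exactly one character value
df$newCol <- vapply(df$status_combined, f, character(1))
df$newCol
# [1] "medium" "low"    "high"
```

Using the column name rather than position 12 keeps the code working if the column order changes.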

If you want the tabulation of all values, you can try something like:

table(rev(stack(setNames(L, seq_along(L)))))
#    values
# ind high low medium
#   1    3   3      4
#   2    2   5      3
#   3    4   2      4
#   4    2   4      4 ## <~~ You'll have to think about ties
#   5    1   5      4
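
One possible way to deal with ties, rather than silently keeping whichever name sorts first, is to return every value that reaches the maximum count. This is only a sketch: the helper name modes and the "/" separator are arbitrary choices, not part of the original answer.

```r
# Return all values tied for the highest count, joined into one string
modes <- function(x) {
  counts <- table(x)
  paste(names(counts)[counts == max(counts)], collapse = "/")
}

modes(c("high", "low", "low", "high", "medium"))
# [1] "high/low"
```

The same function can be passed to sapply(L, modes) in place of f if tied rows should be flagged rather than resolved.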
