
Calculating the mean of a vector that is present in a data frame cell

I have a column (named A) in a data frame that contains natural numbers as well as vectors of natural numbers. For the cells that contain a vector, I want to calculate the mean of that vector and store the result in a new column named B.

So far, I have tried the following:

Val <- unlist(lapply(str_split(data$A, ","),
                     function(x) mean(as.numeric(x), na.rm=TRUE)))
Val[length(Val)] <- mean(Val[-length(Val)], na.rm=TRUE)
data$B <- Val

However, this doesn't seem to work correctly. The code above does not give me the mean of the vector, and it returns NaN when the vector has only 2 elements. Below is an example of what the result looks like:

[image not preserved]

Using eval/parse:

# example data
df1 <- read.table(text = "
A
1
2
3
2
3
c(1,2,4)
3
3
c(2,3)", header = TRUE, stringsAsFactors = FALSE)


df1$B <- sapply(df1$A, function(i) mean(eval(parse(text = i))))

df1
#          A        B
# 1        1 1.000000
# 2        2 2.000000
# 3        3 3.000000
# 4        2 2.000000
# 5        3 3.000000
# 6 c(1,2,4) 2.333333
# 7        3 3.000000
# 8        3 3.000000
# 9   c(2,3) 2.500000
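
One caveat with eval/parse is that it runs whatever R code the cell contains, so it is only reasonable when the column is trusted. A minimal sketch of a guarded variant follows, assuming every valid cell is either a single natural number or a c(...) call of natural numbers; the helper names is_number_or_vector and safe_cell_mean are hypothetical and not part of the original answer.

```r
# Hypothetical helper: TRUE only for "12" or "c(1,2,4)" style cells
is_number_or_vector <- function(s) {
  grepl("^([0-9]+|c\\([0-9]+(, *[0-9]+)*\\))$", s)
}

# Hypothetical helper: evaluate a cell only if it passed the check above
safe_cell_mean <- function(s) {
  if (!is_number_or_vector(s)) return(NA_real_)
  mean(eval(parse(text = s)))
}

A <- c("1", "c(1,2,4)", "c(2,3)", "Sys.time()")
B <- vapply(A, safe_cell_mean, numeric(1), USE.NAMES = FALSE)
B  # the last cell is not evaluated and yields NA
```

Cells that fail the check come back as NA instead of being executed, which also makes malformed rows easy to find afterwards.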

If column A is stored as text, another way is to remove the extra characters (c, "(" and ")") with gsub, split on the comma, and then take the mean. Using @zx8754's data:

sapply(strsplit(gsub('[c()]', '', df1$A), ","), function(x) mean(as.numeric(x)))
#[1] 1.000 2.000 3.000 2.000 3.000 2.333 3.000 3.000 2.500
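
The gsub step matters because splitting the raw text on commas leaves "c(" and ")" attached to the first and last pieces, which then fail to convert to numbers. The sketch below illustrates what likely happens in the question's attempt; it uses base strsplit instead of str_split to avoid a package dependency, and the helper name raw_mean is made up for the illustration.

```r
# Hypothetical helper: split on commas WITHOUT stripping "c(" and ")" first
raw_mean <- function(s) {
  pieces <- strsplit(s, ",")[[1]]
  nums <- suppressWarnings(as.numeric(pieces))  # unparseable pieces become NA
  mean(nums, na.rm = TRUE)
}

strsplit("c(1,2,4)", ",")[[1]]  # "c(1" "2" "4)": only the middle piece is a number
raw_mean("c(1,2,4)")            # 2, the mean of the single surviving value
raw_mean("c(2,3)")              # NaN, because both pieces become NA
```

With two elements there is no untouched middle piece, so every value is dropped by na.rm = TRUE and the mean of an empty vector is NaN.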

To paraphrase your question, you have a column containing comma-separated numbers and you want to turn this into a column containing the means of those numbers?

# data frame containing character vector of numbers
df = data.frame(A=c("1", "3", "3,4,5", "1, 6"), stringsAsFactors = FALSE)

# convert to list of character vectors
df$B = strsplit(df$A, ",")

# convert to numeric and calculate mean
df$mean = sapply(df$B, function(x) mean(as.numeric(x)))

The key to understanding this is that df$B in my example is a list inside a data frame.
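
A short sketch for inspecting that list column, rebuilding the same example data so it runs on its own. It uses vapply rather than sapply, which is a swapped-in choice to guarantee a numeric result and not what the answer itself uses.

```r
df <- data.frame(A = c("1", "3", "3,4,5", "1, 6"), stringsAsFactors = FALSE)
df$B <- strsplit(df$A, ",")

class(df$B)    # "list": each cell holds its own character vector
lengths(df$B)  # 1 1 3 2: number of elements per cell
df$B[[3]]      # "3" "4" "5"

# as.numeric ignores the leading space in " 6"
df$mean <- vapply(df$B, function(x) mean(as.numeric(x)), numeric(1))
df$mean        # 1.0 3.0 4.0 3.5
```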

This kind of example also works well with the tidyverse packages:

library(tidyverse)
df = tibble(A=c("1", "3", "3,4,5", "1, 6"))

df %>%
    mutate(B = str_split(A, ",")) %>%
    mutate(mean = map_dbl(B, function(x) mean(as.numeric(x))))
