I have a column (named A) in a data frame that contains natural numbers as well as vectors of natural numbers. For the cells in which there is a vector of natural numbers, I want to calculate the mean of that vector. The end result I then want to store in a new column, named B.
So far, I have tried the following:
Val <- unlist(lapply(str_split(data$A, ","),
function(x) mean(as.numeric(x), na.rm=TRUE)))
Val[length(Val)] <- mean(Val[-length(Val)], na.rm=TRUE)
data$B <- Val
However, this doesn't seem to work correctly. The code above does not give me the mean of each vector, and it returns NaN when the vector has only 2 elements in it. Below is an example of what it looks like.
Using eval/parse:
# example data
df1 <- read.table(text = "
A
1
2
3
2
3
c(1,2,4)
3
3
c(2,3)", header = TRUE, stringsAsFactors = FALSE)
df1$B <- sapply(df1$A, function(i) mean(eval(parse(text = i))))
df1
# A B
# 1 1 1.000000
# 2 2 2.000000
# 3 3 3.000000
# 4 2 2.000000
# 5 3 3.000000
# 6 c(1,2,4) 2.333333
# 7 3 3.000000
# 8 3 3.000000
# 9 c(2,3) 2.500000
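Since eval(parse()) runs whatever text is in the cell as R code, a variant that only extracts the digits may be preferable when the column comes from an untrusted source. The following is a minimal sketch, not part of the answer above: it assumes every cell holds only natural numbers (as stated in the question), and the helper name mean_of_numbers is made up for illustration.

```r
# Hypothetical helper: pull out every run of digits in a string and average them.
# Assumes the cell contains only natural numbers, e.g. "3" or "c(1,2,4)".
mean_of_numbers <- function(s) {
  nums <- regmatches(s, gregexpr("[0-9]+", s))[[1]]
  mean(as.numeric(nums))
}

A <- c("1", "c(1,2,4)", "c(2,3)")
B <- vapply(A, mean_of_numbers, numeric(1), USE.NAMES = FALSE)
B
```

With the example data this could be applied as df1$B <- vapply(df1$A, mean_of_numbers, numeric(1), USE.NAMES = FALSE).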
If column A is stored as text, another way is to remove the extra characters from the column using gsub, split on the comma, and then take the mean. Using @zx8754's data:
sapply(strsplit(gsub('[c()]', '', df1$A), ","), function(x) mean(as.numeric(x)))
#[1] 1.000 2.000 3.000 2.000 3.000 2.333 3.000 3.000 2.500
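This may also explain the NaN in the question's attempt. The sketch below assumes the cells are stored as text such as "c(2,3)", as in the example data; it uses base strsplit rather than str_split so that it runs without extra packages.

```r
x <- "c(2,3)"

# Splitting without stripping leaves "c(" and ")" attached to the pieces
parts <- strsplit(x, ",")[[1]]                # "c(2" "3)"
nums <- suppressWarnings(as.numeric(parts))   # both pieces fail to parse: NA NA
bad <- mean(nums, na.rm = TRUE)               # mean of nothing: NaN

# With three elements only the middle piece parses, so the "mean" is just that value
parts3 <- strsplit("c(1,2,4)", ",")[[1]]      # "c(1" "2" "4)"
bad3 <- mean(suppressWarnings(as.numeric(parts3)), na.rm = TRUE)   # 2

# Stripping the extra characters first gives the intended result
clean <- gsub("[c()]", "", x)                 # "2,3"
good <- mean(as.numeric(strsplit(clean, ",")[[1]]))   # 2.5
```

Note also that the line Val[length(Val)] <- mean(Val[-length(Val)], na.rm=TRUE) in the question overwrites the last row's value with the mean of all the other rows, which is probably not intended.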
To paraphrase your question, you have a column containing comma-separated numbers and you want to turn this into a column containing the means of those numbers?
# data frame containing character vector of numbers
df = data.frame(A=c("1", "3", "3,4,5", "1, 6"), stringsAsFactors = FALSE)
# convert to list of character vectors
df$B = strsplit(df$A, ",")
# convert to numeric and calculate mean
df$mean = sapply(df$B, function(x) mean(as.numeric(x)))
The key to understanding this is that df$B in my example is a list inside a data frame.
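To make the list column concrete, here is a small sketch that rebuilds the same data frame and inspects df$B. It uses vapply instead of sapply only so that the result type is explicit; that swap is my choice, not something the example above requires.

```r
df <- data.frame(A = c("1", "3", "3,4,5", "1, 6"), stringsAsFactors = FALSE)

# strsplit returns a list with one character vector per row
df$B <- strsplit(df$A, ",")

col_class <- class(df$B)     # "list"
col_lengths <- lengths(df$B) # number of pieces in each row
third <- df$B[[3]]           # the character vector stored in row 3

# as.numeric ignores the leading space in " 6"
df$mean <- vapply(df$B, function(x) mean(as.numeric(x)), numeric(1))
df$mean
```

Each element of df$B can have a different length, which is why it has to be a list rather than an ordinary atomic column.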
This kind of example also works well with the tidyverse packages:
library(tidyverse)
df = tibble(A=c("1", "3", "3,4,5", "1, 6"))
df %>%
  mutate(B = str_split(A, ",")) %>%
  mutate(mean = map_dbl(B, function(x) mean(as.numeric(x))))