How can I replace zeros with half the minimum value within a column?
I am trying to replace the 0s in my data frame, which has thousands of rows and columns, with half the minimum value greater than zero from that column. I also do not want to include the first four columns, as they are indexes.
So if I start with something like this:
index <- c("100p", "200p", "300p", "400p")
ratio <- c(5, 4, 3, 2)
gene <- c("gapdh", NA, NA, "actb")
species <- c("mouse", NA, NA, "rat")
a1 <- c(0, 3, 5, 2)
b1 <- c(0, 0, 4, 6)
c1 <- c(1, 2, 3, 4)
q <- as.data.frame(cbind(index, ratio, gene, species, a1, b1, c1))
index ratio gene species a1 b1 c1
100p 5 gapdh mouse 0 0 1
200p 4 NA NA 3 0 2
300p 3 NA NA 5 4 3
400p 2 actb rat 2 6 4
I hope to get a result like this:
index ratio gene species a1 b1 c1
100p 5 gapdh mouse 1 2 1
200p 4 NA NA 3 2 2
300p 3 NA NA 5 4 3
400p 2 actb rat 2 6 4
I have tried the following code: apply(q[-4], 2, function(x) "[<-"(x, x==0, min(x[x > 0]) / 2))
but I keep getting the error: Error in min(x[x > 0])/2 : non-numeric argument to binary operator
Any help on this? Thank you very much!
We can use lapply and replace to set the 0 values in each column to half of that column's minimum positive value.
cols<- 5:7
q[cols] <- lapply(q[cols], function(x) replace(x, x == 0, min(x[x>0], na.rm = TRUE)/2))
q
# index ratio gene species a1 b1 c1
#1 100p 5 gapdh mouse 1 2 1
#2 200p 4 <NA> <NA> 3 2 2
#3 300p 3 <NA> <NA> 5 4 3
#4 400p 2 actb rat 2 6 4
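The lapply approach above can be wrapped in a small helper that also guards against columns with no positive values, where min() on an empty vector would return Inf with a warning. This is only a sketch: the name replace_zeros_half_min is hypothetical, and the shortened data frame is a stand-in for the real data.

```r
# Hypothetical helper (not from the original answer): replace zeros in a
# numeric vector with half of its smallest strictly positive value.
replace_zeros_half_min <- function(x) {
  positive <- x[!is.na(x) & x > 0]
  if (length(positive) == 0) {
    return(x)  # nothing positive to derive a replacement from
  }
  replace(x, !is.na(x) & x == 0, min(positive) / 2)
}

# Shortened stand-in for the data frame in the question
q <- data.frame(
  index = c("100p", "200p", "300p", "400p"),
  ratio = c(5, 4, 3, 2),
  a1 = c(0, 3, 5, 2),
  b1 = c(0, 0, 4, 6),
  c1 = c(1, 2, 3, 4)
)

# Select numeric columns by type, then exclude the index-like column by name
num_cols <- setdiff(names(q)[sapply(q, is.numeric)], "ratio")
q[num_cols] <- lapply(q[num_cols], replace_zeros_half_min)
q
```

Selecting columns by name or type rather than by position may be more robust when the real data frame has thousands of columns.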
In dplyr, we can use mutate_at:
library(dplyr)
q %>% mutate_at(cols,~replace(., . == 0, min(.[.>0], na.rm = TRUE)/2))
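Since dplyr 1.0.0, mutate_at() is superseded by across(). A possible equivalent is sketched below; it assumes dplyr >= 1.0.0 is installed and uses a shortened stand-in for the data frame, with the result stored in a hypothetical variable res.

```r
library(dplyr)

# Shortened stand-in for the data frame in the question
q <- data.frame(
  index = c("100p", "200p", "300p", "400p"),
  ratio = c(5, 4, 3, 2),
  a1 = c(0, 3, 5, 2),
  b1 = c(0, 0, 4, 6),
  c1 = c(1, 2, 3, 4)
)

# Apply the same replacement to columns a1 through c1 only
res <- q %>%
  mutate(across(
    a1:c1,
    ~ replace(.x, .x == 0, min(.x[.x > 0], na.rm = TRUE) / 2)
  ))
res
```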
Data
q <- structure(list(index = structure(1:4, .Label = c("100p", "200p",
"300p", "400p"), class = "factor"), ratio = c(5, 4, 3, 2), gene = structure(c(2L,
NA, NA, 1L), .Label = c("actb", "gapdh"), class = "factor"),
species = structure(c(1L, NA, NA, 2L), .Label = c("mouse",
"rat"), class = "factor"), a1 = c(0, 3, 5, 2), b1 = c(0,
0, 4, 6), c1 = c(1, 2, 3, 4)), class = "data.frame", row.names = c(NA, -4L))
A slightly different (and potentially faster for large datasets) dplyr option with a bit of maths could be:
q %>%
mutate_at(vars(5:length(.)), ~ (. == 0) * min(.[. != 0])/2 + .)
index ratio gene species a1 b1 c1
1 100p 5 gapdh mouse 1 2 1
2 200p 4 <NA> <NA> 3 2 2
3 300p 3 <NA> <NA> 5 4 3
4 400p 2 actb rat 2 6 4
And the same with base R:
q[, 5:length(q)] <- lapply(q[, 5:length(q)], function(x) (x == 0) * min(x[x != 0])/2 + x)
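To see why the arithmetic version works, it may help to step through one column. The logical mask x == 0 is coerced to 1 or 0 when multiplied, so half the minimum is added only where the value was zero. The variable names below are illustrative. Note that this version uses x != 0 rather than x > 0, so it only matches the other answers when the column has no negative values, and it needs na.rm = TRUE if the column contains NA.

```r
x <- c(0, 0, 4, 6)

mask <- x == 0                   # TRUE where a zero needs replacing
half_min <- min(x[x != 0]) / 2   # half of the smallest non-zero value
filled <- mask * half_min + x    # adds half_min only at the zero positions
filled

# With missing values, min() returns NA unless na.rm = TRUE is set
y <- c(0, NA, 4)
y_filled <- (y == 0) * min(y[y != 0], na.rm = TRUE) / 2 + y
y_filled
```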
For reference, considering your original code, I believe your function was not the issue. Instead, the error comes from applying the function to non-numeric data.
# original data
index <- c("100p", "200p", "300p" , "400p")
ratio <- c(5, 4, 3, 2)
gene <- c("gapdh", NA, NA,"actb")
species <- c("mouse", NA, NA, "rat")
a1 <- c(0,3,5,2)
b1 <- c(0, 0, 4, 6)
c1 <- c(1, 2, 3, 4)
# data frame
q <- as.data.frame(cbind(index, ratio, gene, species, a1, b1, c1))
# examine structure (all cols are factors, or character in R >= 4.0)
str(q)
# convert factors to numeric
fac_to_num <- function(x){
x <- as.numeric(as.character(x))
x
}
# apply to cols 5 thru 7 only
q[, 5:7] <- apply(q[, 5:7],2,fac_to_num)
# examine structure
str(q)
# use original function only on numeric data
apply(q[, 5:7], 2, function(x) "[<-"(x, x==0, min(x[x > 0]) / 2))
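The conversion step can be avoided altogether by never going through cbind(), which builds a character matrix as soon as one input is character. A sketch of that alternative follows, using data.frame() so each column keeps its own type; it also shows that q[-4] drops only the fourth column, whereas q[-(1:4)] drops the first four. The variable names m and num_cols are illustrative.

```r
index <- c("100p", "200p", "300p", "400p")
ratio <- c(5, 4, 3, 2)
gene <- c("gapdh", NA, NA, "actb")
species <- c("mouse", NA, NA, "rat")
a1 <- c(0, 3, 5, 2)
b1 <- c(0, 0, 4, 6)
c1 <- c(1, 2, 3, 4)

# cbind() on mixed inputs yields a character matrix, so numbers become strings
m <- cbind(index, ratio, gene, species, a1, b1, c1)
typeof(m)  # "character"

# data.frame() keeps the numeric columns numeric
q <- data.frame(index, ratio, gene, species, a1, b1, c1)
sapply(q, class)

names(q[-4])      # removes only "species"
names(q[-(1:4)])  # removes the four index columns

# Apply the replacement to everything after the first four columns
num_cols <- names(q)[-(1:4)]
q[num_cols] <- lapply(q[num_cols], function(x) replace(x, x == 0, min(x[x > 0]) / 2))
q
```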