I have a data frame where each column corresponds to a patient ID and each row to a gene; each cell holds that gene's value for that patient.
df <- data.frame(Hugo_Symbol=c("CDKN2A", "JUN", "IRS2","MTOR",
"NRAS"),
A183=c(-0.19,NA,2.01,0.4,1.23),
A185=c(0.11,2.45,NA,NA,1.67),
A186=c(1.19,NA,2.41,0.78,1.93),
A187=c(2.78,NA,NA,0.7,2.23),
A188=c(NA,NA,NA,2.4,1.23))
head(df)
Hugo_Symbol A183 A185 A186 A187 A188
1 CDKN2A -0.19 0.11 1.19 2.78 NA
2 JUN NA 2.45 NA NA NA
3 IRS2 2.01 NA 2.41 NA NA
4 MTOR 0.40 NA 0.78 0.70 2.40
5 NRAS 1.23 1.67 1.93 2.23 1.23
I would like to assign the following categories for each value:
I tried to use the cut function to do that. My code looks like this:
df2<- df[cut(df,
breaks=c(-Inf,-2,2,Inf),
labels=c("1","2","3"))]
However, I received the following error:
Error in cut.default(df, breaks = c(-Inf, -2, 2, Inf), labels = c("1", : 'x' must be numeric
I believe it's because I have NA values in my table. I don't know how to assign the category "0" to NA values. The desired output should look like this:
Hugo_Symbol A183 A185 A186 A187 A188
1 CDKN2A 2 2 2 1 0
2 JUN 0 1 0 0 0
3 IRS2 1 0 1 0 0
4 MTOR 2 0 2 2 1
5 NRAS 2 2 2 1 2
How can I fix this error and replace each value with the predefined categories mentioned above?
Thank you for your help!
Olha
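The cut call itself is fine, but cut works on a single numeric vector, and a data frame is not one, hence the 'x' must be numeric error. The NA values are not the cause: cut simply returns NA for them. Apply cut to each numeric column instead. You can do it via lapply in base R: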
df[-1] <- lapply(df[-1], cut, c(-Inf,-2,2,Inf), c("1","2","3"))
df
# Hugo_Symbol A183 A185 A186 A187 A188
#1 CDKN2A 2 2 2 3 <NA>
#2 JUN <NA> 3 <NA> <NA> <NA>
#3 IRS2 3 <NA> 3 <NA> <NA>
#4 MTOR 2 <NA> 2 2 3
#5 NRAS 2 2 2 3 2
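The output above keeps missing values as <NA> and labels values above 2 as 3, whereas the desired output in the question shows 0 for NA and 1 for values above 2. Here is a minimal sketch that reproduces the desired table. It assumes that values at or below -2 should get category 3 (no such value appears in the example data, so this is a guess), and categorise is a hypothetical helper name, not a base R function.

```r
df <- data.frame(Hugo_Symbol = c("CDKN2A", "JUN", "IRS2", "MTOR", "NRAS"),
                 A183 = c(-0.19, NA, 2.01, 0.4, 1.23),
                 A185 = c(0.11, 2.45, NA, NA, 1.67),
                 A186 = c(1.19, NA, 2.41, 0.78, 1.93),
                 A187 = c(2.78, NA, NA, 0.7, 2.23),
                 A188 = c(NA, NA, NA, 2.4, 1.23))

# Hypothetical helper: bin one numeric vector and turn NA into "0".
categorise <- function(x) {
  # (-Inf, -2] -> "3" (assumed), (-2, 2] -> "2", (2, Inf] -> "1"
  out <- as.character(cut(x,
                          breaks = c(-Inf, -2, 2, Inf),
                          labels = c("3", "2", "1")))
  out[is.na(out)] <- "0"  # cut() returns NA for missing input
  out
}

df2 <- df                              # keep the original values intact
df2[-1] <- lapply(df[-1], categorise)  # skip the Hugo_Symbol column
df2
```

The result columns are character vectors. Converting to character before filling in "0" avoids having to add a new level to the factor that cut returns.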
Or use across in dplyr:
library(dplyr)
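# Passing extra arguments through across() is deprecated since dplyr 1.1.0, so wrap cut in a lambda
df %>% mutate(across(starts_with('A'), ~ cut(.x, c(-Inf,-2,2,Inf), c("1","2","3"))))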
We can also use findInterval in base R, which returns integer codes rather than a factor:
df[-1] <- lapply(df[-1], findInterval, c(-Inf, -2, 2, Inf))
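One thing to watch for: by default findInterval treats the intervals as closed on the left, while cut closes them on the right, so values that sit exactly on a break (-2 or 2) land in different bins. A small sketch of the difference, using a made-up vector x that is not part of the question's data:

```r
breaks <- c(-Inf, -2, 2, Inf)
x <- c(-2, 2, NA, 0.5)   # illustrative values, two of them exactly on a break

# cut(): right-closed intervals (-Inf, -2], (-2, 2], (2, Inf]
cut_codes <- as.integer(cut(x, breaks))

# findInterval(): left-closed intervals [-Inf, -2), [-2, 2), [2, Inf)
fi_codes <- findInterval(x, breaks)

# left.open = TRUE makes findInterval() agree with cut() on the boundaries
fi_open <- findInterval(x, breaks, left.open = TRUE)

# Both functions return NA for missing input; recode it to 0 if needed
fi_zero <- fi_open
fi_zero[is.na(fi_zero)] <- 0L

cut_codes  # 1 2 NA 2
fi_codes   # 2 3 NA 2
fi_open    # 1 2 NA 2
fi_zero    # 1 2 0 2
```

For the data in the question no value falls exactly on -2 or 2, so both approaches give the same bins there.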