How to separate values in a column and convert to numeric values?
I have a dataset in which the values have been collapsed, so each row has multiple entries in each column.
For example:
Gene Score1
Gene1 NA, NA, NA, 0.03, -0.3
Gene2 NA, 0.2, 0.1
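I'm trying to unpack this so that each row of the Score1 column holds the maximum absolute value, and to track, by creating a new column, whether that maximum absolute value was originally negative.
So the output for the example would be: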
Gene Score1 Negatives1
Gene1 0.3 1
Gene2 0.2 0
#Score1 is now the maximum absolute value and if it used to be negative is tracked
I'm using this code:
dat2 <- dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
group_by(Gene) %>%
#Create negative column to track max absolute values that were negative
summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
Score1 = max(abs(Score1), na.rm = TRUE))
However, for some reason, the code above gives me this error:
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
I thought that using convert = TRUE would make the values numeric, but the error suggests that I'm still getting non-numeric values after running separate_rows()?
Example input data:
structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3",
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
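If we look at the output of separate_rows, I think the problem becomes clear: your separated column is not numeric! I guess convert didn't pick it up, probably because splitting on "," leaves a leading space on most of the pieces. We can use as.numeric() to coerce it (and ignore any warnings; we want things like " NA" to become NA).
You also have a few problems in your summarise: it needs more na.rm = TRUE, there are mismatched parentheses, and so on.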
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
# Gene Score1
# <chr> <chr>
# 1 Gene1 NA
# 2 Gene1 " NA"
# 3 Gene1 " NA"
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2 NA
# 7 Gene2 " 0.2"
# 8 Gene2 " 0.1"
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
mutate(Score1 = as.numeric(Score1)) %>%
group_by(Gene) %>%
#Create negative column to track max absolute values that were negative
summarise(
Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
Score1 = max(abs(Score1), na.rm = TRUE)
)
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
# Gene Negatives1 Score1
# <chr> <int> <dbl>
# 1 Gene1 1 0.3
# 2 Gene2 0 0.2
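A variation on the same idea, as a rough sketch rather than the answer's own method: splitting on a comma plus optional whitespace (the regex ",\\s*" is my assumption, not part of the original code) avoids the leading spaces, and slice_max() keeps the row with the largest absolute value so its sign can be read directly. Note that a gene whose values are all NA would be dropped by the filter() step.

```r
library(dplyr)
library(tidyr)

# Input data from the question, rebuilt as a tibble
dat <- tibble(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

res <- dat %>%
  # split on "," followed by any whitespace, so no leading spaces remain
  separate_rows(Score1, sep = ",\\s*") %>%
  # convert explicitly instead of relying on convert = TRUE
  mutate(Score1 = as.numeric(Score1)) %>%
  filter(!is.na(Score1)) %>%
  group_by(Gene) %>%
  # keep the single row with the largest absolute value per gene
  slice_max(abs(Score1), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # flag the sign first, then overwrite Score1 with its absolute value
  transmute(Gene, Negatives1 = as.integer(Score1 < 0), Score1 = abs(Score1))

res
```

With with_ties = FALSE, a tie such as 0.3 and -0.3 is resolved by keeping only one of the tied rows, so the flag then depends on which row is kept.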
Here is a data.table approach:
library( matrixStats )
library( data.table )
#assumes the input data from the question is stored in dat
DT <- as.data.table( dat )
#split strings
l <- data.table::tstrsplit( DT$Score1, ", " )
#create value columns
DT[, paste0( "val_", 1:length( l ) ) := lapply( l, as.numeric ) ]
#find the max absolute value in the value columns, and flag whether it was negative
DT[, `:=`( Score1 = rowMaxs( abs( as.matrix(.SD) ), na.rm = TRUE ),
negatives = +( rowMins( as.matrix(.SD), na.rm = TRUE ) == -rowMaxs( abs( as.matrix(.SD) ), na.rm = TRUE ) ) ),
.SDcols = patterns("^val_")]
#get relevant columns
DT[, .(Gene, Score1, negatives) ]
# Gene Score1 negatives
# 1: Gene1 0.3 1
# 2: Gene2 0.2 0
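For comparison, here is a long-format data.table sketch that needs no helper columns and no matrixStats. It is only an illustration under my own assumptions: the names long, value and res are made up here, and the split pattern ",\\s*" is my choice. It takes the value with the largest absolute size per gene and flags whether that value was negative.

```r
library(data.table)

# Input data from the question, rebuilt as a data.table
DT <- data.table(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

# One row per value: split each string and convert the pieces to numeric
long <- DT[, .(value = as.numeric(unlist(strsplit(Score1, ",\\s*")))), by = Gene]

# Per gene: largest absolute value, and whether that value was negative
res <- long[!is.na(value),
            .(Score1    = max(abs(value)),
              negatives = as.integer(value[which.max(abs(value))] < 0)),
            by = Gene]

res
```

Genes with only NA values drop out of res because of the !is.na(value) filter; whether that is acceptable depends on the data.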
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
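So this tells you that you are passing a non-numeric argument to a mathematical function, in this case inside max(abs(Score1)).
A quick check with dat2[dat2$Gene == "Gene1",] gave me the answer: some of your data is stored as text as a result of the separation.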
Gene Score1
<chr> <chr>
1 Gene1 NA
2 Gene1 " NA"
3 Gene1 " NA"
4 Gene1 " 0.03"
5 Gene1 " -0.3"
Just convert it to numeric :)
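To make that concrete, here is a small base R sketch (the vector x simply mimics the Gene1 values shown above; the variable names are mine). It reproduces the error message on the character values and then shows that trimming the whitespace and converting with as.numeric() is enough for abs() and max() to work.

```r
# Character values as they come out of the split, leading spaces included
x <- c("NA", " NA", " 0.03", " -0.3")

# abs() refuses character input; capture the message instead of stopping
msg <- tryCatch(abs(x), error = function(e) conditionMessage(e))
msg

# Trim the whitespace and convert; the "NA" strings become real NA values
x_num <- as.numeric(trimws(x))

# Now the summary works once the NAs are removed
max(abs(x_num), na.rm = TRUE)
```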