
How to separate values in a column and convert to numeric values?

I have a dataset whose values have been collapsed, so each row holds multiple entries in each column.

For example:

Gene   Score1                      
Gene1  NA, NA, NA, 0.03, -0.3 
Gene2  NA, 0.2, 0.1   

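I am trying to unpack this so that, for each row, Score1 holds the maximum absolute value, and to track in a new column whether that maximum absolute value was originally negative.

So the output for the example would be:

Gene   Score1    Negatives1
Gene1   0.3          1
Gene2   0.2          0
#Score1 is now the maximum absolute value and if it used to be negative is tracked

I am using this code: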

dat2 <- dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
            Score1 = max(abs(Score1), na.rm = TRUE))

However, for some reason the code above gives me this error:

Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.

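I thought that using convert = TRUE would make the values numeric, but the error suggests the code is getting non-numeric values after I run separate_rows()?

Example input data: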

structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3", 
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))

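If we look at the separate_rows output, I think the problem becomes clear: your separated column isn't numeric! Most of the pieces keep a leading space (" NA", " 0.03"), and I guess convert didn't pick that up. We can coerce with as.numeric() (and ignore any warnings; we want things like " NA" to become NA).

You also have a few problems in your summarise: it needs more na.rm = TRUE, the parentheses are mismatched, etc.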

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
#   Gene  Score1 
#   <chr> <chr>  
# 1 Gene1  NA    
# 2 Gene1 " NA"  
# 3 Gene1 " NA"  
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2  NA    
# 7 Gene2 " 0.2" 
# 8 Gene2 " 0.1" 

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>% 
  mutate(Score1 = as.numeric(Score1)) %>% 
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
#   Gene  Negatives1 Score1
#   <chr>      <int>  <dbl>
# 1 Gene1          1    0.3
# 2 Gene2          0    0.2
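
As a variation on the answer above, here is a minimal sketch that splits on the comma plus any following whitespace, so the pieces carry no leading space, and then picks the element with the largest absolute value directly with which.max(). The intermediate column name top is only illustrative, and the sketch assumes every Gene has at least one non-NA score.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, as a plain data frame
dat <- data.frame(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

res <- dat %>%
  # sep is a regular expression: a comma followed by optional whitespace
  separate_rows(Score1, sep = ",\\s*") %>%
  # explicit coercion, so the result does not depend on convert = TRUE
  mutate(Score1 = as.numeric(Score1)) %>%
  group_by(Gene) %>%
  # which.max() skips NA, so this keeps the signed value with the largest magnitude
  summarise(top = Score1[which.max(abs(Score1))], .groups = "drop") %>%
  transmute(Gene, Score1 = abs(top), Negatives1 = as.integer(top < 0))

res
```

Keeping the signed value first and deriving both output columns from it avoids comparing min() against -max(abs()), which is what tripped over the parentheses in the question. A Gene whose scores are all NA would need separate handling.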

Here is a data.table approach

library( matrixStats )
library( data.table )
#make sure the sample data is a data.table
DT <- as.data.table( dat )
#split strings
l <- data.table::tstrsplit( DT$Score1, ", " )
#create value columns
DT[, paste0( "val_", seq_along( l ) ) := lapply( l, as.numeric ) ]
#find the max absolute value and flag whether it came from a negative value
DT[, `:=`( Score1    = rowMaxs( abs( as.matrix(.SD) ), na.rm = TRUE ),
           negatives = +( -rowMins( as.matrix(.SD), na.rm = TRUE ) ==
                            rowMaxs( abs( as.matrix(.SD) ), na.rm = TRUE ) ) ), 
   .SDcols = patterns("^val_")]
#get relevant columns
DT[, .(Gene, Score1, negatives) ]
#     Gene Score1 negatives
# 1: Gene1    0.3         1
# 2: Gene2    0.2         0

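
For comparison, a long-format data.table sketch can get to the requested output (maximum absolute value plus a flag for whether it was negative) without creating wide val_ columns. The names long, value and top are illustrative only, and the sketch assumes every Gene has at least one non-NA score.

```r
library(data.table)

# Same sample data as in the question
DT <- data.table(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

# One row per individual score; trimws() drops the space left after each comma
long <- DT[, .(value = as.numeric(trimws(unlist(strsplit(Score1, ","))))), by = Gene]

# Per Gene: keep the signed value with the largest magnitude, then derive both columns
res <- long[, {
  top <- value[which.max(abs(value))]
  .(Score1 = abs(top), Negatives1 = as.integer(top < 0))
}, by = Gene]

res
```

The flag here says whether the selected value itself was negative, rather than counting how many negative values the row contains.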
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.

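So this tells you that a non-numeric argument is being passed to a mathematical function, here abs() inside max().

A quick check with dat2[dat2$Gene == "Gene1",] gave me the answer: because of the way the rows were separated, some of your data is stored as text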

  Gene  Score1
  <chr> <chr>  
1 Gene1  NA    
2 Gene1 " NA" 
3 Gene1 " NA" 
4 Gene1 " 0.03"
5 Gene1 " -0.3"

Just convert it to numeric :)
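
A small base R sketch of that point, using the Gene1 pieces shown above (the variable names are illustrative): abs() refuses character input, and once the strings are trimmed and coerced the calculation goes through.

```r
# The Gene1 pieces as they come out of the split: text, some with a leading space
x <- c("NA", " NA", " NA", " 0.03", " -0.3")

# abs() on a character vector raises the error seen in the question
msg <- tryCatch(abs(x), error = function(e) conditionMessage(e))
msg

# Trim the whitespace and coerce; the "NA" strings become real NA values
num <- as.numeric(trimws(x))

best <- max(abs(num), na.rm = TRUE)
best  # 0.3
```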

