Aggregating via dplyr - mutating a single column from factor to numeric

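Hi, and thanks for reading.

I've been trying to aggregate some data and have managed to do so with the aggregate function, but I would also like to do the same thing with a dplyr pipeline. However, I keep getting this error:

Error in mutate_impl(.data, dots) : Evaluation error: could not find function "15.2".

I currently have this dataset p: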

    sample    gene           ct
1    s001     gapdh         15.2
2    s001     gapdh           16
3    s001     gapdh         14.8
4    s002     gapdh         16.2
5    s002     gapdh           17
6    s002     gapdh         16.7
7    s003     gapdh Undetermined
8    s003     gapdh         14.6
9    s003     gapdh           15
10   s001      actb         24.5
11   s001      actb         24.2 
12   s001      actb         24.7
13   s002      actb           25
14   s002      actb         25.7
15   s002      actb         25.5
16   s003      actb         27.3
17   s003      actb         27.4
18   s003      actb Undetermined

and would like to get to:

  p2$sample p2$gene  p2$ct.mean    p2$ct.sd
1      s001    actb 24.46666667  0.25166115
2      s002    actb 25.40000000  0.36055513
3      s003    actb 27.35000000  0.07071068
4      s001   gapdh 15.33333333  0.61101009
5      s002   gapdh 16.63333333  0.40414519
6      s003   gapdh 14.80000000  0.28284271

The code I'm currently using, which produces the error above:

library(dplyr)

p_ave_sd <- p %>% 
  filter(p$ct != "Undetermined") %>%
  mutate_at(as.character(p$ct), as.numeric, rm.na = TRUE) %>%
  group_by(p$gene) %>% 
  summarise(mean=mean(p$ct), sd=sd(p$ct))

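It's definitely the "mutate" step that is tripping me up. I've tried mutate_all(), mutate_if(is.factor, is.numeric), and so on, but each one gives its own error.

Thanks for your help!

Here is a way to do it with mutate_at. If there is only one column to convert, mutate also works and is more direct.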

library(dplyr)

dat2 <- dat %>%
  filter(!ct %in% "Undetermined") %>%
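  # mutate(ct = as.numeric(as.character(ct))) %>% <<< This will also work
  # as.character() first, so a factor column is not turned into its level codes
  mutate_at(vars(ct), function(x) as.numeric(as.character(x))) %>%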
  group_by(sample, gene) %>% 
  summarise(mean = mean(ct), sd = sd(ct)) %>%
  ungroup()

dat2
# # A tibble: 6 x 4
#   sample gene   mean     sd
#   <chr>  <chr> <dbl>  <dbl>
# 1 s001   actb   24.5 0.252 
# 2 s001   gapdh  15.3 0.611 
# 3 s002   actb   25.4 0.361 
# 4 s002   gapdh  16.6 0.404 
# 5 s003   actb   27.4 0.0707
# 6 s003   gapdh  14.8 0.283 

Data

dat <- read.table(text = "    sample    gene           ct
1    s001     gapdh         15.2
                  2    s001     gapdh           16
                  3    s001     gapdh         14.8
                  4    s002     gapdh         16.2
                  5    s002     gapdh           17
                  6    s002     gapdh         16.7
                  7    s003     gapdh Undetermined
                  8    s003     gapdh         14.6
                  9    s003     gapdh           15
                  10   s001      actb         24.5
                  11   s001      actb         24.2 
                  12   s001      actb         24.7
                  13   s002      actb           25
                  14   s002      actb         25.7
                  15   s002      actb         25.5
                  16   s003      actb         27.3
                  17   s003      actb         27.4
                  18   s003      actb Undetermined",
                  header = TRUE, stringsAsFactors = FALSE)
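
Note that the data above is read with stringsAsFactors = FALSE, so ct arrives as character. If ct is a factor in your own data frame (as the question title suggests), calling as.numeric() on it directly would return the internal level codes rather than the measured values. A minimal sketch of the difference, using a made-up factor named ct_factor that is not part of the original post:

```r
# Hypothetical factor holding three of the ct values from the question
ct_factor <- factor(c("15.2", "16", "14.8"))

# Direct conversion returns the positions of each value in levels(ct_factor)
codes <- as.numeric(ct_factor)
print(codes)
# 2 3 1

# Going through character first recovers the numbers as printed
values <- as.numeric(as.character(ct_factor))
print(values)
# 15.2 16.0 14.8
```

This is why converting with as.numeric(as.character(ct)) is the safer habit whenever the column type is uncertain.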

I'm not sure I understand your question, but perhaps:

p_ave_sd <- p %>% 
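   # the value in the data is capitalised: "Undetermined"
   filter(ct != "Undetermined") %>%
   # as.character() guards against ct being a factor
   mutate(ct=as.numeric(as.character(ct))) %>%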
   group_by(gene,sample) %>% 
   summarise(mean=mean(ct), sd=sd(ct))
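
The rm.na = TRUE in the question hints that dropping missing values was the intent. As an alternative to filtering first, one could let the conversion turn "Undetermined" into NA and pass na.rm = TRUE to the summary functions. The sketch below assumes that approach; p_demo is a small illustrative data frame built from six of the question's rows, not the asker's actual object:

```r
library(dplyr)

# Six gapdh rows from the question, with ct stored as a factor (assumption)
p_demo <- data.frame(
  sample = c("s003", "s003", "s003", "s001", "s001", "s001"),
  gene   = "gapdh",
  ct     = c("Undetermined", "14.6", "15", "15.2", "16", "14.8"),
  stringsAsFactors = TRUE
)

p_demo_sd <- p_demo %>%
  # "Undetermined" cannot be parsed, so it becomes NA;
  # the coercion warning is silenced on purpose
  mutate(ct = suppressWarnings(as.numeric(as.character(ct)))) %>%
  group_by(sample, gene) %>%
  # na.rm = TRUE leaves the NA wells out of each statistic
  summarise(mean = mean(ct, na.rm = TRUE), sd = sd(ct, na.rm = TRUE)) %>%
  ungroup()

print(p_demo_sd)
```

Silencing the warning also hides any other unparseable entries, so filtering explicitly, as in the answers above, is the more transparent choice when "Undetermined" is the only non-numeric value expected.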
