Create a new row containing the sum of column values in a data frame in R

Question

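I have a multivariate dataset of the relative abundance (%) of species in different samples. This data frame contains only the most abundant species, so the columns do not add up to 100%.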

My dataset looks like this, but with more species and samples:

species = c("Species1","Species2","Species3","Species4","Species5","Species6")
Sample1 = c(0.6,7.9,7.1,2.7,4.5,6.4)
Sample2 = c(1.8,0.3,0.9,3.3,1.7,9.8)
Sample3 = c(9.2,1,8,2.1,8,2.2)
Sample4 = c(6.1,1.3,9,5.3,5.5,6.2)

df = data.frame(species, Sample1, Sample2, Sample3, Sample4)
df

   species Sample1 Sample2 Sample3 Sample4
1 Species1     0.6     1.8     9.2     6.1
2 Species2     7.9     0.3     1.0     1.3
3 Species3     7.1     0.9     8.0     9.0
4 Species4     2.7     3.3     2.1     5.3
5 Species5     4.5     1.7     8.0     5.5
6 Species6     6.4     9.8     2.2     6.2

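But I want to make a stacked bar chart that also includes a variable "Others", representing the percentage cover of all the rarer species, calculated as 100 - sum of column.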

The result I want looks like this:

   species Sample1 Sample2 Sample3 Sample4
1 Species1     0.6     1.8     9.2     6.1
2 Species2     7.9     0.3     1.0     1.3
3 Species3     7.1     0.9     8.0     9.0
4 Species4     2.7     3.3     2.1     5.3
5 Species5     4.5     1.7     8.0     5.5
6 Species6     6.4     9.8     2.2     6.2
7   Others    70.8    82.2    69.5    66.6

How can I do this? I have been searching for hours, but I can't find a solution.
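Every value in the desired Others row is 100 minus that sample's column sum. The minimal base-R check below rebuilds the example data and works through that arithmetic. The names col_totals and others are illustrative only:

```r
# Rebuild the example data from the question
df <- data.frame(
  species = c("Species1", "Species2", "Species3", "Species4", "Species5", "Species6"),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# Sum every column except the first one (species)
col_totals <- colSums(df[, -1])   # 29.2 17.8 30.5 33.4

# The share of each sample not covered by the listed species
others <- 100 - col_totals        # 70.8 82.2 69.5 66.6
others
```

These match the Others row in the desired output, up to floating-point rounding.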

Answer 1

To get the data you need, use summarize(across()) from dplyr:

library(dplyr)

bind_rows(
  df,
  # one-row data frame: 100 minus the sum of each Sample column
  df %>% summarize(across(starts_with("Sample"),~100-sum(.x))) %>% 
    mutate(species="Others")
)

Output:

   species Sample1 Sample2 Sample3 Sample4
1 Species1     0.6     1.8     9.2     6.1
2 Species2     7.9     0.3     1.0     1.3
3 Species3     7.1     0.9     8.0     9.0
4 Species4     2.7     3.3     2.1     5.3
5 Species5     4.5     1.7     8.0     5.5
6 Species6     6.4     9.8     2.2     6.2
7   Others    70.8    82.2    69.5    66.6
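starts_with("Sample") only works when every sample column shares that prefix. If the real data uses other column names, one option is to select every numeric column with where(is.numeric) instead. The sketch below assumes dplyr 1.0.0 or later, and the name others_row is illustrative:

```r
library(dplyr)

df <- data.frame(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# Select all numeric columns instead of relying on a name prefix,
# then compute 100 minus each column sum
others_row <- df %>%
  summarize(across(where(is.numeric), ~ 100 - sum(.x))) %>%
  mutate(species = "Others")

# bind_rows() matches columns by name, so the column order of others_row does not matter
result <- bind_rows(df, others_row)
result
```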

Also, if you want to plot it as a simple stacked bar chart, you can continue the pipe like this:

library(tidyr)
library(ggplot2)

# ... stands for the bind_rows() expression above
... %>% pivot_longer(cols = -species, names_to="Sample",values_to = "Abundance") %>% 
  ggplot(aes(Sample,Abundance,fill=species)) + 
  geom_col() + 
  labs(fill="", y="Relative Abundance")+
  theme(legend.position = "bottom")
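A self-contained version of the data step and the plotting step might look like the sketch below. It assumes dplyr, tidyr and ggplot2 are installed and treats the leading ... above as the bind_rows() result. The intermediate names result, long and p are illustrative:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# Table with the extra "Others" row (what the ... in the pipe above stands for)
result <- bind_rows(
  df,
  df %>% summarize(across(starts_with("Sample"), ~ 100 - sum(.x))) %>%
    mutate(species = "Others")
)

# Long format: one row per species x sample combination (7 x 4 = 28 rows)
long <- result %>%
  pivot_longer(cols = -species, names_to = "Sample", values_to = "Abundance")

# Stacked bars; each bar now adds up to 100
p <- ggplot(long, aes(Sample, Abundance, fill = species)) +
  geom_col() +
  labs(fill = "", y = "Relative Abundance") +
  theme(legend.position = "bottom")

# Typing p at the console (or print(p)) draws the chart
```

Splitting the pipe into named steps makes it easier to inspect long before plotting. The single pipe in the answer does the same thing.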

Answer 2

For what it's worth, the main difference between this answer and the other one is that it uses data.table:

library(data.table)
library(ggplot2)
library(RColorBrewer)
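# setDT() converts df to a data.table by reference
setDT(df)
# \(x) is the lambda shorthand added in R 4.1.0; use function(x) on older versions
result <- rbind(df, df[, c(species='Others', lapply(.SD, \(x) 100-sum(x))), .SDcols=-1])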
result
##     species Sample1 Sample2 Sample3 Sample4
## 1: Species1     0.6     1.8     9.2     6.1
## 2: Species2     7.9     0.3     1.0     1.3
## 3: Species3     7.1     0.9     8.0     9.0
## 4: Species4     2.7     3.3     2.1     5.3
## 5: Species5     4.5     1.7     8.0     5.5
## 6: Species6     6.4     9.8     2.2     6.2
## 7:   Others    70.8    82.2    69.5    66.6

.SDcols = -1 means "use every column except the first one".
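A short sketch of what .SDcols = -1 selects is shown below, alongside an equivalent selection by column name. The table name dt and the variables sample_cols, sums_by_position and sums_by_name are illustrative:

```r
library(data.table)

dt <- data.table(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# .SDcols = -1 drops the first column (species) from .SD,
# so lapply() only iterates over the numeric Sample columns
sums_by_position <- dt[, lapply(.SD, sum), .SDcols = -1]

# The same selection by name, which still works if the column order changes
sample_cols <- setdiff(names(dt), "species")
sums_by_name <- dt[, lapply(.SD, sum), .SDcols = sample_cols]

sums_by_position
```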

#   melt for use in ggplot
#   reorder factors to put "Others" at the top.
#
gg.dt <- melt(result, id.vars='species')[
  , species:=factor(species, levels=c('Others', setdiff(unique(species), 'Others')))]
##
#   use Brewer palette, with grey80 for "Others"
#
ggplot(gg.dt, aes(x=variable, y=value, fill=species))+
  geom_bar(stat='identity', color='grey80')+
  scale_fill_manual(values = c('grey80', brewer.pal(6, 'Spectral')))+
  labs(x=NULL, y='Relative Abundance')
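The factor reordering and the palette above have to line up. ggplot2 stacks the first factor level at the top of each bar, which is why "Others" is moved to the front. scale_fill_manual() assigns unnamed colours to the levels in order. The sketch below checks that pairing outside the plot. The names sp, sp_fac, fills and fills_named are illustrative:

```r
library(RColorBrewer)

# Species labels as they appear after melt(): six species plus "Others"
sp <- c(paste0("Species", 1:6), "Others")

# Same reordering as in the answer: "Others" becomes the first level
sp_fac <- factor(sp, levels = c("Others", setdiff(unique(sp), "Others")))
levels(sp_fac)

# One colour per level, in level order: grey80 for "Others", then 6 Spectral colours
fills <- c("grey80", brewer.pal(6, "Spectral"))

# Naming the vector makes the level-to-colour mapping explicit
fills_named <- setNames(fills, levels(sp_fac))
fills_named
```

If you pass fills_named to scale_fill_manual(values = ...), the colours stay attached to the right species even if the level order changes later.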

Answer 3

Another possible solution, based on dplyr and colSums:

library(dplyr)

df %>% 
  bind_rows(data.frame(species = "Others", t(100 - colSums(.[-1]))))

#>    species Sample1 Sample2 Sample3 Sample4
#> 1 Species1     0.6     1.8     9.2     6.1
#> 2 Species2     7.9     0.3     1.0     1.3
#> 3 Species3     7.1     0.9     8.0     9.0
#> 4 Species4     2.7     3.3     2.1     5.3
#> 5 Species5     4.5     1.7     8.0     5.5
#> 6 Species6     6.4     9.8     2.2     6.2
#> 7   Others    70.8    82.2    69.5    66.6
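In this answer, t() is what lines the Others row up with df. colSums() returns a named vector, and transposing it gives a one-row matrix whose column names are the sample names. data.frame() then turns that matrix into matching columns. The sketch below shows the intermediate shapes in base R. The names totals and others_row are illustrative:

```r
df <- data.frame(
  species = paste0("Species", 1:6),
  Sample1 = c(0.6, 7.9, 7.1, 2.7, 4.5, 6.4),
  Sample2 = c(1.8, 0.3, 0.9, 3.3, 1.7, 9.8),
  Sample3 = c(9.2, 1, 8, 2.1, 8, 2.2),
  Sample4 = c(6.1, 1.3, 9, 5.3, 5.5, 6.2)
)

# 100 minus each column sum, skipping column 1 (species): a named numeric vector
totals <- 100 - colSums(df[-1])

# t() makes it a 1 x 4 matrix with the sample names as column names,
# so data.frame() yields one row whose columns match those of df
others_row <- data.frame(species = "Others", t(totals))

dim(t(totals))
names(others_row)

# bind_rows() (or base rbind()) can then match the columns by name
```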


3 solutions

Solution 1: 2 votes, 2022-05-11 02:36:08

Solution 2: 2 votes, 2022-05-11 03:25:09

Solution 3: 1 vote, accepted, 2022-05-11 06:50:43
