In this data set, two taxa (in rows) contribute little to the overall data. I would like to gather all rows whose row sums are less than n% of the total of the entire data set, where n could be 1, 2, 3, ...
df <- data.frame(A=c(1000,100,1,0), B=c(100,1000,1,1), C=c(10,900,0,1))
row.names(df) <- c("Tax1", "Tax2", "Tax3", "Tax4")
> df
        A    B   C
Tax1 1000  100  10
Tax2  100 1000 900
Tax3    1    1   0
Tax4    0    1   1
After identifying these low-sum rows, I would like to combine them into a single row, e.g. "Other":
> df
         A    B   C
Tax1  1000  100  10
Tax2   100 1000 900
Other    1    2   1
Thank you!
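The question states the cutoff as n% (n = 1, 2, 3, ...), whereas the answers below use a fraction, e.g. 0.1 for 10%. A minimal base R sketch of that relationship, printing each taxon's share of the grand total in percent; `pct_share`, `n_pct` and `low` are illustrative names, not from the question:

```r
df <- data.frame(A = c(1000, 100, 1, 0), B = c(100, 1000, 1, 1), C = c(10, 900, 0, 1))
row.names(df) <- c("Tax1", "Tax2", "Tax3", "Tax4")

# Each row's share of the grand total of all cells, in percent
pct_share <- 100 * rowSums(df) / sum(df)
print(round(pct_share, 3))

# Cutoff given in percent as in the question (illustrative value);
# dividing by 100 gives the fraction the answers call n or N
n_pct <- 1
low <- pct_share < n_pct
print(names(which(low)))  # "Tax3" "Tax4"
```

With this data Tax3 and Tax4 each hold about 0.064% of the total, so any n from 1 upward flags the same two rows.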
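# Set n as a fraction of the grand total
n <- 0.1 # 10%
# Flag rows whose share of the grand total (prop.table of the row sums) is below n
rows <- prop.table(rowSums(df)) < n
# Keep the other rows and append one row 'Other' holding the column sums of the flagged rows
# (drop = FALSE keeps data frames even when df has a single column)
rbind(df[!rows, , drop = FALSE], Other = colSums(df[rows, , drop = FALSE]))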
#          A    B   C
# Tax1  1000  100  10
# Tax2   100 1000 900
# Other    1    2   1
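The same base R idea can be wrapped in a small reusable function. This is a sketch, and `lump_rows()` is a hypothetical name: it uses `drop = FALSE` so single-column data still works, and it returns the data unchanged when no row falls below the cutoff, instead of appending an all-zero "Other" row as the one-liner above would.

```r
# Hypothetical helper wrapping the rbind()/colSums() answer above
lump_rows <- function(df, n, label = "Other") {
  # TRUE for rows whose share of the grand total is below n (a fraction: 0.1 = 10%)
  low <- prop.table(rowSums(df)) < n
  # Nothing to lump: return the input instead of adding an all-zero row
  if (!any(low)) return(df)
  # Column sums of the low rows as a one-row data frame named `label`
  other <- as.data.frame(t(colSums(df[low, , drop = FALSE])))
  row.names(other) <- label
  # Keep the remaining rows and append the lumped row (columns matched by name)
  rbind(df[!low, , drop = FALSE], other)
}

df <- data.frame(A = c(1000, 100, 1, 0), B = c(100, 1000, 1, 1), C = c(10, 900, 0, 1))
row.names(df) <- c("Tax1", "Tax2", "Tax3", "Tax4")

res <- lump_rows(df, 0.1)
print(res)
print(lump_rows(df, 0.0001))  # no row is below 0.01%, so df comes back unchanged
```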
A tidyverse/dplyr approach using a couple of tibble functions:
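df <- data.frame(A=c(1000,100,1,0), B=c(100,1000,1,1), C=c(10,900,0,1))
row.names(df) <- c("Tax1", "Tax2", "Tax3", "Tax4")
library(tidyverse)
N <- 0.05 # 5 per cent
# Note: cur_data() is deprecated since dplyr 1.1.0; pick(-row) is the current equivalent
df %>% rownames_to_column('row') %>%
  # keep rows contributing at least N of the grand total ([-1] drops the 'row' column)
  filter(rowSums(cur_data()[-1]) >= N * sum(cur_data()[-1])) %>%
  bind_rows(df %>% rownames_to_column('row') %>%
              # sum the remaining low rows into a single row labelled 'other'
              filter(rowSums(cur_data()[-1]) < N * sum(cur_data()[-1])) %>%
              summarise(across(-row, sum),
                        row = 'other')
  ) %>% column_to_rownames('row')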
#>          A    B   C
#> Tax1  1000  100  10
#> Tax2   100 1000 900
#> other    1    2   1
Created on 2021-06-04 by the reprex package (v2.0.0)
A dplyr-only answer:
library(dplyr)
N <- 0.05 # 5 per cent, as above
df %>% filter(rowSums(cur_data()) >= N * sum(cur_data())) %>%
  bind_rows(df %>%
              filter(rowSums(cur_data()) < N * sum(cur_data())) %>%
              summarise(across(everything(), sum)) %>%
              # set the row name of the one-row summary to 'Other'
              `row.names<-.data.frame`('Other')
  )
         A    B   C
Tax1  1000  100  10
Tax2   100 1000 900
Other    1    2   1
You can also use the following solution:
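library(dplyr)
library(purrr)
library(tibble)
library(tidyr) # replace_na() comes from tidyr
df %>%
  # keep rows whose sum is at least 10% of the grand total
  filter(pmap_lgl(df, ~ sum(c(...)) >= 0.1 * sum(rowSums(df)))) %>%
  rownames_to_column() %>%
  bind_rows(df %>%
              filter(pmap_lgl(df, ~ sum(c(...)) < 0.1 * sum(rowSums(df)))) %>%
              summarise(across(A:C, ~ sum(.x)))) %>%
  # the summary row has no rowname, so fill the NA with "Other"
  replace_na(list(rowname = "Other"))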
  rowname    A    B   C
1    Tax1 1000  100  10
2    Tax2  100 1000 900
3   Other    1    2   1
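A base R alternative does the relabelling and the summing in one step with `rowsum()`, which adds up all rows that share a group label. This swaps in a different technique from the answers above and assumes every column is numeric; `groups` is an illustrative name. With `reorder = FALSE` the groups keep their order of first appearance, so "Other" lands at the position of the first low row (last here, but not in general), and no "Other" row is created when nothing falls below n.

```r
df <- data.frame(A = c(1000, 100, 1, 0), B = c(100, 1000, 1, 1), C = c(10, 900, 0, 1))
row.names(df) <- c("Tax1", "Tax2", "Tax3", "Tax4")
n <- 0.1 # 10%, as a fraction of the grand total

# Label each row: its own name, or "Other" if its share is below n
groups <- ifelse(prop.table(rowSums(df)) < n, "Other", row.names(df))

# Sum rows that share a label; reorder = FALSE keeps first-appearance order
res <- rowsum(df, groups, reorder = FALSE)
print(res)
```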