Summarize specific word in data table in R
df <- data.frame(
"Domain" = c("Euka"),
"Kingdom" = c("An","Plan"),
"Division" = c("20181121","20181128","20181203"),
"Species" = c("20181115_AG25_MAGH_50_A05_CGT.TXT","20181122_AG25_MAGH_50_C05_CGT.ARR",
"20181115_AG25_MAGH_50_G05_CGT.TXT","20181124_AG25_MAGH_50_G45_CGT.TXT",
"20181204_AG25_MAGH_50_G05_CGT.ARR","20181205_AG25_MAGH_50_G45_CGT.TXT",
"20181207_AG25_MAGH_50_T05_CGT.ARR","20181215_AG25_MAGH_50_F45_CGT.TXT",
"20181223_AG25_MAGH_50_R07_CGT.GGI","20181225_TW77_MAGH_33_L06_CGT.ARR",
"20181226_TW77_MAGH_33_S07_CGT.ARR","20181227_TW77_MAGH_33_C06_CGT.TXT")
)
I want to summarize it like this:
Division | 20181121 | 20181128 | 20181203
---|---|---|---
Total_TXT | 2 | 0 | 3
Total_ARR | 2 | 3 | 0
Total_GGI | 0 | 0 | 1
How can I achieve this in R? Thanks.
Here is a tidyverse option: we use count() to get the total for each group, then put the result into a wide format with pivot_wider().
library(tidyverse)
df %>%
group_by(gr = Division) %>%
count(Division = str_replace_all(Species, '.*\\.', '')) %>%
pivot_wider(names_from = "gr", values_from = "n", values_fill = 0) %>%
mutate(Division = paste0("Total_", Division))
Output
Division `20181121` `20181128` `20181203`
<chr> <int> <int> <int>
1 Total_ARR 2 3 0
2 Total_TXT 2 1 3
3 Total_GGI 0 0 1
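To see why values_fill = 0 matters, it can help to look at the long-format counts before reshaping. The following is a minimal sketch on a small toy data frame (the name toy and its shortened file names are hypothetical, not the question's data): count() only returns combinations that actually occur, and pivot_wider() fills the missing ones with 0.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data (hypothetical, much smaller than the question's df)
toy <- data.frame(
  Division = c("20181121", "20181128", "20181121", "20181203"),
  Species  = c("a_CGT.TXT", "b_CGT.ARR", "c_CGT.TXT", "d_CGT.GGI")
)

# Long format: one row per Division/extension pair that is present
long <- toy %>%
  count(Division, ext = str_replace_all(Species, ".*\\.", ""))

# Wide format: absent pairs would be NA without values_fill
wide <- long %>%
  pivot_wider(names_from = Division, values_from = n, values_fill = 0)

long
wide
```

In this toy case long has three rows, and every cell of wide that had no matching row in long is filled with 0.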
Or here is a data.table option:
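library(data.table)
library(stringr)  # provides str_replace_all()

# Count rows per Division and file extension (the text after the last ".")
counts <-
  setDT(df)[, .N, by = .(cn = Division, Division = str_replace_all(Species, '.*\\.', ''))]

# Reshape to wide: one row per extension, one column per Division
dcast(counts,
      paste0("Total_", Division) ~ cn,
      value.var = "N",
      fill = 0)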
We need to extract the last three characters from Species:
x <- nchar(df$Species)
rowlbl <- substr(df$Species, x-2, x)
table(rowlbl, df$Division)
# rowlbl 20181121 20181128 20181203
# ARR 2 3 0
# GGI 0 0 1
# TXT 2 1 3
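Taking the last three characters works here because every extension in the question is exactly three characters long. As a small sketch of that assumption, using made-up file names (files is hypothetical), compare it with stripping everything up to the last dot:

```r
# Hypothetical file names; the second has a four-character extension
files <- c("sample_A.TXT", "sample_B.JSON")

n <- nchar(files)
last_three <- substr(files, n - 2, n)     # fixed-width: last three characters
after_dot  <- sub(".*\\.", "", files)     # everything after the last "."

last_three  # "TXT" "SON"
after_dot   # "TXT" "JSON"
```

If extensions of other lengths can appear in the data, the dot-based extraction is likely the safer choice.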
Base R one-liner:
table(sub('.*\\.', '', df$Species), df$Division)
# 20181121 20181128 20181203
# ARR 2 3 0
# GGI 0 0 1
# TXT 2 1 3
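If the Total_ row labels from the question are wanted, one possible follow-up is to rename the table's rows and convert it to a data frame. This is a sketch on toy data (toy and its file names are hypothetical), not part of the original answer:

```r
# Toy data (hypothetical, smaller than the question's df)
toy <- data.frame(
  Division = c("20181121", "20181128", "20181121", "20181203"),
  Species  = c("a_CGT.TXT", "b_CGT.ARR", "c_CGT.TXT", "d_CGT.GGI")
)

# Cross-tabulate extension by Division
tab <- table(sub(".*\\.", "", toy$Species), toy$Division)

# Prefix the row labels, then turn the table into a wide data frame
rownames(tab) <- paste0("Total_", rownames(tab))
wide <- as.data.frame.matrix(tab)
wide
```

as.data.frame.matrix() keeps the two-way layout, whereas plain as.data.frame() on a table returns one row per cell.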
Explanation of the one-liner:
sub() removes everything up to and including the last ".", returning
sub('.*\\.', '', df$Species)
#[1] "TXT" "ARR" "TXT" "TXT" "ARR" "TXT" "ARR" "TXT" "GGI" "ARR" "ARR" "TXT"
This is then passed to table() together with the Division values.
sub() can also be replaced with tools::file_ext() for an approach that needs no hand-written regex.
table(tools::file_ext(df$Species), df$Division)
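As a quick check that the two approaches agree on names like those in the question, here is a minimal sketch using three of the question's Species values plus one hypothetical name without a dot (no_dot), where the two functions differ:

```r
# Three Species values taken from the question
species <- c("20181115_AG25_MAGH_50_A05_CGT.TXT",
             "20181122_AG25_MAGH_50_C05_CGT.ARR",
             "20181223_AG25_MAGH_50_R07_CGT.GGI")

ext_regex <- sub(".*\\.", "", species)   # regex-based extraction
ext_tools <- tools::file_ext(species)    # helper from the tools package

identical(ext_regex, ext_tools)  # TRUE

# Hypothetical name with no extension: the results differ
no_dot <- "README"
sub(".*\\.", "", no_dot)     # "README" (nothing matched, string unchanged)
tools::file_ext(no_dot)      # ""
```

So for data that may contain names without an extension, tools::file_ext() avoids counting the whole name as an extension.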