How to create a new data frame of consolidated values from multiple columns in R

Question

I have a dataframe, df1, that looks like the following:

sample	99_Ape_1	93_Cat_1	87_Ape_2	84_Cat_2	90_Dog_1	92_Dog_2
A	2	3	1	7	4	6
B	5	9	7	0	3	7
C	6	8	9	2	3	0
D	3	9	0	5	8	3

I want to consolidate the dataframe by summing the values based on the animal present in the header row, i.e. by "Ape", "Cat", "Dog", and end up with the following dataframe:

sample	Ape	Cat	Dog
A	3	10	10
B	12	9	10
C	15	10	3
D	3	14	11

I have created a list called "animals_list" that represents all the animals.

I have then created a list of dataframes that subsets each animal into a separate dataframe with:

species_extract <- list()

for (i in 1:length(animals_list)){
  species_extract[[i]] <- df1[, grep(animals_list[i], names(df1))]
}

I am then trying to sum each variable in the row by sample:

for (i in 1:length(species_extract)){
  species_extract[[i]]$total <- rowSums(species_extract[[i]])
}

and then create a dataframe 'animal_total' by binding all values in the new 'total' column.

animal_total <- NULL

for (i in 1:length(species_extract)){
  animal_total[i] <- cbind(species_extract[[i]]$total)
}

Unfortunately, this doesn't seem to work at all and I think I may have taken the wrong route. Any help would be really appreciated!

EDIT: my dataframe has over 300 animals, so a solution that uses my list of identifiers (animals_list) would be highly appreciated! I would also note that some column names do not follow the structure "number_animal_number", and therefore I can't use a repetitive search (sorry!).

Answer 1

A data.table approach

library(data.table)
library(rlist)
#set data to data.table format
setDT(df1)
# split column 2:n by regex on column names
L <- split.default(df1[,-1], gsub(".*_(.*)_.*", "\\1", names(df1)[-1]))
# Bind together again
data.table(sample = df1$sample, 
           as.data.table(list.cbind(lapply(L, rowSums))))
#    sample Ape Cat Dog
# 1:      A   3  10  10
# 2:      B  12   9  10
# 3:      C  15  10   3
# 4:      D   3  14  11
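
For readers without data.table or rlist installed, the same split-and-sum idea can be sketched in base R. This is only an illustration: the df1 below is rebuilt from the table in the question, and the regex still assumes column names of the form number_animal_number, which the question's edit says does not hold for every column.

```r
# Example data reconstructed from the question (check.names = FALSE keeps
# the column names that start with digits unchanged)
df1 <- data.frame(
  sample = c("A", "B", "C", "D"),
  `99_Ape_1` = c(2, 5, 6, 3),
  `93_Cat_1` = c(3, 9, 8, 9),
  `87_Ape_2` = c(1, 7, 9, 0),
  `84_Cat_2` = c(7, 0, 2, 5),
  `90_Dog_1` = c(4, 3, 3, 8),
  `92_Dog_2` = c(6, 7, 0, 3),
  check.names = FALSE
)

# Animal label for each value column (assumes number_animal_number names)
animal <- sub(".*_(.*)_.*", "\\1", names(df1)[-1])

# Split the value columns into one data frame per animal,
# then sum each group row-wise and bind the sums next to 'sample'
groups <- split.default(df1[-1], animal)
result <- data.frame(sample = df1$sample, lapply(groups, rowSums))
print(result)
```

The groups come out in the sorted order of the animal labels, so the columns here are Ape, Cat, Dog.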

Answer 2

Update, after clarification: this may work depending on the other names of your animals, but it is a start:

library(dplyr)
library(tidyr)
library(stringr)  # provides str_extract()

# df is the data frame from the question (called df1 there)
df %>% 
  pivot_longer(
    cols = -sample
  ) %>% 
  mutate(name1 = str_extract(name, '(?<=\\_)(.*?)(?=\\_)')) %>% 
  group_by(sample, name1) %>% 
  summarise(sum=sum(value)) %>% 
  pivot_wider(
    names_from = name1,
    values_from= sum
  )

Output:

  sample   Ape   Cat   Dog
  <chr>  <int> <int> <int>
1 A          3    10    10
2 B         12     9    10
3 C         15    10     3
4 D          3    14    11
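
To see what the lookaround pattern in str_extract() picks out, here is a small base R check. It is a sketch only: the column name "Bird" is a hypothetical example of a name that does not follow the number_animal_number structure, added to show that such names yield NA and would need separate handling.

```r
# Three names from the question plus one hypothetical non-conforming name
cols <- c("99_Ape_1", "93_Cat_1", "90_Dog_1", "Bird")

# Same idea as the pattern above: the shortest text between two underscores
hit <- regexpr("(?<=_)(.*?)(?=_)", cols, perl = TRUE)

# regexpr() returns -1 where nothing matched; keep NA for those names
animal <- rep(NA_character_, length(cols))
animal[hit > 0] <- regmatches(cols, hit)
print(animal)
```

Any column whose label comes back as NA would end up in its own NA group after group_by(), so it is worth checking for NA before summarising.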

First answer: here is how we could do it with dplyr:

library(dplyr)

df %>% 
  mutate(Cat = rowSums(select(., contains("Cat"))),
         Ape = rowSums(select(., contains("Ape"))),
         Dog = rowSums(select(., contains("Dog")))) %>% 
  select(sample, Ape, Cat, Dog)

  sample   Ape   Cat   Dog
  <chr>  <int> <int> <int>
1 A          3    10    10
2 B         12     9    10
3 C         15    10     3
4 D          3    14    11
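
With over 300 animals, writing one contains() line per animal is not practical. A minimal base R sketch of the same idea, looping over the asker's animals_list instead, could look like the following. The contents of animals_list and the df1 data are assumptions taken from the example in the question. Matching is by plain substring, so an identifier such as "Cat" would also pick up a column containing "Caterpillar".

```r
# Example data reconstructed from the question
df1 <- data.frame(
  sample = c("A", "B", "C", "D"),
  `99_Ape_1` = c(2, 5, 6, 3),
  `93_Cat_1` = c(3, 9, 8, 9),
  `87_Ape_2` = c(1, 7, 9, 0),
  `84_Cat_2` = c(7, 0, 2, 5),
  `90_Dog_1` = c(4, 3, 3, 8),
  `92_Dog_2` = c(6, 7, 0, 3),
  check.names = FALSE
)

# Assumed contents of the asker's identifier list
animals_list <- c("Ape", "Cat", "Dog")

# For each animal, find the columns whose name contains it and sum them by row.
# fixed = TRUE treats the identifier as literal text, not a regex;
# drop = FALSE keeps a data frame even when only one column matches.
totals <- sapply(animals_list, function(a) {
  cols <- grep(a, names(df1), fixed = TRUE)
  rowSums(df1[, cols, drop = FALSE])
})

# sapply() names the resulting matrix columns after animals_list
result <- data.frame(sample = df1$sample, totals)
print(result)
```

Because the lookup is driven by animals_list rather than by the position of the animal inside the column name, this does not depend on the number_animal_number structure.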

Answer 3

An alternative data.table solution

library(data.table)

# Construct data table 
dt <- as.data.table(list(sample = c("A", "B", "C", "D"), 
                         `99_Ape_1` = c(2, 5, 6, 3), 
                         `93_Cat_1` = c(3, 9, 8, 9), 
                         `87_Ape_2` = c(1, 7, 9, 0),
                         `84_Cat_2` = c(7, 0, 2, 5),
                         `90_Dog_1` = c(4, 3, 3, 8),
                         `92_Dog_2` = c(6, 7, 0, 3)))

# Alternatively convert existing dataframe
# dt <- setDT(df)

# Use Regex pattern to drop ids from column names
names(dt) <- gsub("((^[0-9_]{3})|(_[0-9]{1}$))", "", names(dt))

# Pivot long (columns to rows)
dt <- melt(dt, id.vars = "sample")

# Aggregate sample by variable
dt <- dt[, .(value=sum(value)), by=.(sample, variable)]

# Pivot wide (rows to columns)
dcast(dt, sample ~ variable)

#     sample Ape Cat Dog
# 1:      A   3  10  10
# 2:      B  12   9  10
# 3:      C  15  10   3
# 4:      D   3  14  11

Alternatively, leaving the column names as they are (following the OP's comment on a previous answer) and assuming that there are multiple observations of the same sample:

dt <- as.data.table(list(sample = c("A", "B", "C", "D", "A"), 
                         `99_Ape_1` = c(2, 5, 6, 3, 1), 
                         `93_Cat_1` = c(3, 9, 8, 9, 1), 
                         `87_Ape_2` = c(1, 7, 9, 0, 1),
                         `84_Cat_2` = c(7, 0, 2, 5, 1),
                         `90_Dog_1` = c(4, 3, 3, 8, 1),
                         `92_Dog_2` = c(6, 7, 0, 3, 1)))

dt

#     sample 99_Ape_1 93_Cat_1 87_Ape_2 84_Cat_2 90_Dog_1 92_Dog_2
# 1:      A        2        3        1        7        4        6
# 2:      B        5        9        7        0        3        7
# 3:      C        6        8        9        2        3        0
# 4:      D        3        9        0        5        8        3
# 5:      A        1        1        1        1        1        1

# Pivot long (columns to rows)
dt <- melt(dt, id.vars = "sample")

# Aggregate sample by variable
dt <- dt[, .(value=sum(value)), by=.(sample, variable)]

# Pivot wide (rows to columns)
dcast(dt, sample ~ variable)

#     sample 99_Ape_1 93_Cat_1 87_Ape_2 84_Cat_2 90_Dog_1 92_Dog_2
# 1:      A        3        4        2        8        5        7
# 2:      B        5        9        7        0        3        7
# 3:      C        6        8        9        2        3        0
# 4:      D        3        9        0        5        8        3
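
The two steps above (collapsing columns per animal, and collapsing repeated samples) can also be combined. The following is a sketch in base R using rowsum(), not part of the original answer; it reuses the five-row example data and again assumes number_animal_number column names.

```r
# Example data with sample "A" observed twice, as in the block above
df <- data.frame(
  sample = c("A", "B", "C", "D", "A"),
  `99_Ape_1` = c(2, 5, 6, 3, 1),
  `93_Cat_1` = c(3, 9, 8, 9, 1),
  `87_Ape_2` = c(1, 7, 9, 0, 1),
  `84_Cat_2` = c(7, 0, 2, 5, 1),
  `90_Dog_1` = c(4, 3, 3, 8, 1),
  `92_Dog_2` = c(6, 7, 0, 3, 1),
  check.names = FALSE
)

# Animal label for each value column (assumes number_animal_number names)
animal <- sub(".*_(.*)_.*", "\\1", names(df)[-1])

# rowsum() adds up rows that share a group label, so transpose first
# to treat the columns as rows, sum per animal, then transpose back
by_animal <- t(rowsum(t(as.matrix(df[-1])), group = animal))

# Now add up the rows that belong to the same sample
result <- rowsum(by_animal, group = df$sample)
print(result)
```

The result is a numeric matrix with one row per sample and one column per animal; as.data.frame(result) converts it back to a data frame if needed.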

3 answers

Answer 1: 4, 2021-10-29 11:35:24

Answer 2: 3, 2021-10-29 11:41:34

Answer 3: 0, 2021-10-29 11:52:01
