![](/img/trans.png)
[英]efficient way to create a new variable from multiple columns in R dataframe
[英]How to a create a new dataframe of consolidated values from multiple columns in R
我有一个数据框 df1,如下所示:
样本 | 99_猿_1 | 93_Cat_1 | 87_猿_2 | 84_Cat_2 | 90_狗_1 | 92_狗_2 |
---|---|---|---|---|---|---|
一种 | 2 | 3 | 1 | 7 | 4 | 6 |
乙 | 5 | 9 | 7 | 0 | 3 | 7 |
C | 6 | 8 | 9 | 2 | 3 | 0 |
D | 3 | 9 | 0 | 5 | 8 | 3 |
我想通过对基于标题行中存在的动物(即“猿”、“猫”、“狗”)的值求和来合并数据帧,并最终得到以下数据帧:
样本 | 猿 | 猫 | 狗 |
---|---|---|---|
一种 | 3 | 10 | 10 |
乙 | 12 | 9 | 10 |
C | 15 | 10 | 3 |
D | 3 | 14 | 11 |
我创建了一个列表,代表所有名为“animals_list”的动物
然后我创建了一个数据框列表,将每个动物分成一个单独的数据框:
animals_extract <- c()
for (i in 1:length(animals_list)){
species_extract[[i]] <- df1[, grep(animals_list[i], names(df1))]
}
然后我试图按样本对行中的每个变量求和:
for (i in 1:length(species_extract)){
species_extract[[i]]$total <- rowSums(species_extract[[i]])
}
然后通过绑定新的“总计”列中的所有值来创建数据框“animal_total”。
animal_total <- NULL
for (i in 1:length(species_extract)){
animal_total[i] <- cbind(species_extract[[i]]$total)
}
不幸的是,这似乎根本不起作用,我想我可能走错了路。 任何帮助将非常感激!
编辑:我的数据框有超过 300 只动物,这意味着合并使用我的标识符列表 (animals_list) 将不胜感激! 我还要注意一些列名不遵循结构“number_animal_number”,因此我不能使用重复搜索(对不起!)。
数据data.table
方法
library(data.table)
library(rlist)
#set data to data.table format
setDT(df1)
# split column 2:n by regex on column names
L <- split.default(df1[,-1], gsub(".*_(.*)_.*", "\\1", names(df1)[-1]))
# Bind together again
data.table(sample = df1$sample,
as.data.table(list.cbind(lapply(L, rowSums))))
# sample Ape Cat Dog
# 1: A 3 10 10
# 2: B 12 9 10
# 3: C 15 10 3
# 4: D 3 14 11
更新:澄清后:这可能会起作用,具体取决于您的动物的其他名称。 但这是一个开始:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(
cols = -sample
) %>%
mutate(name1 = str_extract(name, '(?<=\\_)(.*?)(?=\\_)')) %>%
group_by(sample, name1) %>%
summarise(sum=sum(value)) %>%
pivot_wider(
names_from = name1,
values_from= sum
)
输出:
sample Ape Cat Dog
<chr> <int> <int> <int>
1 A 3 10 10
2 B 12 9 10
3 C 15 10 3
4 D 3 14 11
第一个答案:这是我们如何使用dplyr
做到这dplyr
:
library(dplyr)
df %>%
mutate(Cat = rowSums(select(., contains("Cat"))),
Ape = rowSums(select(., contains("Ape"))),
Dog = rowSums(select(., contains("Dog")))) %>%
select(sample, Cat, Ape, Dog)
sample Ape Cat Dog
<chr> <int> <int> <int>
1 A 3 10 10
2 B 12 9 10
3 C 15 10 3
4 D 3 14 11
另一种 data.table 解决方案
library(data.table)
# Construct data table
dt <- as.data.table(list(sample = c("A", "B", "C", "D"),
`99_Ape_1` = c(2, 5, 6, 3),
`93_Cat_1` = c(3, 9, 8, 9),
`87_Ape_2` = c(1, 7, 9, 0),
`84_Cat_2` = c(7, 0, 2, 5),
`90_Dog_1` = c(4, 3, 3, 8),
`92_Dog_2` = c(6, 7, 0, 3)))
# Alternatively convert existing dataframe
# dt <- setDT(df)
# Use Regex pattern to drop ids from column names
names(dt) <- gsub("((^[0-9_]{3})|(_[0-9]{1}$))", "", names(dt))
# Pivot long (columns to rows)
dt <- melt(dt, id.vars = "sample")
# Aggregate sample by variable
dt <- dt[, .(value=sum(value)), by=.(sample, variable)]
# Unpivot (rows to colums)
dcast(dt, sample ~ variable)
# sample Ape Cat Dog
# 1: A 3 10 10
# 2: B 12 9 10
# 3: C 15 10 3
# 4: D 3 14 11
或者,保留列名(在从 OP 评论到上一个答案之后)并假设对相同样本有多个观察:
dt <- as.data.table(list(sample = c("A", "B", "C", "D", "A"),
`99_Ape_1` = c(2, 5, 6, 3, 1),
`93_Cat_1` = c(3, 9, 8, 9, 1),
`87_Ape_2` = c(1, 7, 9, 0, 1),
`84_Cat_2` = c(7, 0, 2, 5, 1),
`90_Dog_1` = c(4, 3, 3, 8, 1),
`92_Dog_2` = c(6, 7, 0, 3, 1)))
dt
# sample 99_Ape_1 93_Cat_1 87_Ape_2 84_Cat_2 90_Dog_1 92_Dog_2
# 1: A 2 3 1 7 4 6
# 2: B 5 9 7 0 3 7
# 3: C 6 8 9 2 3 0
# 4: D 3 9 0 5 8 3
# 5: A 1 1 1 1 1 1
# Pivot long (columns to rows)
dt <- melt(dt, id.vars = "sample")
# Aggregate sample by variable
dt <- dt[, .(value=sum(value)), by=.(sample, variable)]
# Unpivot (rows to colums)
dcast(dt, sample ~ variable)
# sample 99_Ape_1 93_Cat_1 87_Ape_2 84_Cat_2 90_Dog_1 92_Dog_2
# 1: A 3 4 2 8 5 7
# 2: B 5 9 7 0 3 7
# 3: C 6 8 9 2 3 0
# 4: D 3 9 0 5 8 3
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.