How to combine multiple columns with grep and sum the values in R
I have the following data frame in R:
Engine General Ladder.winch engine.phe subm.gear.box aux.engine pipeline.maintain pipeline pipe.line engine.mpd
1 12 22 2 4 2 4 5 6 7
and so on, with more than 10,000 rows.
Now I want to combine columns and sum their values, reducing the columns to broader categories. For example, Engine, engine.phe, aux.engine and engine.mpd should be combined into an Engine category, with all their values added together. Likewise, pipeline.maintain, pipeline and pipe.line should be combined into Pipeline, and the remaining columns should be added under the General category.
The desired data frame would be:
Engine Pipeline General
12 15 38
How can I do this in R?
There are many ways to do this; here is a straightforward approach:
# Example data.frame
dtf <- structure(list(Engine = c(1, 0, 1),
General = c(12, 3, 15), Ladder.winch = c(22, 28, 26),
engine.phe = c(2, 1, 0), subm.gear.box = c(4, 4, 10),
aux.engine = c(2, 3, 1), pipeline.maintain = c(4, 5, 1),
pipeline = c(5, 5, 2), pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19)),
.Names = c("Engine", "General", "Ladder.winch", "engine.phe",
"subm.gear.box", "aux.engine", "pipeline.maintain",
"pipeline", "pipe.line", "engine.mpd"),
row.names = c(NA, -3L), class = "data.frame")
with(dtf, data.frame(Engine=Engine+engine.phe+aux.engine+engine.mpd,
Pipeline=pipeline.maintain+pipeline+pipe.line,
General=General+Ladder.winch+subm.gear.box))
# Engine Pipeline General
# 1 12 15 38
# 2 12 18 35
# 3 21 5 51
# a more generalized and 'greppy' solution
cnames <- tolower(colnames(dtf))
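# drop = FALSE keeps a data frame even if only one column matches
data.frame(Engine=rowSums(dtf[, grep("eng", cnames), drop = FALSE]),
Pipeline=rowSums(dtf[, grep("pip", cnames), drop = FALSE]),
General=rowSums(dtf[, !grepl("eng|pip", cnames), drop = FALSE]))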
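If the number of categories grows, the grep approach can be driven by a lookup of patterns instead of one hand-written line per category. The following is only a minimal sketch: the named vector patterns is a hypothetical helper that is not part of the answer above, and the example data repeats the values of dtf so the block runs on its own.

```r
# Example data (same values as dtf above)
dtf <- data.frame(Engine = c(1, 0, 1), General = c(12, 3, 15),
                  Ladder.winch = c(22, 28, 26), engine.phe = c(2, 1, 0),
                  subm.gear.box = c(4, 4, 10), aux.engine = c(2, 3, 1),
                  pipeline.maintain = c(4, 5, 1), pipeline = c(5, 5, 2),
                  pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19))

# Hypothetical lookup: category name -> regular expression
patterns <- c(Engine = "eng", Pipeline = "pip")

cnames <- tolower(colnames(dtf))

# One summed column per pattern; drop = FALSE keeps a data frame
# even when a pattern matches a single column
res <- data.frame(lapply(patterns, function(p) {
  rowSums(dtf[, grepl(p, cnames), drop = FALSE])
}))

# Columns that match none of the patterns are summed into General
matched <- grepl(paste(patterns, collapse = "|"), cnames)
res$General <- rowSums(dtf[, !matched, drop = FALSE])
res
```

Adding a category then only means adding one entry to patterns. Note that a column matching two patterns would be counted in both categories, so the patterns should be chosen to be mutually exclusive.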
It is usually better to store your data in long format. Therefore, I would approach your problem as follows:
1 - Get your data in long format
library(reshape2)
dfl <- melt(df)
2 - Create 'engine' and 'pipeline' vectors
e_vec <- c("Engine","engine.phe","aux.engine","engine.mpd")
p_vec <- c("pipeline.maintain","pipeline","pipe.line")
3 - Create a category column
dfl$newcat <- c("general","engine","pipeline")[1 + dfl$variable %in% e_vec + 2*(dfl$variable %in% p_vec)]
The result:
> dfl
variable value newcat
1 Engine 1 engine
2 General 12 general
3 Ladder.winch 22 general
4 engine.phe 2 engine
5 subm.gear.box 4 general
6 aux.engine 2 engine
7 pipeline.maintain 4 pipeline
8 pipeline 5 pipeline
9 pipe.line 6 pipeline
10 engine.mpd 7 engine
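The index arithmetic in step 3 can be hard to read at first. Each %in% test yields TRUE or FALSE, which R treats as 1 or 0 in arithmetic, so the expression evaluates to 1 for neither vector, 2 for an engine column and 3 for a pipeline column, and that number picks the label. A small sketch on three sample names (the vector variable here is a hypothetical stand-in for dfl$variable):

```r
e_vec <- c("Engine", "engine.phe", "aux.engine", "engine.mpd")
p_vec <- c("pipeline.maintain", "pipeline", "pipe.line")

# Hypothetical stand-in for dfl$variable
variable <- c("Engine", "General", "pipe.line")

# 1 = in neither vector, 2 = in e_vec, 3 = in p_vec
idx <- 1 + variable %in% e_vec + 2 * (variable %in% p_vec)

# Use the index to look up the category label
newcat <- c("general", "engine", "pipeline")[idx]
newcat
```

This only works because e_vec and p_vec do not overlap; a name present in both would give index 4, which is outside the label vector and yields NA.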
Now you can use aggregate to get the final result:
> aggregate(value ~ newcat, dfl, sum)
newcat value
1 engine 12
2 general 38
3 pipeline 15
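With more than one row, the aggregate call above sums over all rows at once, because the long data carries no row identifier. If per-row totals are wanted, as in the desired output of the question, a row id can be kept through the reshaping. The sketch below is one possible way under that assumption; it swaps in base R's stack() for reshape2::melt so it runs without extra packages, and the column name row is an illustrative choice.

```r
# Example data (same values as dtf in the first answer)
dtf <- data.frame(Engine = c(1, 0, 1), General = c(12, 3, 15),
                  Ladder.winch = c(22, 28, 26), engine.phe = c(2, 1, 0),
                  subm.gear.box = c(4, 4, 10), aux.engine = c(2, 3, 1),
                  pipeline.maintain = c(4, 5, 1), pipeline = c(5, 5, 2),
                  pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19))

e_vec <- c("Engine", "engine.phe", "aux.engine", "engine.mpd")
p_vec <- c("pipeline.maintain", "pipeline", "pipe.line")

# Long format: stack() returns the columns 'values' and 'ind'
long <- stack(dtf)
# stack() goes column by column, so the row ids repeat once per column
long$row <- rep(seq_len(nrow(dtf)), times = ncol(dtf))

long$newcat <- c("general", "engine", "pipeline")[
  1 + long$ind %in% e_vec + 2 * (long$ind %in% p_vec)]

# Sum of values for every combination of original row and category
tab <- xtabs(values ~ row + newcat, data = long)
tab
```

The result is a table with one line per original row and one column per category; as.data.frame.matrix(tab) would turn it back into a plain data frame.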
Here is an option that extracts the relevant words from the names of the columns and uses tapply to get the sum. str_extract_all returns a list ('lst'). Replace the elements of 'lst' that have zero length with 'GENERAL'. Then unlist the dataset and pass it to the group-by function tapply, using as grouping variables the replicated 'lst' and the row of 'df1', to get the sum.
library(stringr)
lst <- str_extract_all(toupper(sub("(pipe)\\.", "\\1", names(df1))),
"ENGINE|PIPELINE|GENERAL")
lst[lengths(lst)==0] <- "GENERAL"
t(tapply(unlist(df1), list(unlist(lst)[col(df1)], row(df1)), FUN = sum))
# ENGINE GENERAL PIPELINE
#1 12 38 15
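The key step in the tapply call is that row() and col() return, for every cell, its row and column index, so indexing the labels by col() gives each cell the category of its column. A base R sketch of just that step, on a small made-up data frame with hand-written labels (grp stands in for unlist(lst); both are assumptions for illustration):

```r
# Small hypothetical data frame, two rows
df1 <- data.frame(Engine = c(1, 0), engine.phe = c(2, 1),
                  pipeline = c(5, 5), General = c(12, 3))

# One category label per column (stand-in for unlist(lst))
grp <- c("ENGINE", "ENGINE", "PIPELINE", "GENERAL")

m <- as.matrix(df1)

# grp[col(m)] repeats each column's label for every cell in that column;
# row(m) gives each cell's row number
res <- tapply(as.vector(m), list(row(m), grp[col(m)]), FUN = sum)
res
```

Because row(m) is given as the first grouping variable here, the result already has one line per original row and no t() is needed.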
myfactors = ifelse(grepl("engine", names(df), ignore.case = TRUE), "Engine",
ifelse(grepl("pipe|pipeline", names(df), ignore.case = TRUE), "Pipeline",
"General"))
data.frame(lapply(split.default(df, myfactors), rowSums))
# Engine General Pipeline
#1 12 38 15
#2 12 35 18
#3 21 51 5
df is the data from this answer.
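The split.default call above works because a data frame is a list of columns: split.default groups those columns by the factor and returns one smaller data frame per group, which rowSums can then collapse. A minimal sketch on a made-up three-column data frame (the names a1, a2, b1 and the labels "A" and "B" are illustrative assumptions):

```r
# Hypothetical data frame with three columns
df <- data.frame(a1 = 1:2, a2 = 3:4, b1 = 5:6)

# One group label per column
f <- c("A", "A", "B")

# List of data frames: parts$A holds a1 and a2, parts$B holds b1
parts <- split.default(df, f)

# Row-wise sum within each group, combined into one data frame
sums <- data.frame(lapply(parts, rowSums))
sums
```

Calling split(df, f) instead would dispatch to the data frame method and try to split rows, which is why split.default is called explicitly.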