How to combine multiple columns with grep and sum them in R

Question

I have the following data frame in R:

Engine   General   Ladder.winch   engine.phe   subm.gear.box   aux.engine   pipeline.maintain    pipeline    pipe.line    engine.mpd
 1        12        22             2            4               2             4                    5            6             7

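and so on, for more than 10,000 rows.

Now I want to combine columns and add up their values, so that the columns are reduced to broader categories. For example, Engine, engine.phe, aux.engine and engine.mpd should be merged into an Engine category with all their values summed. Likewise, pipeline.maintain, pipeline and pipe.line should be merged into Pipeline, and the remaining columns should be summed under a General category.

The desired data frame would be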

 Engine      Pipeline       General
   12          15             38

How can I do this in R?

Answer 1

You can do this in several ways. Here is a fairly direct one:

# Example data.frame
dtf <- structure(list(Engine = c(1, 0, 1), 
   General = c(12, 3, 15), Ladder.winch = c(22, 28, 26), 
    engine.phe = c(2, 1, 0), subm.gear.box = c(4, 4, 10), 
    aux.engine = c(2, 3, 1), pipeline.maintain = c(4, 5, 1), 
    pipeline = c(5, 5, 2), pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19)),
    .Names = c("Engine", "General", "Ladder.winch", "engine.phe", 
      "subm.gear.box", "aux.engine", "pipeline.maintain", 
      "pipeline", "pipe.line", "engine.mpd"), 
    row.names = c(NA, -3L), class = "data.frame")

with(dtf, data.frame(Engine=Engine+engine.phe+aux.engine+engine.mpd,
                   Pipeline=pipeline.maintain+pipeline+pipe.line,
                    General=General+Ladder.winch+subm.gear.box))

#   Engine Pipeline General
# 1     12       15      38
# 2     12       18      35
# 3     21        5      51

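# A more general, grep-based solution.
# drop = FALSE keeps a data frame even if only one column matches,
# which rowSums() requires.
cnames <- tolower(colnames(dtf))
data.frame(Engine=rowSums(dtf[, grep("eng", cnames), drop = FALSE]),
         Pipeline=rowSums(dtf[, grep("pip", cnames), drop = FALSE]),
          General=rowSums(dtf[, !grepl("eng|pip", cnames), drop = FALSE]))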
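
The grep-based version above can be extended to any number of categories. The following is only a sketch under one assumption: a hypothetical named vector `patterns` (not part of the original answer) maps each category name to a regular expression, and every column that matches no pattern goes to General.

```r
# Example data (same values as dtf above)
dtf <- data.frame(Engine = c(1, 0, 1), General = c(12, 3, 15),
                  Ladder.winch = c(22, 28, 26), engine.phe = c(2, 1, 0),
                  subm.gear.box = c(4, 4, 10), aux.engine = c(2, 3, 1),
                  pipeline.maintain = c(4, 5, 1), pipeline = c(5, 5, 2),
                  pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19))

# Hypothetical mapping: category name -> pattern (illustrative only)
patterns <- c(Engine = "eng", Pipeline = "pip")

cnames <- tolower(colnames(dtf))

# One logical vector per category, marking the columns that match
hits <- lapply(patterns, grepl, x = cnames)

# Columns matched by no pattern go to General
hits$General <- !Reduce(`|`, hits)

# Sum the matching columns row by row; drop = FALSE keeps a data frame
# even when only one column matches
res <- as.data.frame(lapply(hits, function(h) rowSums(dtf[, h, drop = FALSE])))
res
```

For the example data this should give the same three columns as the code above. Note that a column matching more than one pattern would be counted in each matching category, so the patterns need to be mutually exclusive.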

Answer 2

It is better to store the data in long format, so my suggestion solves your problem as follows:

1 - Get the data in long format

library(reshape2)
dfl <- melt(df)

2 - Create the "engine" and "pipeline" vectors

e_vec <- c("Engine","engine.phe","aux.engine","engine.mpd")
p_vec <- c("pipeline.maintain","pipeline","pipe.line")

3 - Create the category column

dfl$newcat <- c("general","engine","pipeline")[1 + dfl$variable %in% e_vec + 2*(dfl$variable %in% p_vec)]

Result:

> dfl
            variable value   newcat
1             Engine     1   engine
2            General    12  general
3       Ladder.winch    22  general
4         engine.phe     2   engine
5      subm.gear.box     4  general
6         aux.engine     2   engine
7  pipeline.maintain     4 pipeline
8           pipeline     5 pipeline
9          pipe.line     6 pipeline
10        engine.mpd     7   engine

Now you can use aggregate to get the final result:

> aggregate(value ~ newcat, dfl, sum)
    newcat value
1   engine    12
2  general    38
3 pipeline    15
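
The aggregate call above sums over every row of the long data, which is fine for the single-row example. With several rows, a row identifier has to be carried along to get one total per original row. A minimal sketch in base R, assuming a hypothetical `id` column and using stack() in place of melt() so that reshape2 is not needed:

```r
# Example data with three rows (same values as in Answer 1)
df <- data.frame(Engine = c(1, 0, 1), General = c(12, 3, 15),
                 Ladder.winch = c(22, 28, 26), engine.phe = c(2, 1, 0),
                 subm.gear.box = c(4, 4, 10), aux.engine = c(2, 3, 1),
                 pipeline.maintain = c(4, 5, 1), pipeline = c(5, 5, 2),
                 pipe.line = c(6, 8, 2), engine.mpd = c(7, 8, 19))

# stack() appends the columns one after another ("values", "ind"),
# so the row id repeats once per column
long <- cbind(id = rep(seq_len(nrow(df)), times = ncol(df)), stack(df))

e_vec <- c("Engine", "engine.phe", "aux.engine", "engine.mpd")
p_vec <- c("pipeline.maintain", "pipeline", "pipe.line")

# Index 1 = general, 2 = engine, 3 = pipeline
long$newcat <- c("general", "engine", "pipeline")[
  1 + long$ind %in% e_vec + 2 * (long$ind %in% p_vec)]

# Cross-tabulate the sums: one row per id, one column per category
res <- xtabs(values ~ id + newcat, data = long)
res
```

The `id` name is illustrative. Any column that uniquely identifies the original rows would serve the same purpose.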

Answer 3

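Here is one option: extract the relevant word from the column names and then use tapply to get the sum. str_extract_all returns a list ('lst'). Replace the elements of length zero with 'GENERAL', then do a grouped sum with tapply: unlist the dataset and use as grouping variables the replicated 'lst' and the row index of 'df1'.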

library(stringr)
lst <- str_extract_all(toupper(sub("(pipe)\\.", "\\1", names(df1))),
          "ENGINE|PIPELINE|GENERAL")
lst[lengths(lst)==0] <- "GENERAL"
t(tapply(unlist(df1), list(unlist(lst)[col(df1)], row(df1)), FUN = sum))
#   ENGINE  GENERAL PIPELINE 
#1      12       38       15
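
The same grouped sum can also be sketched in base R alone with rowsum() applied to the transposed data. This is an alternative technique, not the answer's method. It assumes `df1` is the one-row data from the question, and the grouping vector `grp` is a hypothetical helper built with grepl instead of str_extract_all.

```r
# The one-row data from the question
df1 <- data.frame(Engine = 1, General = 12, Ladder.winch = 22,
                  engine.phe = 2, subm.gear.box = 4, aux.engine = 2,
                  pipeline.maintain = 4, pipeline = 5, pipe.line = 6,
                  engine.mpd = 7)

# Normalize "pipe.line" to "pipeline" and upper-case, as in the answer
nm <- toupper(sub("(pipe)\\.", "\\1", names(df1)))

# Assign each column to a category; GENERAL is the fallback
grp <- rep("GENERAL", length(nm))
grp[grepl("ENGINE", nm)] <- "ENGINE"
grp[grepl("PIPELINE", nm)] <- "PIPELINE"

# rowsum() sums matrix rows by group, so transpose first (columns become
# rows), sum by category, then transpose back
res <- t(rowsum(t(as.matrix(df1)), group = grp))
res
```

rowsum() sorts the groups, so the columns come out as ENGINE, GENERAL, PIPELINE, with one row per row of `df1`.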

Answer 4

myfactors = ifelse(grepl("engine", names(df), ignore.case = TRUE), "Engine",
                   ifelse(grepl("pipe", names(df), ignore.case = TRUE), "Pipeline",
                          "General"))
data.frame(lapply(split.default(df, myfactors), rowSums))
#  Engine General Pipeline
#1     12      38       15
#2     12      35       18
#3     21      51        5

Here df is the example data frame from Answer 1 (called dtf there).
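
The key step in this answer is split.default, which treats the data frame as a list of columns and splits the columns (not the rows) by the grouping vector. A small sketch on a made-up data frame, where the names `toy` and `groups` are purely illustrative:

```r
# Toy data: two "A" columns and one "B" column
toy <- data.frame(a1 = 1:2, a2 = 3:4, b1 = 5:6)
groups <- c("A", "A", "B")

# split.default() groups the columns, returning one data frame per group
parts <- split.default(toy, groups)
n_cols <- sapply(parts, ncol)

# Summing each part row by row gives one column of totals per group
totals <- sapply(parts, rowSums)
totals
```

Calling split(toy, groups) instead would dispatch to split.data.frame and split by rows, which is why the answer calls split.default explicitly.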
