New column based on a pattern in the column names

Question

I have a data.table. I want to create a new column that is a function of the values in those columns whose names match a pattern.

library(data.table)
library(dplyr)

set.seed(1)
DT <- data.table(Client = LETTERS[1:5], 
   Apple_2012 = rpois(5,5),  Apple_2013 = rpois(5,5), Pear_2012 = rpois(5,5), 
   Pear_2013 = rpois(5,5), Orange_2012 = rpois(5,5), Orange_2013 = rpois(5,5))

For example, I want

DT <- DT[ ,Fruit_2012 := Apple_2012 + Pear_2012 + Orange_2012]

But I want to do this by matching the pattern "2012" in the column names. Something like this:

DT <- DT[ ,Fruit_2012 := sum(names(DT)[grep("2012", names(DT))]) ]

or

DT <- DT %>%
  mutate(Fruit_2012 = sum(names(DT)[grep("2012", names(DT))]) )

But neither approach works.

# Error in sum(names(DT)[grep("2012", names(DT))]) : 
#  invalid 'type' (character) of argument

I have tried using list, quote and with=FALSE, but had no more luck.

Answer 1

set.seed(1)
df <- data.frame(
  Client = LETTERS[1:5], 
  Apple_2012 = rpois(5,5),
  Apple_2013 = rpois(5,5), 
  Pear_2012 = rpois(5,5), 
  Pear_2013 = rpois(5,5), 
  Orange_2012 = rpois(5,5), 
  Orange_2013 = rpois(5,5)
)

Given this data, I strongly recommend converting it to tidy form, because that puts the variables on a consistent footing:

library(reshape2)

dfm <- melt(df, id = "Client")

variables <- colsplit(dfm$variable, "_", c("fruit", "year"))
dfm$variable <- NULL
dfm$fruit <- variables$fruit
dfm$year <- as.numeric(variables$year)

head(dfm)
#>   Client value fruit year
#> 1      A     4 Apple 2012
#> 2      B     4 Apple 2012
#> 3      C     5 Apple 2012
#> 4      D     8 Apple 2012
#> 5      E     3 Apple 2012
#> 6      A     8 Apple 2013

After that it is easy to summarise the data however you like, with dplyr or anything else:

library(dplyr)

dfm %>% group_by(Client, year) %>% summarise(fruit = mean(value))
#> Source: local data frame [10 x 3]
#> Groups: Client
#> 
#>    Client year fruit
#> 1       A 2012 5.333
#> 2       A 2013 5.667
#> 3       B 2012 3.333
#> 4       B 2013 5.333
#> 5       C 2012 5.667
#> 6       C 2013 7.000
#> 7       D 2012 5.000
#> 8       D 2013 6.000
#> 9       E 2012 4.667
#> 10      E 2013 4.333
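The same reshaping can also be sketched with tidyr instead of reshape2. This assumes tidyr 1.0 or later (for `pivot_longer()`) and dplyr 1.0 or later (for the `.groups` argument). It uses `sum()` rather than `mean()`, because the question asks for a total per year; the name `totals` is only illustrative.

```r
library(tidyr)
library(dplyr)

set.seed(1)
df <- data.frame(
  Client = LETTERS[1:5],
  Apple_2012 = rpois(5, 5),
  Apple_2013 = rpois(5, 5),
  Pear_2012 = rpois(5, 5),
  Pear_2013 = rpois(5, 5),
  Orange_2012 = rpois(5, 5),
  Orange_2013 = rpois(5, 5)
)

# Split each column name at "_" into a fruit part and a year part,
# producing one row per Client / fruit / year combination.
dfm <- df %>%
  pivot_longer(-Client, names_to = c("fruit", "year"), names_sep = "_") %>%
  mutate(year = as.numeric(year))

# Total fruit per client and year.
totals <- dfm %>%
  group_by(Client, year) %>%
  summarise(fruit = sum(value), .groups = "drop")

totals
```

The 2012 rows of `totals` should match `Apple_2012 + Pear_2012 + Orange_2012` computed on the wide data.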

Answer 2

I usually use Reduce in these situations:

DT[, Fruit_2012 := Reduce('+', .SD), .SDcols = grep("2012", names(DT))]

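# or, for the row-wise maximum (run this instead of the line above;
# once Fruit_2012 exists, it also matches the "2012" pattern)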
DT[, Fruit_2012_max := Reduce(pmax, .SD), .SDcols = grep("2012", names(DT))]
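To make the Reduce approach concrete, here is a minimal self-contained sketch on the question's data. It assumes the matching column names are captured once up front (the variable `cols_2012` and the column `Fruit_2012_check` are illustrative names, not part of the original answer), so that newly added columns are never picked up by the pattern.

```r
library(data.table)

set.seed(1)
DT <- data.table(Client = LETTERS[1:5],
                 Apple_2012 = rpois(5, 5), Apple_2013 = rpois(5, 5),
                 Pear_2012 = rpois(5, 5), Pear_2013 = rpois(5, 5),
                 Orange_2012 = rpois(5, 5), Orange_2013 = rpois(5, 5))

# Capture the matching names before adding any new column.
cols_2012 <- grep("2012", names(DT), value = TRUE)

# Reduce(`+`, .SD) folds the selected columns element-wise:
# (Apple_2012 + Pear_2012) + Orange_2012.
DT[, Fruit_2012 := Reduce(`+`, .SD), .SDcols = cols_2012]

# rowSums() over the same columns gives the same totals.
DT[, Fruit_2012_check := rowSums(.SD), .SDcols = cols_2012]

# Both agree with the explicit sum from the question.
stopifnot(all(DT$Fruit_2012 == DT$Apple_2012 + DT$Pear_2012 + DT$Orange_2012))
stopifnot(all(DT$Fruit_2012 == DT$Fruit_2012_check))
```

Reduce works with any binary function (`+`, pmax, pmin), whereas rowSums is limited to sums but handles many columns in a single call.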

Answer 3

Try using contains() inside select().

mutate(DT, fruits2012 = rowSums(DT %>% select(contains("2012"))))

It's a bit ugly, but it works.

I wish dplyr had something like data.table's .SD. If it did, the code would look like this:

DT %>%
      select(contains("2012")) %>%
      mutate(fruits2012 = rowSums(.SD))
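Later dplyr releases come close to this wish. The sketch below assumes dplyr 1.0 or later, where `across()` returns the selected columns as a data frame inside `mutate()`, so `rowSums()` can be applied to it directly. It uses a plain data.frame (`df`, an illustrative name) rather than the data.table from the question.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  Client = LETTERS[1:5],
  Apple_2012 = rpois(5, 5), Apple_2013 = rpois(5, 5),
  Pear_2012 = rpois(5, 5), Pear_2013 = rpois(5, 5),
  Orange_2012 = rpois(5, 5), Orange_2013 = rpois(5, 5)
)

# across(contains("2012")) selects the columns whose names contain "2012";
# rowSums() then adds them up row by row. All other columns are kept.
result <- df %>%
  mutate(fruits2012 = rowSums(across(contains("2012"))))

result
```

Unlike the select-then-mutate version, this keeps Client and the 2013 columns in the result.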
