Group-wise functions across multiple columns

Question

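I'm trying to find the minimum of several columns within each level of a factor, and then subtract that minimum from the original data frame. Say I have this data: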

testdata <-  data.frame(
  category=factor(rep(c("a","j"),each=6,times=8)), 
  num1=(sample(0:15, 96, replace=TRUE)) + 5, 
  num2=(seq(1:96))
)

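I'm looking for the minimum of the num1 and num2 columns for each "category" (a and j). In real life my factor variable is more complex, and there are a large number of numeric variables.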

The best I've been able to do is this:

test2 <- by(testdata, testdata[,"category"], function(x){
  y <- as.data.frame(apply(x[, c(2:3)], 2, min))
})

and then put it back together:

test3 <- do.call(rbind, lapply(test2, data.frame, stringsAsFactors=FALSE))

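This seems to work, but I'm a bit stuck on how to subtract that minimum by group. Here is the rough idea of what I'd like to accomplish, written with sqldf: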

testdata4 <- sqldf("select a.category, 
                   a.num1-b.num1 as num1, 
                   a.num2-b.num2 as num2 
                   from testdata a left join testdata3 b 
                   on a.category = b.category")

though I'd rather not have to spell out every new variable. Any ideas?

Answer 1

Using the tidyverse:

library(tidyverse)
# Use set.seed(x) before generating data for future Q's to allow easy checks
#   of the desired output
set.seed(123)

testdata <-  data.frame(
    category=factor(rep(c("a","j"),each=6,times=8)), 
    num1=(sample(0:15, 96, replace=TRUE)) + 5, 
    num2=(seq(1:96))
)

# Generate those same minimums (note that you don't have to do this, just
# showing that you get the same results as your original code)
testdata %>%
    group_by(category) %>%
    summarize(num1 = min(num1), num2 = min(num2))

# Subtract them from the actual data
testdata %>%
    group_by(category) %>%
    mutate(num1_normed = num1 - min(num1),
           num2_normed = num2 - min(num2))

Or, if you have many columns and want to apply this to all of them automatically:

# Applies the function to all columns except 'category', the group_by column
testdata %>%
    group_by(category) %>%
    mutate_all(function(x) { x - min(x)})
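In more recent dplyr releases, mutate_all is marked as superseded in favor of across. The following is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later is installed; the name normed is only illustrative and does not come from the answer above.

```r
library(dplyr)

set.seed(123)
testdata <- data.frame(
    category = factor(rep(c("a", "j"), each = 6, times = 8)),
    num1 = sample(0:15, 96, replace = TRUE) + 5,
    num2 = 1:96
)

# across() skips the grouping column automatically; where(is.numeric)
# restricts the subtraction to numeric columns (an added safeguard,
# not part of the original answer)
normed <- testdata %>%
    group_by(category) %>%
    mutate(across(where(is.numeric), ~ .x - min(.x))) %>%
    ungroup()
```

After this, every numeric column in normed should have a minimum of 0 within each category, and the rows stay in their original order.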

Answer 2

Here are some approaches that use only base R. The ave approach preserves the order of the rows.

1) by: Use by as in the question, but combine it with sweep:

Sweep <- function(x) cbind(x[1], sweep(x[-1], 2, apply(x[-1], 2, min), "-"))
do.call("rbind", by(testdata, testdata[[1]], Sweep))

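2) ave: For every column except the first, apply ave with x - min(x), which gives a list of columns L. Then, since ave preserves the row order, the second line replaces the original columns with their modified contents.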

L <- lapply(testdata[-1], function(x) ave(x, testdata[[1]], FUN = function(x) x - min(x)))
replace(testdata, -1, L)
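
To make the ave approach easier to follow, here is a small worked example on a hand-made data frame. The data frame small and the name result are hypothetical and chosen only for illustration; the two key lines mirror the ones above.

```r
# Tiny data set with interleaved groups so the row order is easy to follow
small <- data.frame(
    category = factor(c("a", "j", "a", "j")),
    num1 = c(7, 10, 5, 12),
    num2 = c(1, 2, 3, 4)
)

# For each non-grouping column, subtract the per-category minimum;
# ave() returns values in the same positions as its input
L <- lapply(small[-1], function(x) ave(x, small[[1]], FUN = function(v) v - min(v)))

# Overwrite every column except the first with the adjusted values
result <- replace(small, -1, L)
result
# category stays a, j, a, j
# num1 becomes 2, 0, 0, 2  (group minimums: a = 5, j = 10)
# num2 becomes 0, 0, 2, 2  (group minimums: a = 1, j = 2)
```

Because the rows are never split apart and reassembled, the result lines up row for row with the original data.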
