
R: Create multiple new columns based upon other columns

Let's say I have a data frame that looks like this:

dd <- read.table(header = TRUE, text = "ID week1_t week1_a  week2_t week2_a
  1      12      22       17       4   
  1      15      32       18       5   
  1      24      12       29       6   
  2      45      11       19       8   
  2      23      33       20      10")

Is there a straightforward way to create a week1_d column, a week2_d column, and so on for every week, where each one is the difference between that week's _a and _t columns (for example, week1_d = week1_a - week1_t)? Or do I have to construct the "difference" columns manually?

Expected output looks like this:

dd <- read.table(header = TRUE, text = "ID week1_t week1_a  week2_t week2_a week1_d week2_d 
  1      12      22       17       4       10       -13                 
  1      15      32       18       5       17       -13   
  1      24      12       29       6       -12      -23 
  2      45      11       19       8       -34      -11
  2      23      33       20      10       10       -10      ")

In actuality, there are around 30 weeks, so I am trying to avoid doing this manually. I was thinking of a for loop that runs through each week and greps the columns that match week + (index of loop). Is there a better way of doing this?

From a "tidy data" perspective, your problem is that you're encoding (multiple!) pieces of data in your column names: the week number and whatever the letter stands for. I would convert to a long format where week is a column, define d = a - t, and (if necessary) convert back to wide format. But I'd probably keep it in the long format, because any other operations you want to do (more manipulation, modeling, plotting...) will probably be easier to implement on the long data.

library(tidyr)
library(dplyr)

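long = dd %>% 
    mutate(real_id = 1:n()) %>%                                 # unique row id, since ID repeats
    gather(key = key, value = value, starts_with("week")) %>%   # stack all week columns
    separate(key, into = c("week", "letter")) %>%               # "week1_t" -> "week1", "t"
    spread(key = letter, value = value) %>%                     # one column each for a and t
    mutate(d = a - t)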

head(long)
#   ID real_id  week  a  t   d
# 1  1       1 week1 22 12  10
# 2  1       1 week2  4 17 -13
# 3  1       2 week1 32 15  17
# 4  1       2 week2  5 18 -13
# 5  1       3 week1 12 24 -12
# 6  1       3 week2  6 29 -23

wide = gather(long, key = letter, value = value, a, t, d) %>%
    mutate(key = paste(week, letter, sep = "_")) %>%
    select(-week, -letter) %>%
    spread(key = key, value = value)

wide
#   ID real_id week1_a week1_d week1_t week2_a week2_d week2_t
# 1  1       1      22      10      12       4     -13      17
# 2  1       2      32      17      15       5     -13      18
# 3  1       3      12     -12      24       6     -23      29
# 4  2       4      11     -34      45       8     -11      19
# 5  2       5      33      10      23      10     -10      20
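In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a rough sketch of the same round trip with the newer functions (assuming tidyr 1.1.0 or later for names_glue; the names long2 and wide2 are illustrative and not part of the original answer):

```r
library(tidyr)
library(dplyr)

dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
  1 12 22 17  4
  1 15 32 18  5
  1 24 12 29  6
  2 45 11 19  8
  2 23 33 20 10")

# Long format: the ".value" sentinel turns the suffix (t / a) into columns,
# while the prefix (week1, week2, ...) becomes the 'week' column.
long2 <- dd %>%
    mutate(real_id = row_number()) %>%
    pivot_longer(starts_with("week"),
                 names_to = c("week", ".value"),
                 names_sep = "_") %>%
    mutate(d = a - t)

# Back to wide format, rebuilding names such as "week1_d".
wide2 <- long2 %>%
    pivot_wider(names_from = week,
                values_from = c(t, a, d),
                names_glue = "{week}_{.value}")
```

The column order of wide2 differs from the gather()/spread() result, but the week1_d and week2_d values should be the same.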

We split the 'week' columns ( dd[-1] ) into a list of per-week data frames, grouping by the column names with the suffix removed by sub. We then take the difference between the two columns in each group and assign the list elements as new columns in 'dd'.

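# Group columns by week prefix; within each group, column 1 is _t and column 2 is _a
lst <-  lapply(split.default(dd[-1], 
           sub("_.*", "", names(dd)[-1])), function(x) x[2]-x[1])
# Name the new columns from the list names ("week1" -> "week1_d")
dd[paste0(names(lst), "_d")] <- lapply(lst, unlist, use.names=FALSE)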
dd
#    ID week1_t week1_a week2_t week2_a week1_d week2_d
#1  1      12      22      17       4      10     -13
#2  1      15      32      18       5      17     -13
#3  1      24      12      29       6     -12     -23
#4  2      45      11      19       8     -34     -11
#5  2      23      33      20      10      10     -10

If the columns strictly alternate, i.e. 'week1_t' followed by 'week1_a', then 'week2_t' followed by 'week2_a', and so on, the differences can be computed in one step with a recycled logical index:

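# Unique week prefixes: "week1", "week2", ...
Un1 <- unique(sub("_.*", "", names(dd)[-1]))
# Recycled across the week columns: TRUE picks the _t columns, FALSE the _a columns
i1 <-  c(TRUE, FALSE)
dd[paste0(Un1, "_d")] <-  dd[-1][!i1]- dd[-1][i1]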
dd
#  ID week1_t week1_a week2_t week2_a week1_d week2_d
#1  1      12      22      17       4      10     -13
#2  1      15      32      18       5      17     -13
#3  1      24      12      29       6     -12     -23
#4  2      45      11      19       8     -34     -11
#5  2      23      33      20      10      10     -10
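If the column order cannot be relied on, a minimal base R sketch that pairs the columns by name rather than by position might look like this. It assumes every week has both a _t and an _a column; the variable weeks is introduced here for illustration and is not part of the original answer.

```r
dd <- read.table(header = TRUE, text = "ID week1_t week1_a week2_t week2_a
  1 12 22 17  4
  1 15 32 18  5
  1 24 12 29  6
  2 45 11 19  8
  2 23 33 20 10")

# Week prefixes taken from the column names: "week1", "week2", ...
weeks <- unique(sub("_.*", "", grep("^week", names(dd), value = TRUE)))

# Look up each week's _a and _t column by name and subtract column-wise
dd[paste0(weeks, "_d")] <- dd[paste0(weeks, "_a")] - dd[paste0(weeks, "_t")]
```

Because the columns are looked up by name, this should give the same week1_d and week2_d values even if the _t and _a columns are reordered.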

