
Subtracting columns with similar names in R

I have a data frame with paired columns named like 'x1' and 'x1_fit', with the numbers going up to 5 in some cases.

date <- seq(as.Date('2019-11-04'), by = "days", length.out = 7)
x1 <- c(100,120,111,152,110,112,111)
x1_fit <- c(150,142,146,148,123,120,145)
x2 <- c(110,130,151,152,150,142,161)
x2_fit <- c(170,172,176,178,173,170,175)

df <- data.frame(date,x1,x1_fit,x2,x2_fit)

How can I compute x1_fit - x1, x2_fit - x2, and so on? The number of x columns will change every time.

You can select those columns with regular expressions (assuming the x and x_fit columns appear in the same relative order):

> df[, grep('^x\\d+_fit$', colnames(df))] - df[, grep('^x\\d+$', colnames(df))]
  x1_fit x2_fit
1     50     60
2     22     42
3     35     25
4     -4     26
5     13     23
6      8     28
7     34     14
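
The one-liner above relies on the x and x_fit columns appearing in matching order. As a minimal sketch, assuming every fitted column is named exactly like its base column plus "_fit", the pairs can instead be built by name so that column order no longer matters. The names base_cols, fit_cols and diffs are illustrative and not part of the original answer.

```r
# Sample data from the question
date <- seq(as.Date("2019-11-04"), by = "days", length.out = 7)
df <- data.frame(
  date,
  x1     = c(100, 120, 111, 152, 110, 112, 111),
  x1_fit = c(150, 142, 146, 148, 123, 120, 145),
  x2     = c(110, 130, 151, 152, 150, 142, 161),
  x2_fit = c(170, 172, 176, 178, 173, 170, 175)
)

# Base columns (x1, x2, ...) that actually have a matching "_fit" partner
base_cols <- grep("^x\\d+$", colnames(df), value = TRUE)
base_cols <- base_cols[paste0(base_cols, "_fit") %in% colnames(df)]
fit_cols  <- paste0(base_cols, "_fit")

# Subtract pair by pair; both selections are in the same, name-derived order
diffs <- df[fit_cols] - df[base_cols]
names(diffs) <- paste0(base_cols, "_diff")
diffs
```

Base columns without a fitted partner are skipped by the %in% filter rather than raising an error.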

If you want to assign the differences to the original df:

df[, paste0(grep('^x\\d+$', colnames(df), value = TRUE), '_diff')] <- 
    df[, grep('^x\\d+_fit$', colnames(df))] - df[, grep('^x\\d+$', colnames(df))]

# > df
#         date  x1 x1_fit  x2 x2_fit x1_diff x2_diff
# 1 2019-11-04 100    150 110    170      50      60
# 2 2019-11-05 120    142 130    172      22      42
# 3 2019-11-06 111    146 151    176      35      25
# 4 2019-11-07 152    148 152    178      -4      26
# 5 2019-11-08 110    123 150    173      13      23
# 6 2019-11-09 112    120 142    170       8      28
# 7 2019-11-10 111    145 161    175      34      14

In base R, you could loop over the unique column-name prefixes and take the difference against the corresponding fitted column using

> lapply(setNames(nm = unique(gsub("_.*", "", names(df)))), function(nm) {
    fit <- paste0(nm, "_fit")
    df[, fit] - df[, nm]  # fitted minus observed, as asked in the question
})
# $x1
# [1] 50 22 35 -4 13  8 34
# 
# $x2
# [1] 60 42 25 26 23 28 14

Here, I set the Date column as the row names and removed the column using

df <- data.frame(date,x1,x1_fit,x2,x2_fit)
row.names(df) <- df$date
df$date <- NULL

but you could just loop over the column names without the Date column.
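
A minimal sketch of that variant, which keeps the date column in place and simply excludes it from the names being looped over. It computes x_fit - x, as asked in the question; value_cols, base_names and diffs are illustrative names, not part of the original answer.

```r
# Sample data from the question, with date kept as a regular column
date <- seq(as.Date("2019-11-04"), by = "days", length.out = 7)
df <- data.frame(
  date,
  x1     = c(100, 120, 111, 152, 110, 112, 111),
  x1_fit = c(150, 142, 146, 148, 123, 120, 145),
  x2     = c(110, 130, 151, 152, 150, 142, 161),
  x2_fit = c(170, 172, 176, 178, 173, 170, 175)
)

# Drop "date" from the names, then reduce "x1", "x1_fit", ... to unique prefixes
value_cols <- setdiff(names(df), "date")
base_names <- unique(sub("_.*", "", value_cols))

# Named list with one vector of differences per prefix
diffs <- lapply(setNames(nm = base_names), function(nm) {
  df[[paste0(nm, "_fit")]] - df[[nm]]
})
diffs
```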

The solution from @mt1022 is straightforward; however, since you have tagged this as dplyr, here is one approach: convert the data to long format, subtract the corresponding values, and return the data to wide format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -date) %>%
  mutate(name = sub('_.*', '', name)) %>%
  group_by(date, name) %>%
  summarise(diff = diff(value)) %>%
  pivot_wider(names_from = name, values_from = diff) %>%
  rename_at(-1, ~paste0(., "_diff")) %>%
  left_join(df, by = "date")

#  date       x1_diff x2_diff    x1 x1_fit    x2 x2_fit
#  <date>       <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl>
#1 2019-11-04      50      60   100    150   110    170
#2 2019-11-05      22      42   120    142   130    172
#3 2019-11-06      35      25   111    146   151    176
#4 2019-11-07      -4      26   152    148   152    178
#5 2019-11-08      13      23   110    123   150    173
#6 2019-11-09       8      28   112    120   142    170
#7 2019-11-10      34      14   111    145   161    175

We can also do this with a split in base R:

out <- sapply(split.default(df[-1], sub("_.*", "", names(df)[-1])), 
         function(x) x[,2] - x[1])
df[sub("\\..*", "_diff", names(out))] <- out
df
#         date  x1 x1_fit  x2 x2_fit x1_diff x2_diff
#1 2019-11-04 100    150 110    170      50      60
#2 2019-11-05 120    142 130    172      22      42
#3 2019-11-06 111    146 151    176      35      25
#4 2019-11-07 152    148 152    178      -4      26
#5 2019-11-08 110    123 150    173      13      23
#6 2019-11-09 112    120 142    170       8      28
#7 2019-11-10 111    145 161    175      34      14
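
To see what the split does, the grouping step can be inspected on its own. The sketch below assumes, as the answer does, that each base column comes immediately before its fitted column; groups and out are illustrative names, and vapply is swapped in here for sapply so the result type is fixed.

```r
# Sample data from the question
date <- seq(as.Date("2019-11-04"), by = "days", length.out = 7)
df <- data.frame(
  date,
  x1     = c(100, 120, 111, 152, 110, 112, 111),
  x1_fit = c(150, 142, 146, 148, 123, 120, 145),
  x2     = c(110, 130, 151, 152, 150, 142, 161),
  x2_fit = c(170, 172, 176, 178, 173, 170, 175)
)

# split.default() splits a data frame by columns: one small data frame per prefix
groups <- split.default(df[-1], sub("_.*", "", names(df)[-1]))
lapply(groups, names)

# Second column (fitted) minus first column (observed) within each group;
# vapply() returns a numeric matrix with one column per prefix
out <- vapply(groups, function(g) g[[2]] - g[[1]], numeric(nrow(df)))
out
```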
