Concatenate pairs of variables with same suffix

I have a data frame that has a number of variables in it that I want to concatenate into new variables in that same data frame. A simplified version of my data frame df looks like this:

first.1 second.1 first.2 second.2
   1222     3223    3333     1221
   1111     2212    2232     2113

Here is how I do it inefficiently without a for loop:

df$concatenated.1 <- paste0(df$first.1,"-",df$second.1)
df$concatenated.2 <- paste0(df$first.2,"-",df$second.2)

This results in the following data frame df:

first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
   1222     3223    3333     1221      1222-3223      3333-1221
   1111     2212    2232     2113      1111-2212      2232-2113

I have a lot more than 2 pairs of variables to concatenate, so I would like to do this in a for loop:

for (i in 1:2){
??
}

Any ideas on how to accomplish this?

If you can find a way to split your columns into groups, this becomes much easier. In the provided example, we can split the columns by the last character of each column name (1, 1, 2, 2).

Using base R, we use split.default() to split the columns by that suffix (as described above); then, for each group, we paste the columns together row-wise and add the results as new columns.

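group_names <- substring(names(df), nchar(names(df)))  # last character of each name: "1", "1", "2", "2"
groups <- split.default(df, group_names)                # one data frame of columns per suffix
# Paste each group's columns row-wise; the new names come from names(groups)
# so they follow split.default's (sorted) group order.
df[paste0("concatenated.", names(groups))] <-
     lapply(groups, function(x) do.call(paste, c(x, sep = "-")))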

df
#  first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
#1    1222     3223    3333     1221      1222-3223      3333-1221
#2    1111     2212    2232     2113      1111-2212      2232-2113
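The substring() call above keeps only the last character of each name, so with ten or more pairs it would put first.1 and first.11 into the same group. A minimal sketch of the same split.default approach, assuming every name has the form <prefix>.<suffix> with the suffix after the last dot, takes the text after the last dot instead (the first.11/second.11 columns are made-up example data):

```r
# Example data; the "11" pair is hypothetical, added to show multi-digit suffixes.
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.11 = c(3333, 2232), second.11 = c(1221, 2113))

# Assumption: everything after the last "." is the pair suffix.
suffix <- sub(".*\\.", "", names(df))

# One data frame of columns per suffix, in sorted suffix order.
groups <- split.default(df, suffix)

# Paste each group's columns row-wise and name the results after the groups.
df[paste0("concatenated.", names(groups))] <-
  lapply(groups, function(x) do.call(paste, c(x, sep = "-")))
df
```

Taking the new names from names(groups) rather than unique(suffix) keeps each name matched to its group even when the suffixes do not appear in sorted order.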

If your real data has names that follow a clear pattern, as in this example data, Ronak's split / lapply answer is probably best. If not, you can create character vectors of the column names and use Map() with paste().

new.names <- paste0('concatenated.', 1:2)
names.1 <- paste0('first.', 1:2)
names.2 <- paste0('second.', 1:2)

df[new.names] <- Map(paste, df[names.1], df[names.2], sep = '-')

df

#   first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
# 1    1222     3223    3333     1221      1222-3223      3333-1221
# 2    1111     2212    2232     2113      1111-2212      2232-2113
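Since the question asked for a for loop specifically, here is a minimal sketch of the loop version of the same idea, assuming the pairs are numbered 1 to n with no gaps (n_pairs is a hypothetical variable you set yourself). The key change from the question's code is using df[[name]] instead of df$name, because $ only takes a literal column name, not one built with paste0().

```r
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

n_pairs <- 2  # assumption: the number of first./second. pairs in your data
for (i in seq_len(n_pairs)) {
  # Build the column names for pair i, e.g. "first.1" and "second.1".
  first_col  <- paste0("first.", i)
  second_col <- paste0("second.", i)
  # [[ ]] accepts a computed name, both for reading and for creating a column.
  df[[paste0("concatenated.", i)]] <- paste0(df[[first_col]], "-", df[[second_col]])
}
df
```

The Map() call above does the same work without an explicit loop; the loop is mainly useful if each iteration needs extra per-pair logic.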

Here's a tidyverse solution that gets you most of the way there. The differences are that the columns are output alphabetically (the "concatenated"s, then the "firsts", then the "seconds"), a helper row.num column is kept, and the new names use an underscore (concatenated_1) instead of a dot.

txt <- 'first.1 second.1 first.2 second.2 
1222 3223 3333 1221 
1111 2212 2232 2113'

df <- read.table(text = txt, header = TRUE)

library(tidyverse)

df2 <- df %>% 
  mutate(row.num = row_number()) %>% 
  gather(variable, value, -row.num) %>% 
  separate(variable, into = c('order', 'pair')) %>% 
  spread(order, value) %>% 
  mutate(concatenated = paste0(first, '-', second)) %>% 
  gather(variable, value, -row.num, -pair) %>% 
  unite(name, variable, pair) %>% 
  spread(name, value)

  row.num concatenated_1 concatenated_2 first_1 first_2 second_1 second_2
1       1      1222-3223      3333-1221    1222    3333     3223     1221
2       2      1111-2212      2232-2113    1111    2232     2212     2113
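In current versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape, paste, reshape idea with those functions, assuming tidyr 1.0 or later and column names of the form <prefix>.<pair>; it also keeps the dotted names from the question:

```r
library(dplyr)
library(tidyr)

df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

df2 <- df %>%
  mutate(row.num = row_number()) %>%
  # ".value" turns "first"/"second" into columns; the part after the dot goes to `pair`.
  pivot_longer(-row.num, names_to = c(".value", "pair"), names_sep = "\\.") %>%
  mutate(concatenated = paste(first, second, sep = "-")) %>%
  # Back to one row per original row, with names such as "concatenated.1".
  pivot_wider(names_from = pair,
              values_from = c(first, second, concatenated),
              names_sep = ".") %>%
  select(-row.num)
df2
```

The columns come out grouped by variable (first.1, first.2, second.1, second.2, concatenated.1, concatenated.2); add a select() call if you need a different order.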

[EDITED: the original solution incorrectly used starts_with()]

This solution uses ends_with() to select the appropriate columns, then unite() to combine them with a "-" separator:

library(tidyverse)

df <- tribble(
        ~first.1, ~second.1, ~first.2, ~second.2,
        1222,3223,3333,1221,
        1111,2212,2232,2113)

df1 <- df %>%
  select(ends_with("1")) %>%
  unite(concatenated.1, sep = "-")

df2 <- df %>%
  select(ends_with("2")) %>%
  unite(concatenated.2, sep = "-")

cbind(df, df1, df2)

You can use stri_join() from the stringi package, a fast string-concatenation function, together with data.table's := to add the new columns by reference.

library(data.table)
library(stringi)

df <- fread("first.1 second.1 first.2 second.2 
             1222 3223 3333 1221 
             1111 2212 2232 2113")

cols <- paste0("concatenated_", 1:2)
df[, (cols) := Map(stri_join, .(first.1, first.2), .(second.1, second.2), sep = "-")]
setDF(df)

  first.1 second.1 first.2 second.2 concatenated_1 concatenated_2
1    1222     3223    3333     1221      1222-3223      3333-1221
2    1111     2212    2232     2113      1111-2212      2232-2113

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.