I have a data frame that has a number of variables in it that I want to concatenate into new variables in that same data frame. A simplified version of my data frame df looks like this:
first.1 second.1 first.2 second.2
1222 3223 3333 1221
1111 2212 2232 2113
Here is how I do it inefficiently without a for loop:
df$concatenated.1 <- paste0(df$first.1,"-",df$second.1)
df$concatenated.2 <- paste0(df$first.2,"-",df$second.2)
This results in the following data frame df:
first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
1222 3223 3333 1221 1222-3223 3333-1221
1111 2212 2232 2113 1111-2212 2232-2113
I have a lot more than 2 pairs of variables to concatenate, so I would like to do this in a for loop:
for (i in 1:2){
??
}
Any ideas on how to accomplish this?
If you can find a way to split your columns into groups, this becomes much easier. For example, in the provided data we can group the columns by the last character of each column name (1, 1, 2, 2).
Using base R, we use split.default to split the columns into groups based on their names (as described above), then for every group we paste each row together and add the results as new columns.
group_names <- substring(names(df), nchar(names(df)))
df[paste0("concatenated.", unique(group_names))] <-
lapply(split.default(df,group_names), function(x) do.call(paste, c(x, sep = "-")))
df
# first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
#1 1222 3223 3333 1221 1222-3223 3333-1221
#2 1111 2212 2232 2113 1111-2212 2232-2113
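One caveat with taking only the last character: once there are ten or more pairs, names such as first.10 and first.20 both map to "0". A minimal variation, assuming every column name ends in a dot followed by the pair number, takes everything after the last dot instead, and labels the new columns with the names returned by split.default so they always line up with the groups (split.default orders its groups by sorted name, which can differ from unique(group_names)):

```r
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

# Group id = everything after the last ".", so "first.10" -> "10", not "0"
group_names <- sub(".*\\.", "", names(df))

# split.default() on a data frame splits its columns into a list of data frames
groups <- split.default(df, group_names)

# Name the new columns from names(groups) so each result matches its group
df[paste0("concatenated.", names(groups))] <-
  lapply(groups, function(x) do.call(paste, c(x, sep = "-")))
df
```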
If your real data has names which follow a clear pattern, as in this example data, Ronak's split/lapply answer is probably best. If not, you can just create vectors of the names and use Map with paste.
new.names <- paste0('concatenated.', 1:2)
names.1 <- paste0('first.', 1:2)
names.2 <- paste0('second.', 1:2)
df[new.names] <- Map(paste, df[names.1], df[names.2], sep = '-')
df
# first.1 second.1 first.2 second.2 concatenated.1 concatenated.2
# 1 1222 3223 3333 1221 1222-3223 3333-1221
# 2 1111 2212 2232 2113 1111-2212 2232-2113
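Building the names with paste0() also fills in the for loop from the question directly. A minimal sketch, assuming the pairs are numbered 1 to n and named first.i and second.i (n is a hypothetical variable standing in for your number of pairs):

```r
df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))
n <- 2  # number of first.i / second.i pairs (assumed known)

for (i in seq_len(n)) {
  # [[ ]] accepts a column name built at run time, unlike $
  df[[paste0("concatenated.", i)]] <-
    paste0(df[[paste0("first.", i)]], "-", df[[paste0("second.", i)]])
}
df
```

Map() does the same pairing without the explicit loop; the loop version may be easier to extend if each pair needs extra handling.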
Here's a tidyverse solution that gets you most of the way there. The differences are that the columns come out in alphabetical order (the "concatenated"s, then the "firsts", then the "seconds"), the new names use _ rather than ., and there is an extra row.num column.
txt <- 'first.1 second.1 first.2 second.2
1222 3223 3333 1221
1111 2212 2232 2113'
df <- read.table(text = txt, header = TRUE)
library(tidyverse)
df2 <- df %>%
mutate(row.num = row_number()) %>%
gather(variable, value, -row.num) %>%
separate(variable, into = c('order', 'pair')) %>%
spread(order, value) %>%
mutate(concatenated = paste0(first, '-', second)) %>%
gather(variable, value, -row.num, -pair) %>%
unite(name, variable, pair) %>%
spread(name, value)
df2
row.num concatenated_1 concatenated_2 first_1 first_2 second_1 second_2
1 1 1222-3223 3333-1221 1222 3333 3223 1221
2 2 1111-2212 2232-2113 1111 2232 2212 2113
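gather() and spread() still work but are superseded in current tidyr by pivot_longer() and pivot_wider(). A sketch of the same reshape with those functions, assuming tidyr 1.0 or later; the ".value" entry in names_to keeps first and second as separate columns, so the second gather()/unite() round trip is not needed:

```r
library(dplyr)
library(tidyr)

df <- data.frame(first.1 = c(1222, 1111), second.1 = c(3223, 2212),
                 first.2 = c(3333, 2232), second.2 = c(1221, 2113))

df2 <- df %>%
  mutate(row.num = row_number()) %>%
  # "first.1" -> value in column "first", with pair = "1"
  pivot_longer(-row.num, names_to = c(".value", "pair"), names_sep = "\\.") %>%
  mutate(concatenated = paste0(first, "-", second)) %>%
  # back to one row per row.num, with columns such as first_1 and concatenated_2
  pivot_wider(names_from = pair, values_from = c(first, second, concatenated))
df2
```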
[EDITED: original solution incorrectly used starts_with]
This solution uses ends_with() to select the appropriate columns, then unite() to combine them with a - separator:
library(tidyverse)
df <- tribble(
~first.1, ~second.1, ~first.2, ~second.2,
1222,3223,3333,1221,
1111,2212,2232,2113)
df1 <- df %>%
select(ends_with("1")) %>%
unite(concatenated.1, sep = "-")
df2 <- df %>%
select(ends_with("2")) %>%
unite(concatenated.2, sep = "-")
cbind(df, df1, df2)
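To avoid writing one select()/unite() pair per group, the same steps can be run over all pair numbers with lapply(). A sketch, assuming the pairs are numbered 1 to n (n is hypothetical); matching on ".1" rather than "1" also keeps ends_with() from picking up names such as first.11 once there are more than nine pairs:

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~first.1, ~second.1, ~first.2, ~second.2,
  1222, 3223, 3333, 1221,
  1111, 2212, 2232, 2113)
n <- 2  # number of pairs (assumed known)

united <- lapply(seq_len(n), function(i) {
  df %>%
    # ".1" rather than "1", so first.11 is not caught by the i = 1 group
    select(ends_with(paste0(".", i))) %>%
    # !! injects the computed string as the new column's name
    unite(!!paste0("concatenated.", i), everything(), sep = "-")
})

# Glue the original columns and every one-column result side by side
result <- do.call(bind_cols, c(list(df), united))
result
```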
You can use the function stri_join from the stringi package, which is very fast.
library(data.table)
library(stringi)
df <- fread("first.1 second.1 first.2 second.2
1222 3223 3333 1221
1111 2212 2232 2113")
cols <- paste0("concatenated_", 1:2)
df[, (cols) := Map(stri_join, .(first.1, first.2), .(second.1, second.2), sep = "-")]
setDF(df)
df
first.1 second.1 first.2 second.2 concatenated_1 concatenated_2
1 1222 3223 3333 1221 1222-3223 3333-1221
2 1111 2212 2232 2113 1111-2212 2232-2113
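With many pairs, writing out .(first.1, first.2, ...) by hand gets long. One way to generalize it, assuming the same first.i/second.i naming and a hypothetical pair count n, is to build the column names with paste0() and fill each new column by reference with data.table's set() in a loop:

```r
library(data.table)
library(stringi)

dt <- fread("first.1 second.1 first.2 second.2
1222 3223 3333 1221
1111 2212 2232 2113")
n <- 2  # number of pairs (assumed known)

firsts  <- paste0("first.", seq_len(n))
seconds <- paste0("second.", seq_len(n))
cols    <- paste0("concatenated_", seq_len(n))

for (i in seq_len(n)) {
  # set() adds (or overwrites) column cols[i] by reference, without copying dt
  set(dt, j = cols[i],
      value = stri_join(dt[[firsts[i]]], dt[[seconds[i]]], sep = "-"))
}
setDF(dt)
dt
```

set() is documented as a low-overhead, loop-friendly form of :=, which is why it is used here instead of calling := once per iteration.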