Concatenate two columns into one in R

My data:

conc_data=structure(list(kod_nar.id = c(1L, 3L, 2L), 
    x123_1 = c(0L, 0L, 0L), 
    x124_2 = c(0, 0.123, 0.122), 
    x125_3 = 0:2, 
    x126_4 = c(0, 0.234, 0.99)),
   .Names = c("kod_nar.id", "x123_1", "x124_2", "x125_3", "x126_4"), 
   class = "data.frame", row.names = c(NA, -3L))

Besides the id column kod_nar.id, there are 4 columns here, and every 2 columns need to be combined into one that takes the name of the first column of the pair. In other words, each pair of columns should be merged into a single column by concatenating their values. As a result, only 2 value columns remain in the data frame. Every column has a partner: the number of value columns is even, and the columns are ordered as the first pair, the second pair, and so on.

I.e., the desired output:

  kod_nar.id   x123_1   x125_3
1          1        0        0
2          3 0(0.123) 1(0.234)
3          2 0(0.122)  2(0.99)

How can I do this?

Or, writing each pair out explicitly:

conc_data$x123_1 <- with(conc_data, ifelse(x124_2 == 0, "0", sprintf("%d(%.3f)", x123_1, x124_2)))
conc_data$x125_3 <- with(conc_data, ifelse(x126_4 == 0, "0", sprintf("%d(%.3f)", x125_3, x126_4)))
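
To make the format string easier to follow, here is a minimal sketch on two small stand-alone vectors. The names count, share, formatted and result are made up for illustration and are not part of the original data.

```r
# Illustration only: how the "%d(%.3f)" format behaves on small vectors.
count <- c(0L, 1L, 2L)        # whole numbers, formatted with %d
share <- c(0, 0.234, 0.99)    # doubles, formatted with %.3f (3 decimals)

formatted <- sprintf("%d(%.3f)", count, share)
formatted
# "0(0.000)" "1(0.234)" "2(0.990)"

# ifelse() then swaps in a plain "0" wherever the second value is zero.
result <- ifelse(share == 0, "0", formatted)
result
# "0" "1(0.234)" "2(0.990)"
```

Note that %.3f pads to three decimals, so 0.99 is shown as 0.990 rather than 0.99.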

An option would be to loop over the sets of columns, use sprintf to format the columns of interest, and cbind the result with the first column:

out <- cbind(conc_data[1], sapply(list(2:3, 4:5), 
          function(i) sprintf("%d(%f)", 
        round(conc_data[,i[1]], 2), conc_data[,i[2]])))

If rows where both values are 0 should be shown as a plain 0:

out <- cbind(conc_data[1], sapply(list(2:3, 4:5), function(i) {
    dat <- conc_data[i]
    i1 <- !rowSums(dat != 0)   # TRUE where both values in the pair are 0
    v1 <- do.call(sprintf, c(fmt = "%d(%.3f)", dat))
    v1[i1] <- "0"              # show those rows as a plain 0
    v1
}))
names(out)[-1] <- names(conc_data)[c(2, 4)]
out
#  kod_nar.id   x123_1   x125_3
#1          1        0        0
#2          3 0(0.123) 1(0.234)
#3          2 0(0.122) 2(0.990)

Or, more compactly:

data.frame(c(conc_data[1], Map(sprintf, conc_data[c(2, 4)], 
        conc_data[c(3, 5)], MoreArgs = list(fmt = "%d(%.3f)"))))
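
The compact Map call formats every row, including the ones where both values are 0. As a rough sketch, the same idea can be wrapped in a small function that also applies the plain-0 rule and works for any number of pairs. The name combine_pairs is hypothetical, and the sketch assumes that column 1 is the id, that the remaining columns come in adjacent pairs, and that the first column of each pair holds whole numbers.

```r
conc_data <- data.frame(
  kod_nar.id = c(1L, 3L, 2L),
  x123_1 = c(0L, 0L, 0L),
  x124_2 = c(0, 0.123, 0.122),
  x125_3 = 0:2,
  x126_4 = c(0, 0.234, 0.99)
)

# combine_pairs() is a hypothetical helper, not taken from the answers.
# Assumes: column 1 is the id; columns 2..n form adjacent pairs.
combine_pairs <- function(df) {
  first  <- seq(2, ncol(df), by = 2)   # first column of each pair
  second <- first + 1                  # second column of each pair
  merged <- Map(function(a, b) {
    # plain "0" when both values are zero, otherwise "a(b)"
    ifelse(a == 0 & b == 0, "0", sprintf("%d(%.3f)", a, b))
  }, df[first], df[second])
  # Map() keeps the names of df[first], so each pair is named
  # after its first column
  data.frame(df[1], merged, stringsAsFactors = FALSE)
}

out <- combine_pairs(conc_data)
out
```

Because the pair positions are derived from ncol(df), the same call should work unchanged if more pairs are appended to the data frame.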

We can split every two columns using split.default and use sapply to paste the two columns together in the required format. We then name the output columns by selecting every other column name.

output <- cbind(conc_data[1], sapply(split.default(conc_data[-1], 
           rep(seq_along(conc_data), each = 2)[1:(ncol(conc_data) - 1)]), 
   function(x) paste0(x[[1]], "(", x[[2]], ")")))

names(output)[-1] <- names(conc_data)[-1][c(TRUE, FALSE)]

output
#  kod_nar.id   x123_1   x125_3
#1          1     0(0)     0(0)
#2          3 0(0.123) 1(0.234)
#3          2 0(0.122)  2(0.99)

Or, perhaps a bit simpler, split using gl:

output <- cbind(conc_data[1], sapply(split.default(conc_data[-1],
 gl((ncol(conc_data) - 1)/2, 2)), 
   function(x) paste0(x[[1]], "(", x[[2]], ")")))
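
To see what the grouping factor does, the following sketch inspects the output of gl and the column names that land in each group. The variable names pairs, groups and group_names are made up for illustration.

```r
conc_data <- data.frame(
  kod_nar.id = c(1L, 3L, 2L),
  x123_1 = c(0L, 0L, 0L),
  x124_2 = c(0, 0.123, 0.122),
  x125_3 = 0:2,
  x126_4 = c(0, 0.234, 0.99)
)

# gl(n, k) builds a factor with n levels, each repeated k times,
# so every pair of adjacent columns shares one level.
pairs <- gl((ncol(conc_data) - 1) / 2, 2)
as.integer(pairs)
# 1 1 2 2

# split.default() treats the data frame as a list of columns and
# splits those columns by the factor.
groups <- split.default(conc_data[-1], pairs)
group_names <- lapply(groups, names)
group_names
# $`1`: "x123_1" "x124_2"
# $`2`: "x125_3" "x126_4"
```

The groups are labelled 1 and 2 rather than by column name, which is why the result columns still need to be renamed with names(conc_data)[-1][c(TRUE, FALSE)] afterwards.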

If you melt to long format, you can do this with data.table group operations and then dcast back to wide:

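library(data.table)

# the [ , .(...), by = ] step needs a data.table, so convert first
df_long <- 
  melt(as.data.table(conc_data), 1)[
      , .(variable = variable[1],
          value = sprintf('%.0f(%.3f)', value[1], value[2]))
      , by = .(kod_nar.id, id = (rowid(kod_nar.id) - 1) %/% 2)]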

out <- dcast(df_long, kod_nar.id ~ variable)

out
#    kod_nar.id   x123_1   x125_3
# 1:          1 0(0.000) 0(0.000)
# 2:          2 0(0.122) 2(0.990)
# 3:          3 0(0.123) 1(0.234)

If it's important to have just '0' in those all-zero cells, you could add this additional step:

out <- out[, lapply(.SD, function(x) ifelse(grepl('[1-9]', x), x, '0'))]

out
#    kod_nar.id   x123_1   x125_3
# 1:          1        0        0
# 2:          2 0(0.122) 2(0.990)
# 3:          3 0(0.123) 1(0.234)

Here's a tidyverse solution:

library(tidyverse)

conc_data %>%
 mutate(x123_1 = ifelse(x123_1 == x124_2, 
                         x123_1,
                         paste0(x123_1, "(", x124_2, ")")
                        ),
        x125_3 = ifelse(x125_3 == x126_4,
                        x125_3,
                        paste0(x125_3, "(", x126_4, ")")
                        )) %>%
 select(x123_1, x125_3)


    x123_1   x125_3
1        0        0
2 0(0.123) 1(0.234)
3 0(0.122)  2(0.99)

You can do this e.g. by using sapply and paste0. I'm assuming that only one number should be printed if the numbers are equal in both columns:

tt  <- seq(2,ncol(conc_data),2)
res  <- cbind(conc_data[1], sapply(tt, function(i) {
  ifelse(conc_data[,i] != conc_data[,i+1], paste0(conc_data[,i], "(", conc_data[,i+1],")") ,paste0(conc_data[,i]))
}
))
names(res)[-1]  <- names(conc_data)[tt]
res
#  kod_nar.id   x123_1   x125_3
#1          1        0        0
#2          3 0(0.123) 1(0.234)
#3          2 0(0.122)  2(0.99)

Or by using the column names directly in sapply:

tt  <- seq(2,ncol(conc_data),2)
cbind(conc_data[1], sapply(names(conc_data)[tt], function(i) {
  i2  <- which(names(conc_data) == i)+1
  ifelse(conc_data[,i] != conc_data[,i2], paste0(conc_data[,i], "(", conc_data[,i2],")") ,paste0(conc_data[,i]))
  }
))
#  kod_nar.id   x123_1   x125_3
#1          1        0        0
#2          3 0(0.123) 1(0.234)
#3          2 0(0.122)  2(0.99)
