Concatenate two columns into one in R
My data:
conc_data=structure(list(kod_nar.id = c(1L, 3L, 2L),
x123_1 = c(0L, 0L, 0L),
x124_2 = c(0, 0.123, 0.122),
x125_3 = 0:2,
x126_4 = c(0, 0.234, 0.99)),
.Names = c("kod_nar.id", "x123_1", "x124_2", "x125_3", "x126_4"),
class = "data.frame", row.names = c(NA, -3L))
There are 4 value columns here, and every 2 columns need to be combined into one that keeps the name of the first column of the pair. In other words, each pair of columns should be merged into a single column by concatenating their values. As a result, the data frame will have only 2 value columns besides kod_nar.id.
Every column belongs to a pair, so the number of value columns is even, and the columns are ordered as first pair, second pair, and so on.
I.e., the desired output:
kod_nar.id x123_1 x125_3
1 1 0 0
2 3 0(0.123) 1(0.234)
3 2 0(0.122) 2(0.99)
How can I do this?
One option is to build each combined column directly with ifelse and sprintf:
conc_data$x123_1 <- with(conc_data, ifelse(x124_2 == 0, "0", sprintf("%d(%.3f)", x123_1, x124_2)))
conc_data$x125_3 <- with(conc_data, ifelse(x126_4 == 0, "0", sprintf("%d(%.3f)", x125_3, x126_4)))
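The two lines above hard-code the column names. A minimal sketch of the same idea for any number of pairs follows, assuming the first column is the id and the remaining columns come in adjacent pairs as described in the question. The helper name combine_pairs is hypothetical and not part of any package.

```r
# Example data from the question
conc_data <- data.frame(
  kod_nar.id = c(1L, 3L, 2L),
  x123_1 = c(0L, 0L, 0L),
  x124_2 = c(0, 0.123, 0.122),
  x125_3 = 0:2,
  x126_4 = c(0, 0.234, 0.99)
)

# combine_pairs() is a hypothetical helper written for illustration.
# Assumption: column 1 is the id, columns 2..n form adjacent pairs.
combine_pairs <- function(df, fmt = "%d(%.3f)") {
  first  <- seq(2, ncol(df), by = 2)  # index of the first column in each pair
  second <- first + 1                 # index of the second column in each pair
  out <- df[1]                        # start from the id column
  for (k in seq_along(first)) {
    a <- df[[first[k]]]
    b <- df[[second[k]]]
    # Plain "0" when the second value is 0, same rule as the ifelse() above
    out[[names(df)[first[k]]]] <- ifelse(b == 0, "0", sprintf(fmt, a, b))
  }
  out
}

combine_pairs(conc_data)
```

The default format uses %d for the first column of each pair, so it assumes those columns hold whole numbers, as they do in the example data.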
An option would be to loop over the sets of columns, use sprintf to format the columns of interest, and cbind the result with the first column:
out <- cbind(conc_data[1], sapply(list(2:3, 4:5),
function(i) sprintf("%d(%f)",
round(conc_data[,i[1]], 2), conc_data[,i[2]])))
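The format string decides how the second number is printed. As a small illustration using values from the example data, a bare %f prints six decimals, while %.3f limits the output to three:

```r
# Default %f prints six digits after the decimal point
six_decimals <- sprintf("%d(%f)", 1L, 0.234)
six_decimals
# "1(0.234000)"

# %.3f fixes the number of decimals at three
three_decimals <- sprintf("%d(%.3f)", 1L, 0.234)
three_decimals
# "1(0.234)"

# sprintf() is vectorized, so whole columns can be formatted in one call
whole_column <- sprintf("%d(%.3f)", 0:2, c(0, 0.234, 0.99))
whole_column
# "0(0.000)" "1(0.234)" "2(0.990)"
```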
If rows where both values are 0 should be shown as a plain 0:
out <- cbind(conc_data[1], sapply(list(2:3, 4:5), function(i) {
dat <- conc_data[i]
i1 <- !rowSums(dat != 0)
v1 <- do.call(sprintf, c(fmt = "%d(%.3f)", dat))
v1[i1] <- 0
v1
}))
names(out)[-1] <- names(conc_data)[c(2, 4)]
out
# kod_nar.id x123_1 x125_3
#1 1 0 0
#2 3 0(0.123) 1(0.234)
#3 2 0(0.122) 2(0.990)
Or more compactly (this version keeps 0(0.000) for the all-zero rows):
data.frame(c(conc_data[1], Map(sprintf, conc_data[c(2, 4)],
conc_data[c(3, 5)], MoreArgs = list(fmt = "%d(%.3f)"))))
We can split every two columns using split.default and use sapply to paste the two columns together in the required format. We add names to the output by selecting alternating column names.
output <- cbind(conc_data[1], sapply(split.default(conc_data[-1],
rep(seq_along(conc_data), each = 2)[1:(ncol(conc_data) - 1)]),
function(x) paste0(x[[1]], "(", x[[2]], ")")))
names(output)[-1] <- names(conc_data)[-1][c(TRUE, FALSE)]
output
# kod_nar.id x123_1 x125_3
#1 1 0(0) 0(0)
#2 3 0(0.123) 1(0.234)
#3 2 0(0.122) 2(0.99)
Or it may be a bit simpler to split using gl:
output <- cbind(conc_data[1], sapply(split.default(conc_data[-1],
gl((ncol(conc_data) - 1)/2, 2)),
function(x) paste0(x[[1]], "(", x[[2]], ")")))
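To see what the grouping step produces, here is a short sketch on the example data. It only inspects the intermediate objects; the names grp and pieces are introduced for illustration.

```r
# Example data from the question
conc_data <- data.frame(
  kod_nar.id = c(1L, 3L, 2L),
  x123_1 = c(0L, 0L, 0L),
  x124_2 = c(0, 0.123, 0.122),
  x125_3 = 0:2,
  x126_4 = c(0, 0.234, 0.99)
)

# gl(n, k) builds a factor with n levels, each repeated k times,
# so here it labels the four value columns as pair 1, 1, 2, 2
grp <- gl((ncol(conc_data) - 1) / 2, 2)
grp
# [1] 1 1 2 2
# Levels: 1 2

# split.default() treats the data frame as a list of columns,
# so each piece is a two-column data frame holding one pair
pieces <- split.default(conc_data[-1], grp)
lapply(pieces, names)
```

Calling split() directly on a data frame would dispatch to the data frame method and split by rows, which is why split.default is called explicitly here.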
If you melt to long format, you can do this with data.table group operations and then dcast back to wide:
library(data.table)
# convert to a data.table so that melt() and the [ , .(), by = ] syntax apply
df_long <-
melt(as.data.table(conc_data), 1)[
, .(variable = variable[1],
value = sprintf('%.0f(%.3f)', value[1], value[2]))
, by = .(kod_nar.id, id = (rowid(kod_nar.id) - 1) %/% 2)]
out <- dcast(df_long, kod_nar.id ~ variable)
out
# kod_nar.id x123_1 x125_3
# 1: 1 0(0.000) 0(0.000)
# 2: 2 0(0.122) 2(0.990)
# 3: 3 0(0.123) 1(0.234)
If it's important to have just '0' in those all-zero rows, you could add this additional step:
out <- out[, lapply(.SD, function(x) ifelse(grepl('[1-9]', x), x, '0'))]
out
# kod_nar.id x123_1 x125_3
# 1: 1 0 0
# 2: 2 0(0.122) 2(0.990)
# 3: 3 0(0.123) 1(0.234)
Here's a tidyverse solution:
library(tidyverse)
conc_data %>%
mutate(x123_1 = ifelse(x123_1 == x124_2,
x123_1,
paste0(x123_1, "(", x124_2, ")")
),
x125_3 = ifelse(x125_3 == x126_4,
x125_3,
paste0(x125_3, "(", x126_4, ")")
)) %>%
select(x123_1, x125_3)
x123_1 x125_3
1 0 0
2 0(0.123) 1(0.234)
3 0(0.122) 2(0.99)
You can do this, e.g., by using sapply and paste. I'm assuming that only one number should be printed if the numbers are equal in both columns:
tt <- seq(2,ncol(conc_data),2)
res <- cbind(conc_data[1], sapply(tt, function(i) {
ifelse(conc_data[,i] != conc_data[,i+1], paste0(conc_data[,i], "(", conc_data[,i+1],")") ,paste0(conc_data[,i]))
}
))
names(res)[-1] <- names(conc_data)[tt]
res
# kod_nar.id x123_1 x125_3
#1 1 0 0
#2 3 0(0.123) 1(0.234)
#3 2 0(0.122) 2(0.99)
Or by using the column names directly in sapply:
tt <- seq(2,ncol(conc_data),2)
cbind(conc_data[1], sapply(names(conc_data)[tt], function(i) {
i2 <- which(names(conc_data) == i)+1
ifelse(conc_data[,i] != conc_data[,i2], paste0(conc_data[,i], "(", conc_data[,i2],")") ,paste0(conc_data[,i]))
}
))
# kod_nar.id x123_1 x125_3
#1 1 0 0
#2 3 0(0.123) 1(0.234)
#3 2 0(0.122) 2(0.99)