Match two data.frames (one by column name) and create a new data.frame
I have the following df1:
structure(list(rchX = c(0.562189054726368, 0.552238805970149,
0.552238805970149, 0.54726368159204, 0.54726368159204, 0.54726368159204,
0.54228855721393, 0.54228855721393, 0.537313432835821, 0.537313432835821
), frqX = c(0.925373134328358, 0.925373134328358, 0.915422885572139,
0.965174129353234, 0.955223880597015, 0.875621890547264, 0.955223880597015,
0.890547263681592, 0.900497512437811, 0.850746268656716), `1` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), `2` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0), `3` = c(0, 1, 0, 1, 0, 0, 1, 0, 0, 0), `4` = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0), `5` = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0), `6` = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1), `7` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0), `8` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), `9` = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0), `10` = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0), `11` = c(1,
1, 1, 0, 0, 0, 0, 0, 0, 0), `12` = c(1, 0, 0, 1, 1, 1, 0, 1,
1, 1), `13` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `14` = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), `15` = c(0, 0, 0, 0, 0, 1, 0, 0, 0,
0), `16` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `17` = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `18` = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0),
`19` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `20` = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA,
10L))
which looks like this:
rchX frqX 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 0.5621891 0.9253731 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0
2 0.5522388 0.9253731 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 0.5522388 0.9154229 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0
4 0.5472637 0.9651741 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
5 0.5472637 0.9552239 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
6 0.5472637 0.8756219 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0
7 0.5422886 0.9552239 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
8 0.5422886 0.8905473 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0
9 0.5373134 0.9004975 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
10 0.5373134 0.8507463 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0
and a second data.frame with the corresponding names:
df <- data.frame(
a = seq(1:20),
b = LETTERS[1:20]
)
a b
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
6 6 F
7 7 G
8 8 H
9 9 I
10 10 J
11 11 K
12 12 L
13 13 M
14 14 N
15 15 O
16 16 P
17 17 Q
18 18 R
19 19 S
20 20 T
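What I want to do is check which columns contain a 1 and match them to the corresponding letters in df. A 1 in column 6 means "F", and a 1 in column 11 means "K". There are always 3 matches per row, so the first two rows of the new data.frame would look like this: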
rchX frqX varA varB varC
1 0.5621891 0.9253731 F K L
2 0.5522388 0.9253731 C F K
Can anyone help me?
If we need an apply-based solution, we can do
cbind(df1[1:2], t(apply(df1[-(1:2)], 1, function(x)
setNames(as.character(df$b), df$a)[names(x)[which(as.logical(x))]])))
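The core of the apply call is a lookup in a named character vector. Here is a minimal sketch of that single step on one row; the row x is made up for illustration and is not taken from df1.

```r
# Lookup table mirroring df: column number -> letter
df <- data.frame(a = 1:20, b = LETTERS[1:20])

# Named character vector: values are the letters, names are "1".."20"
lookup <- setNames(as.character(df$b), df$a)

# One hypothetical indicator row with 1s in columns 6, 11 and 12
x <- setNames(numeric(20), 1:20)
x[c("6", "11", "12")] <- 1

# Names of the columns flagged with 1, then index the lookup by name
hits <- names(x)[which(as.logical(x))]
res <- unname(lookup[hits])
print(res)
```

Indexing by name means the result does not depend on the order of the rows in df, only on the values in df$a matching the column names of df1.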
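Alternatively, with the tidyverse, we can gather the data into 'long' format, do a left_join with the key/value dataset, summarise the output grouped by row number, rchX and frqX, and separate the result into multiple columns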
library(tidyverse)
df1 %>%
mutate(rn = row_number()) %>%
gather(a, val, -rn, -rchX, -frqX) %>%
filter(val == 1) %>%
left_join(., df %>%
mutate(a = as.character(a))) %>%
select(-val) %>%
group_by(rn, rchX, frqX) %>%
summarise(b = toString(b)) %>%
separate(b, into = str_c("Var", LETTERS[1:3])) %>%
ungroup %>%
select(-rn)
# A tibble: 10 x 5
# rchX frqX VarA VarB VarC
# <dbl> <dbl> <chr> <chr> <chr>
# 1 0.562 0.925 F K L
# 2 0.552 0.925 C F K
# 3 0.552 0.915 F K R
# 4 0.547 0.965 C F L
# 5 0.547 0.955 F L R
# 6 0.547 0.876 F L O
# 7 0.542 0.955 C F R
# 8 0.542 0.891 F J L
# 9 0.537 0.900 E F L
#10 0.537 0.851 F H L
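In recent tidyr versions gather() is superseded by pivot_longer(), so the same idea can be sketched with pivot_longer() and pivot_wider(). This is a variant, not the code above: it swaps the toString()/separate() round trip for a per-group slot label, and it uses a small made-up df1 and df so that it runs on its own. The column names rn, val and slot are illustrative.

```r
library(dplyr)
library(tidyr)

# Small stand-in for df1: two score columns plus indicator columns "1".."4"
df1 <- data.frame(rchX = c(0.56, 0.55), frqX = c(0.93, 0.92),
                  `1` = c(1, 0), `2` = c(1, 1), `3` = c(0, 1), `4` = c(1, 1),
                  check.names = FALSE)
# Lookup table: column name (as character) -> letter
df <- data.frame(a = as.character(1:4), b = LETTERS[1:4])

res <- df1 %>%
  mutate(rn = row_number()) %>%                      # remember the original row
  pivot_longer(-c(rn, rchX, frqX),
               names_to = "a", values_to = "val") %>%
  filter(val == 1) %>%                               # keep only the flagged columns
  left_join(df, by = "a") %>%                        # attach the letter
  group_by(rn, rchX, frqX) %>%
  mutate(slot = paste0("Var", LETTERS[row_number()])) %>%  # VarA, VarB, ...
  ungroup() %>%
  select(-a, -val) %>%
  pivot_wider(names_from = slot, values_from = b) %>%
  select(-rn)

print(res)
```

If a row had fewer than three matches, pivot_wider() would fill the missing slots with NA rather than fail.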
We can also do this more efficiently in base R
m1 <- `dim<-`(setNames(as.character(df$b),
df$a)[names(df1)[-(1:2)][col(df1[-(1:2)])]], dim(df1[-(1:2)]))
out <- read.table(text= trimws(do.call(paste,
as.data.frame(replace(m1, df1[-(1:2)] == 0, "")))), header = FALSE)
cbind(df1[1:2], out)
# rchX frqX V1 V2 V3
#1 0.5621891 0.9253731 F K L
#2 0.5522388 0.9253731 C F K
#3 0.5522388 0.9154229 F K R
#4 0.5472637 0.9651741 C F L
#5 0.5472637 0.9552239 F L R
#6 0.5472637 0.8756219 F L O
#7 0.5422886 0.9552239 C F R
#8 0.5422886 0.8905473 F J L
#9 0.5373134 0.9004975 E F L
#10 0.5373134 0.8507463 F H L
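In base R, one option is to use apply: drop the values that are 0, match the names of the remaining entries against column a of df, and take the corresponding b values.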
cbind(df1[1:2], t(apply(df1[-c(1:2)], 1, function(x)
df$b[match(names(x[x!=0]), df$a)])))
# rchX frqX 1 2 3
#1 0.5621890547 0.9253731343 F K L
#2 0.5522388060 0.9253731343 C F K
#3 0.5522388060 0.9154228856 F K R
#4 0.5472636816 0.9651741294 C F L
#5 0.5472636816 0.9552238806 F L R
#6 0.5472636816 0.8756218905 F L O
#7 0.5422885572 0.9552238806 C F R
#8 0.5422885572 0.8905472637 F J L
#9 0.5373134328 0.9004975124 E F L
#10 0.5373134328 0.8507462687 F H L
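The t(apply(...)) pattern only gives a matrix when every row yields the same number of matches, which the question says is always 3. Here is a minimal sketch that checks that assumption first and names the new columns, using a small made-up df1 and df; the names ind and letters_mat are illustrative.

```r
# Hypothetical stand-in data: indicator columns "1".."5", three 1s per row
df1 <- data.frame(rchX = c(0.56, 0.55), frqX = c(0.93, 0.92),
                  `1` = c(0, 1), `2` = c(1, 0), `3` = c(1, 1),
                  `4` = c(0, 0), `5` = c(1, 1), check.names = FALSE)
df <- data.frame(a = 1:5, b = LETTERS[1:5])

ind <- df1[-(1:2)]  # indicator columns only

# Guard: stop early if any row does not have exactly 3 non-zero entries
stopifnot(all(rowSums(ind != 0) == 3))

# For each row, match the names of the non-zero columns against df$a
letters_mat <- t(apply(ind, 1, function(x) df$b[match(names(x[x != 0]), df$a)]))
colnames(letters_mat) <- paste0("var", LETTERS[1:3])

res <- cbind(df1[1:2], letters_mat)
print(res)
```

Without the guard, rows with differing counts would make apply return a list, and t() would then fail or give an unexpected shape.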