Concatenate Combinations of All Columns in a Data Frame (R)
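I want to concatenate columns without repeating column combinations. Here is an example of what I'm trying to do.
Suppose I have a data frame with 3 columns, and I want to create additional columns by pasting the original columns together two at a time.
Sample df: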
V1 <- as.character(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"))
V2 <- as.character(c("No","Yes","Yes","No","No","No","Yes","Yes","Yes","No"))
# Note: V3 has 11 values while V1 and V2 have 10, so cbind() recycles V1 and V2
# (with a warning) and the resulting data frame has 11 rows
V3 <- as.character(c('Alpha',"Yes",'NA','Beta','NA',"Yes",'NA',"Yes","Yes",
                     'Something','Else'))
df_sample <- as.data.frame(cbind(V1, V2, V3))
df_sample
Now I want the following as new columns in the output (shown for the first two rows, with the desired column names):
V1_V2   V1_V3     V2_V3
A_No    A_Alpha   No_Alpha
A_Yes   A_Yes     Yes_Yes
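I tried creating a loop with the function below, but I get six new columns instead of three, because, for example, V1_V3 duplicates V3_V1. I'm trying to figure out how to fix this. A better solution would also be welcome.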
str_eval=function(x) {return(eval(parse(text=x)))}
cat_cols <- c('V1','V2','V3')
for (i in (1:length(cat_cols))){
  # i != j visits every ordered pair, so both V1_V3 and V3_V1 are created
  for (j in (1:length(cat_cols))){
    if (i != j){
      col_name <- paste(colnames(df_sample)[i],"_",colnames(df_sample)[j],sep="")
      assign(col_name,
             paste(df_sample[,cat_cols[i]],'_',df_sample[,cat_cols[j]],sep=""))
      df_sample <- cbind(df_sample, str_eval(col_name))
      colnames(df_sample)[ncol(df_sample)] <- paste(col_name)
      rm(col_name)
    }
  }
}
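To see why a loop over all i != j yields six columns rather than three: it visits every ordered pair of columns, so each combination shows up twice (V1_V3 and V3_V1). There are n(n - 1) ordered pairs but only choose(n, 2) unordered ones. A minimal base-R sketch that works on the column names only (no data), using combn() for the unordered pairs:

```r
cat_cols <- c("V1", "V2", "V3")
n <- length(cat_cols)

# Ordered pairs: what a loop with i != j visits
ordered_pairs <- character(0)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      ordered_pairs <- c(ordered_pairs, paste(cat_cols[i], cat_cols[j], sep = "_"))
    }
  }
}

# Unordered pairs: combn() returns each 2-column combination exactly once;
# as.vector() drops any dim attribute combn() may attach
unordered_pairs <- as.vector(combn(cat_cols, 2, FUN = paste, collapse = "_"))

print(ordered_pairs)    # "V1_V2" "V1_V3" "V2_V1" "V2_V3" "V3_V1" "V3_V2"
print(unordered_pairs)  # "V1_V2" "V1_V3" "V2_V3"
```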
No loop is needed. This can be done with sapply, combn and paste. Based on benchmarking, it is also about 20 times faster than the loop.
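cols_to_paste <- 2  # number of columns to paste together
# Each column of combos is one combination of column names, e.g. c("V1", "V2")
combos <- combn(names(df_sample), cols_to_paste)
sapply(seq_len(ncol(combos)), function(x) {
  # Paste the selected columns row by row, separated by "_"
  do.call(paste, c(df_sample[, combos[, x], drop = FALSE], sep = "_"))
})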
[,1] [,2] [,3]
[1,] "A_No" "A_Alpha" "No_Alpha"
[2,] "A_Yes" "A_Yes" "Yes_Yes"
[3,] "A_Yes" "A_NA" "Yes_NA"
[4,] "A_No" "A_Beta" "No_Beta"
[5,] "A_No" "A_NA" "No_NA"
[6,] "B_No" "B_Yes" "No_Yes"
[7,] "B_Yes" "B_NA" "Yes_NA"
[8,] "B_Yes" "B_Yes" "Yes_Yes"
[9,] "B_Yes" "B_Yes" "Yes_Yes"
[10,] "B_No" "B_Something" "No_Something"
[11,] "A_No" "A_Else" "No_Else"
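The matrix above has no column names, while the question asks for names like V1_V2. A minimal sketch of one way to get them, assuming a hypothetical helper called paste_combinations (not part of the original answer). Instead of sapply over indices it uses combn()'s own FUN argument, and it builds the names from the same combinations:

```r
# Hypothetical helper (not from the original answer): for every m-column
# combination of `cols`, paste the columns row-wise and add the result to
# `df` as a new column named after its source columns (e.g. "V1_V2").
paste_combinations <- function(df, cols = names(df), m = 2, sep = "_") {
  # combn() calls FUN once per combination; simplify = FALSE returns a list
  pasted <- combn(cols, m, FUN = function(cc) {
    # unname() stops a column named e.g. "sep" being matched to paste()'s argument
    do.call(paste, c(unname(as.list(df[cc])), sep = sep))
  }, simplify = FALSE)
  # Names from the same combinations, in the same order;
  # as.vector() drops any dim attribute combn() may attach
  names(pasted) <- as.vector(combn(cols, m, FUN = paste, collapse = sep))
  # Assigning a named list adds all new columns at once
  df[names(pasted)] <- pasted
  df
}

# Usage on the first three rows of the question's data
small_df <- data.frame(
  V1 = c("A", "A", "A"),
  V2 = c("No", "Yes", "Yes"),
  V3 = c("Alpha", "Yes", "NA")
)
result <- paste_combinations(small_df)
print(result)
```

combn() enumerates the combinations in the same order for the values and for the names, so each new column lines up with its name. Passing m = 3 pastes triples instead of pairs.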
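Alternatively, here is your own loop modified so that j only runs over the columns after i, which creates each combination once. (In this version V3 has 10 values, and two numeric columns, V4 and V5, are added.)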
V1 <- as.character(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"))
V2 <- as.character(c("No","Yes","Yes","No","No","No","Yes","Yes","Yes","No"))
V3 <- as.character(c('Alpha',"Yes",'NA','Beta','NA',"Yes",'NA',"Yes","Yes",
                     'Something'))
V4 <- 1:10
V5 <- 10:1
df_sample <- as.data.frame(cbind(V1, V2, V3, V4, V5))
df_sample
cat_cols <- c('V1','V2','V3','V4','V5')
n <- length(cat_cols)
# j only runs over columns after i, so each pair (e.g. V1_V3) is created once
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    col_name <- paste(cat_cols[i], cat_cols[j], sep = "_")
    df_sample[[col_name]] <- paste(df_sample[[cat_cols[i]]],
                                   df_sample[[cat_cols[j]]], sep = "_")
  }
}
head(df_sample)
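Restricting j to values greater than i visits each unordered pair exactly once, in the same order that combn() enumerates them, so this loop and the combn-based answer produce the same set of new columns. A minimal base-R check on the five column names (no data involved):

```r
cat_cols <- c("V1", "V2", "V3", "V4", "V5")
n <- length(cat_cols)

# Pairs in the order the modified loop visits them (only j > i)
loop_pairs <- character(0)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    loop_pairs <- c(loop_pairs, paste(cat_cols[i], cat_cols[j], sep = "_"))
  }
}

# Pairs as generated by combn(); as.vector() drops any dim attribute
combn_pairs <- as.vector(combn(cat_cols, 2, FUN = paste, collapse = "_"))

identical(loop_pairs, combn_pairs)  # TRUE: same pairs in the same order
length(loop_pairs)                  # 10, i.e. choose(5, 2)
```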