Binding columns with similar column names in the same dataframe in R

I have a data frame that looks somewhat like this:

df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c('a', 'b', 'c'), 3)
> df
  a b c a b c a b c
1 0 1 2 5 6 2 0 1 2
2 1 2 3 6 7 3 1 2 3
3 2 3 4 7 8 4 2 3 4

There are multiple columns that have the same name. I would like to rearrange the data frame so that columns with the same name are stacked into a single column each, leaving only unique column names, for example:

> df
  a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 5 6 2
5 6 7 3
6 7 8 4
7 0 1 2
8 1 2 3
9 2 3 4

Any thoughts on how to do this? Thanks in advance!

This will do the trick, I suppose.

Explanation

df[,names(df) == 'a'] selects all columns named a.

unlist converts those columns into a single vector.

unname removes the stray element names that unlist attaches to that vector.

unique(names(df)) gives the unique column names in df.

sapply applies the inline function to every value of unique(names(df)) and collects the resulting vectors as the columns of a matrix.

> df
  a b c a b c a b c
1 0 1 2 5 6 2 0 1 2
2 1 2 3 6 7 3 1 2 3
3 2 3 4 7 8 4 2 3 4
> sapply(unique(names(df)), function(x) unname(unlist(df[,names(df)==x])))
      a b c
 [1,] 0 1 2
 [2,] 1 2 3
 [3,] 2 3 4
 [4,] 5 6 2
 [5,] 6 7 3
 [6,] 7 8 4
 [7,] 0 1 2
 [8,] 1 2 3
 [9,] 2 3 4
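
The one-liner above can be broken into its steps, which may make the explanation easier to follow. This is only a sketch on the question's df; the names a_cols, a_vec, stacked and result are illustrative, and the final as.data.frame call is an addition for readers who want a data frame rather than the matrix that sapply returns.

```r
# Rebuild the example data from the question
df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c('a', 'b', 'c'), 3)

# Step 1: pick every column named "a"
a_cols <- df[, names(df) == 'a']

# Step 2: flatten those columns into one vector and drop the element names
a_vec <- unname(unlist(a_cols))
a_vec
# [1] 0 1 2 5 6 7 0 1 2

# Step 3: repeat for every unique name; sapply returns a 9 x 3 matrix
stacked <- sapply(unique(names(df)),
                  function(x) unname(unlist(df[, names(df) == x])))

# Optional (not part of the answer): convert the matrix to a data frame
result <- as.data.frame(stacked)
```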

My version:

library(reshape)
as.data.frame(with(melt(df), split(value, variable)))
  a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 0 1 2
5 1 2 3
6 2 3 4
7 0 1 2
8 1 2 3
9 2 3 4

In the step using melt, I transform the dataset into long format:

> melt(df)
Using  as id variables
   variable value
1         a     0
2         a     1
3         a     2
4         b     1
5         b     2
6         b     3
7         c     2
8         c     3
9         c     4
10        a     0
11        a     1
12        a     2
13        b     1
14        b     2
15        b     3
16        c     2
17        c     3
18        c     4
19        a     0
20        a     1
21        a     2
22        b     1
23        b     2
24        b     3
25        c     2
26        c     3
27        c     4

Then I split the value column by each unique level of variable using split:

$a
[1] 0 1 2 0 1 2 0 1 2

$b
[1] 1 2 3 1 2 3 1 2 3

$c
[1] 2 3 4 2 3 4 2 3 4

Then this only needs an as.data.frame to become the data structure you need.
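
The same reshape-then-split idea can also be sketched in base R, without the reshape package. This is an assumption-laden sketch, not the answerer's code: it uses the df from the question, and the names values, groups and stacked are illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c('a', 'b', 'c'), 3)

# "Long" form: all values column by column, plus the column name of each value
values <- unlist(df, use.names = FALSE)
groups <- rep(names(df), each = nrow(df))

# split groups the values by column name; as.data.frame lines the groups up
stacked <- as.data.frame(split(values, groups))
stacked$a
# [1] 0 1 2 5 6 7 0 1 2
```

Note that split orders the groups by sorted name, so the columns come out alphabetically rather than in order of first appearance.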

Use %in% and some unlisting:

zz <- lapply(unique(names(df)), function(x,y) as.vector(unlist(df[which(y %in% x)])),y=names(df))
names(zz) <- unique(names(df))
as.data.frame(zz)
  a b c
1 0 1 2
2 1 2 3
3 2 3 4
4 5 6 2
5 6 7 3
6 7 8 4
7 0 1 2
8 1 2 3
9 2 3 4

I would sort the data.frame by column name, unlist it, and use as.data.frame on a matrix:

A <- unique(names(df))[order(unique(names(df)))]
B <- matrix(unlist(df[, order(names(df))], use.names=FALSE), ncol = length(A))
B <- setNames(as.data.frame(B), A)
B
#   a b c
# 1 0 1 2
# 2 1 2 3
# 3 2 3 4
# 4 5 6 2
# 5 6 7 3
# 6 7 8 4
# 7 0 1 2
# 8 1 2 3
# 9 2 3 4
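
The matrix reshape above lines up correctly only when every column name occurs the same number of times. A minimal sketch that wraps the approach in a function and checks that assumption first; stack_sorted is a hypothetical helper name, not something from the answer.

```r
# Hypothetical helper: sort columns by name, flatten, and reshape into one
# column per unique name. Assumes each name occurs equally often.
stack_sorted <- function(d) {
  counts <- as.vector(table(names(d)))
  stopifnot(length(unique(counts)) == 1)  # otherwise values would misalign

  nms <- sort(unique(names(d)))
  m <- matrix(unlist(d[, order(names(d))], use.names = FALSE),
              ncol = length(nms))
  setNames(as.data.frame(m), nms)
}

# Usage on the data from the question
df <- data.frame(0:2, 1:3, 2:4, 5:7, 6:8, 2:4, 0:2, 1:3, 2:4)
colnames(df) <- rep(c('a', 'b', 'c'), 3)
res <- stack_sorted(df)
```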

I'm not at the computer now, so I can't test this, but this might work:

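# For each unique name, stack the matching one-column data frames, then bind the stacks side by side
cols <- lapply(unique(names(df)), function(x)
     do.call(rbind, lapply(which(names(df) == x), function(i) df[i])))
do.call(cbind, cols)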
