Remove duplicated columns with dplyr

Question

This morning, while analyzing a data frame, I got an error caused by duplicated column names. I tried to find a solution using only dplyr, but I could not find anything that works. Here is an example that illustrates the problem: a data frame with a duplicated column name.

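library(dplyr)

# Two rows, (1, 2, 3) and (2, 2, 1), with a repeated column name "a"
x <- data.frame(matrix(c(1, 2, 3,
                         2, 2, 1), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")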

When I try to drop the first column with select(), I get an error:

x %>%
  select(-1) %>%
  filter(b > 1)

Error: found duplicated column name: a

I can easily get rid of the column with traditional indexing and then use dplyr to filter by value:

x <- x[, -1] %>% filter(b > 1)

This produces the desired output.

Any ideas on how to do this using only dplyr grammar?

Answer 1

This could work, taking advantage of the behaviour of make.names. I don't know if I've cheated here, but it mostly relies on dplyr functions.

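x %>%
    # make.names(unique = TRUE) renames repeated names to a.1, a.2, ...
    setNames(make.names(names(.), unique = TRUE)) %>%
    # drop every column that ends in a numeric suffix
    select(-matches("\\.[0-9]+$"))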
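
To see why this works, it can help to look at the intermediate names. The following is a minimal base R sketch (no dplyr needed) and not part of the original answer. It rebuilds the example frame under the assumption that its rows are (1, 2, 3) and (2, 2, 1); the variable names new_names and dropped are made up for illustration.

```r
# Example frame; the row values are an assumption about the question's data.
x <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# make.names(unique = TRUE) appends .1, .2, ... to repeated names.
new_names <- make.names(names(x), unique = TRUE)
print(new_names)

# Names ending in a dot plus digits are the ones a suffix regex would drop.
dropped <- grepl("\\.[0-9]+$", new_names)
print(new_names[!dropped])
```

Note that this approach keeps the first "a" column and drops the later repeat, and it would also drop any column whose original name already ends in a dot followed by digits.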

Answer 2

If you want to get rid of the first column completely, I would just do:

x <- x[, c(2:3)]

Alternatively, you could rename it:

colnames(x)[1] <- "a.1"
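
Both ideas can also be written without hard-coding column positions. The following is a rough base R sketch, not part of the original answer: it assumes the example frame has rows (1, 2, 3) and (2, 2, 1), and the names dedup and renamed are hypothetical.

```r
# Example frame; the row values are an assumption about the question's data.
x <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

# Option 1: keep only the first column for each name.
# duplicated() flags the second and later occurrences of a name.
dedup <- x[, !duplicated(names(x)), drop = FALSE]

# Option 2: keep every column but make the names unique.
renamed <- x
names(renamed) <- make.unique(names(renamed))
```

Unlike x[, c(2:3)], which keeps the second "a" column, the duplicated() version keeps the first one. After either step the frame has unique names, so dplyr verbs such as filter() should no longer complain about duplicated columns.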


Answer 1
2 2016-09-20 21:05:45

Answer 2
0 2017-04-29 02:55:38
