
Delete duplicated column with dplyr

This morning, while doing some analysis with a data frame, I got an error caused by duplicated column names. I tried to find a solution using only dplyr, but I could not find anything that works. Here is an example to illustrate the problem: a data frame with a duplicated column name.

library(dplyr)

# each row is 1, 2, 3; the first two columns both end up named "a"
x <- data.frame(matrix(c(1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE))
colnames(x) <- c("a", "a", "b")

When I try to drop the first column using the select command, I get an error:

x %>%
  select(-1) %>%
  filter(b > 1)

Error: found duplicated column name: a
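The message points at the names rather than at the position -1, so it can help to confirm which names collide before piping into select(). Here is a minimal base-R sketch. It rebuilds the same values with data.frame(check.names = FALSE) so the duplicate name is kept. first_dup and dup_names are just illustrative variable names.

```r
# Same values as the question's x: two columns named "a", then "b"
x <- data.frame(a = c(1, 1), a = c(2, 2), b = c(3, 3), check.names = FALSE)

# anyDuplicated() returns the position of the first repeated name (0 if none)
first_dup <- anyDuplicated(names(x))
print(first_dup)

# duplicated() flags every repeat after the first occurrence
dup_names <- names(x)[duplicated(names(x))]
print(dup_names)
```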

I can easily get rid of the column using traditional indexing and then use dplyr to filter by value:

x <- x[, -1] %>% filter(b > 1)

This produces the desired output:

> x
  a b
1 2 3
2 2 3

Any ideas on how to do this using only dplyr grammar?

This could work by taking advantage of how make.names behaves: with unique = TRUE it appends a suffix such as .1 to repeated names, so the second a becomes a.1 and can then be dropped with matches(). I don't know if I've cheated here, but it mostly relies on dplyr functions.

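x %>%
  # make.names(unique = TRUE) renames the repeated "a" to "a.1"
  setNames(make.names(names(.), unique = TRUE)) %>%
  # drop any column whose name ends in a ".<digits>" suffix
  select(-matches("\\.[0-9]+$"))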
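To see what each step of this pipeline does, here is a base-R trace on the same values. It is a sketch: grepl() stands in for dplyr's matches(), and kept is a hypothetical name.

```r
x <- data.frame(a = c(1, 1), a = c(2, 2), b = c(3, 3), check.names = FALSE)

# Step 1: what setNames(make.names(names(.), unique = TRUE)) produces
names(x) <- make.names(names(x), unique = TRUE)
print(names(x))

# Step 2: drop names ending in a dot followed by digits, as the matches() call does
kept <- x[, !grepl("\\.[0-9]+$", names(x)), drop = FALSE]
print(kept)
```

Here kept$a holds the first a (all 1s). This approach therefore keeps the first copy and drops the later one, whereas the question's desired output keeps the second. The pattern would also drop any genuine column whose name already ends in a dot and digits.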

If you wanted to get rid of the first column completely, I would just do:

x <- x[, 2:3]
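Hard-coding 2:3 is fine for this example. If the position of the duplicate isn't known in advance, there is a base-R alternative (not dplyr, and not part of the answer above). It keeps only the last occurrence of each name with duplicated(fromLast = TRUE), which reproduces the desired output for this data. keep is an illustrative name.

```r
x <- data.frame(a = c(1, 1), a = c(2, 2), b = c(3, 3), check.names = FALSE)

# fromLast = TRUE marks earlier copies as duplicates, so the last "a" survives
keep <- !duplicated(names(x), fromLast = TRUE)
x <- x[, keep, drop = FALSE]

# then filter by value, as in the question
x <- x[x$b > 1, , drop = FALSE]
print(x)
```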

Alternatively, you could rename it:

colnames(x)[1] <- "a.1"
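Once the names are unique, the dplyr pipeline from the question can drop the column by name, e.g. x %>% select(-a.1) %>% filter(b > 1). The sketch below does the same with base R's subset() so that it runs without dplyr. res is an illustrative name.

```r
x <- data.frame(a = c(1, 1), a = c(2, 2), b = c(3, 3), check.names = FALSE)
colnames(x)[1] <- "a.1"  # names are now unique: "a.1" "a" "b"

# With unique names, rows can be filtered and a column dropped by name
res <- subset(x, b > 1, select = -a.1)
print(res)
```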
