How to remove duplicated (by name) columns in a data.table in R?

While reading a data set using fread, I've noticed that I sometimes get duplicated column names (fread doesn't have a check.names argument). For example:

> data.table( x = 1, x = 2)
   x x
1: 1 2

The question is: is there any way to remove one of the two columns if they have the same name?

How about:

dt[, .SD, .SDcols = unique(names(dt))]

This selects the first occurrence of each name (I'm not sure how you want to handle this).
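To illustrate which column survives, here is a minimal sketch on a made-up three-column table (the data and the name dedup are assumptions for illustration, not part of the question):

```r
library(data.table)

# Toy table with a repeated column name (hypothetical data).
dt <- data.table(x = 1, y = 3, x = 2)

# Keep one column per distinct name; the name lookup resolves
# each name to its first matching column.
dedup <- dt[, .SD, .SDcols = unique(names(dt))]

names(dedup)  # one entry per distinct name
dedup$x       # value taken from the first "x" column
```

Note that dedup is a new object; dt itself still has all three columns.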

As @DavidArenburg suggests in the comments above, you could also use check.names = TRUE in data.table() or fread().
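A small sketch of what check.names = TRUE does, assuming a data.table version recent enough that fread() accepts both check.names and text (the inline CSV string is invented for the example). Note that it renames the duplicates rather than dropping them:

```r
library(data.table)

# data.table(): the duplicated name is made unique instead of kept as-is.
dt <- data.table(x = 1, x = 2, check.names = TRUE)
names(dt)

# fread(): same idea, reading a tiny CSV from a string (hypothetical input).
dt2 <- fread(text = "x,x\n1,2", check.names = TRUE)
names(dt2)
```

Both columns are still present afterwards, so this avoids the name clash but does not by itself remove anything.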

The .SDcols approaches return a copy of the columns you're selecting. Instead, just remove the duplicated columns by reference using :=.

dt[, which(duplicated(names(dt))) := NULL]
#    x
# 1: 1
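If the table may or may not contain duplicates, one possible variant is to compute the positions first and only delete when there is something to delete. This is a sketch on made-up data; the dups variable and the length guard are my additions, not part of the answer above:

```r
library(data.table)

# Toy table with a repeated column name (hypothetical data).
dt <- data.table(x = 1, y = 3, x = 2)

# Positions of every column whose name already appeared further left.
dups <- which(duplicated(names(dt)))

# Delete by position, by reference; skip the call when nothing is duplicated.
if (length(dups) > 0L) dt[, (dups) := NULL]

names(dt)
```

Because the deletion is by position, the first occurrence of each name is the one that stays.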

Different approaches:

  1. Indexing

    my.data.table <- my.data.table[ ,-2]

  2. Subsetting

    my.data.table <- subset(my.data.table, select = -2)

  3. Making the names unique, if 1. and 2. are not ideal (when you have hundreds of columns, for instance)

    setnames(my.data.table, make.names(names = names(my.data.table), unique=TRUE))

  4. Optionally, systematically delete the variables whose names meet some criterion. Here, we get rid of all variables whose names end with ".X", where X is a number (starting at 1 when using make.names).

    my.data.table <- subset(my.data.table, select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
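Steps 3 and 4 can be sketched end to end as follows, on a made-up table. The names dt and keep are assumptions, and the column selection uses a character vector with with = FALSE rather than subset(), purely as one way to write it:

```r
library(data.table)

# Toy table with a repeated column name (hypothetical data).
dt <- data.table(id = 1L, x = 1, x = 2)

# Step 3: make the names unique; repeats get a numeric suffix (.1, .2, ...).
setnames(dt, make.names(names(dt), unique = TRUE))

# Step 4: keep only the columns whose names do not end in ".<digits>".
keep <- !grepl("\\.\\d+$", names(dt))
dt <- dt[, names(dt)[keep], with = FALSE]

names(dt)
```

One caveat with this pattern: a column that legitimately ends in a dot and a number (say, a hypothetical "rate.1") would be dropped as well, so it is worth checking the names before applying it.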
