How to remove duplicated (by name) columns in data.tables in R?
While reading a data set using fread, I've noticed that I sometimes get duplicated column names, for example (fread doesn't have a check.names argument):
> data.table(x = 1, x = 2)
   x x
1: 1 2
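Before dropping anything, it may help to confirm which names are actually duplicated. A minimal sketch in base R, assuming the data.table package is installed; the toy table simply mirrors the example above:

```r
library(data.table)

# Toy table with a duplicated name, as in the example above
dt <- data.table(x = 1, x = 2)

# anyDuplicated() returns the position of the first duplicate, or 0 if there is none
has_dups <- anyDuplicated(names(dt)) > 0

# duplicated() flags every occurrence after the first one; unique() collapses repeats
dup_names <- unique(names(dt)[duplicated(names(dt))])

print(has_dups)
print(dup_names)
```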
The question is: is there any way to remove one of the two columns if they have the same name?
How about:
dt[, .SD, .SDcols = unique(names(dt))]
This selects the first occurrence of each name (I'm not sure how you want to handle this).
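If you would rather keep the last occurrence of each name instead of the first, one option (a sketch, not part of the original answer) is to compute column positions with duplicated() and select them with with = FALSE. Like the .SDcols version, both selections return a copy:

```r
library(data.table)

dt <- data.table(x = 1, x = 2)

# Positions of the first occurrence of each name
first_pos <- which(!duplicated(names(dt)))
# Positions of the last occurrence of each name (fromLast = TRUE scans from the right)
last_pos <- which(!duplicated(names(dt), fromLast = TRUE))

# with = FALSE makes j a vector of column positions; the result is a new data.table
keep_first <- dt[, first_pos, with = FALSE]
keep_last  <- dt[, last_pos, with = FALSE]

print(keep_first)
print(keep_last)
```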
As @DavidArenburg suggests in the comments above, you could use check.names=TRUE in data.table() or fread(). This makes the names unique (the second x becomes x.1), so the extra column can then be dropped by name.
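A minimal sketch of that route using data.table() (the toy values are illustrative). check.names = TRUE runs the names through make.names(unique = TRUE), after which the renamed column can be removed by reference:

```r
library(data.table)

# check.names = TRUE applies make.names(unique = TRUE) to the column names,
# so the second "x" is renamed to "x.1" instead of staying a second "x"
dt <- data.table(x = 1, x = 2, check.names = TRUE)
renamed <- names(dt)
print(renamed)

# Now that the names are unique, the extra column can be dropped by name, in place
dt[, x.1 := NULL]
print(dt)
```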
The .SDcols approaches return a copy of the columns you're selecting. Instead, just remove the duplicated columns by reference using :=
dt[, which(duplicated(names(dt))) := NULL]
dt
#    x
# 1: 1
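Because := modifies the table in place, the same line can be wrapped in a function and applied without reassigning the result. The helper below, drop_dup_cols(), is hypothetical (it is not part of data.table); it keeps the first occurrence of each name and skips the := call when there is nothing to drop:

```r
library(data.table)

# Hypothetical helper (not part of data.table): removes every column whose
# name already appeared earlier, keeping the first occurrence, by reference.
drop_dup_cols <- function(DT) {
  dups <- which(duplicated(names(DT)))
  # Only call := when there is actually something to delete
  if (length(dups) > 0L) DT[, (dups) := NULL]
  invisible(DT)
}

dt <- data.table(x = 1, y = 2, x = 3, y = 4)
drop_dup_cols(dt)  # no reassignment: the caller's dt is modified in place
print(dt)

clean <- data.table(a = 1, b = 2)
drop_dup_cols(clean)  # no duplicated names, so the table is left unchanged
print(clean)
```

Since the function works on the caller's object, any other variable pointing at the same data.table will also see the columns disappear; use copy() first if that is not what you want.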
Different approaches:
1. Indexing
my.data.table <- my.data.table[, -2]
2. Subsetting
my.data.table <- subset(my.data.table, select = -2)
3. Making unique names, if 1. and 2. are not ideal (for instance, when there are hundreds of columns):
setnames(my.data.table, make.names(names = names(my.data.table), unique=TRUE))
4. Optionally, systematize the deletion of variables whose names meet some criterion (here, we get rid of all variables whose name ends with ".X", X being a number, starting at 1 when using make.names):
my.data.table <- subset(my.data.table, select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
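Steps 3 and 4 can be run end to end on a small table to check the result. A sketch with illustrative values, using the pattern "\\.\\d+$" (a dot followed by one or more digits at the end of the name):

```r
library(data.table)

# Toy table (illustrative values) with one duplicated name
my.data.table <- data.table(id = 1:2, x = 3:4, x = 5:6)

# Step 3: make the names unique; the second "x" becomes "x.1"
setnames(my.data.table, make.names(names = names(my.data.table), unique = TRUE))

# Step 4: drop every column whose name ends in "." followed by digits
my.data.table <- subset(my.data.table,
                        select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
print(my.data.table)
```

Note that the pattern also matches legitimately named columns such as "score.2", so if the source data may contain names like that, selecting by duplicated(names(...)) before renaming avoids dropping them by accident.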