drop in fread: misses repetitions of col name (data.table R)

I've got a file with a bunch of filler columns (named, of course, filler) that I'm trying to read with fread.

I'm using the drop argument, but it only drops the first instance it encounters (presumably going left to right, though that's beside the point); I want it to get rid of all of them.

Quick example:

Header of the .csv:

id,first_name,last_name,filler,birth_year,filler,position,filler,wage

names(dt) after using drop in fread:

id,first_name,last_name,birth_year,filler,position,filler,wage

Further, if I just try:

DT <- fread("file.csv", drop = rep("filler", 5L))

I get an error:

Error in fread(paste0(substr(tt, 3, 4), "staff.csv"), drop = rep("filler", : Duplicates detected in drop

Any pointers?

You could read the first line of the file with scan(), find which positions hold the name filler, and pass those positions as the drop indices in fread(). Integer positions are all distinct, so the duplicate-name check is never triggered:

library(data.table)
## example text for fread()
x <- "id,first_name,last_name,filler,birth_year,filler,position,filler,wage
1,2,3,4,5,6,7,8,9"
## read the first line and find the filler
f <- scan(text = x, what = "", sep = ",", nlines = 1) == "filler"
## pass to fread()
fread(x, drop = which(f))
#    id first_name last_name birth_year position wage
# 1:  1          2         3          5        7    9
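
The same idea should carry over to a file on disk. Below is a minimal sketch, assuming a plain comma-separated header with no quoted fields; the temporary file and its single data row are made up here purely for illustration and stand in for your real CSV.

```r
library(data.table)

# Hypothetical sample file standing in for the real CSV (illustration only)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,first_name,last_name,filler,birth_year,filler,position,filler,wage",
  "1,Ann,Lee,x,1980,x,clerk,x,100"
), path)

# Read only the header line as a character vector of column names
hdr <- scan(path, what = "", sep = ",", nlines = 1, quiet = TRUE)

# Positions of every column named "filler" (here 4, 6 and 8)
drop_idx <- which(hdr == "filler")

# Drop by position, so repeated names are not a problem
DT <- fread(path, drop = drop_idx)
names(DT)
# "id" "first_name" "last_name" "birth_year" "position" "wage"
```

Passing select = which(hdr != "filler") instead would keep the same columns, if you'd rather state what to keep than what to drop.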
