
Check how a file is delimited before reading into R

I have a list of files (not created by me) that, for some reason, are all labelled ".csv" even though some are comma-separated and some are tab-delimited. So when I read them into R, I have to specify the separator manually, unless someone knows a way to check the delimiter beforehand so that I don't end up with a mangled file.

Should have put this in an answer...

The fread function in the package data.table makes an attempt to guess the correct separator. It's probably not perfect, but it will likely handle most simple cases.
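A minimal sketch of letting fread do the guessing, assuming the data.table package is installed. The two inline strings are made-up sample data with identical content but different delimiters; `sep = "auto"` is already fread's default and is written out only to make the intent visible.

```r
# Assumes data.table is installed: install.packages("data.table")
library(data.table)

# Made-up sample input: same table, comma- vs tab-delimited
csv_text <- "a,b,c\n1,2,3\n4,5,6"
tsv_text <- "a\tb\tc\n1\t2\t3\n4\t5\t6"

# fread inspects the input and picks the separator itself
dat_csv <- fread(text = csv_text, sep = "auto")
dat_tsv <- fread(text = tsv_text, sep = "auto")

print(dat_csv)
print(dat_tsv)
```

For a real file the call is just `dat <- fread("path/fil.csv")`; both sample inputs above should come back as the same 2 x 3 table with columns `a`, `b`, `c`.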

Since tabs are unlikely to appear in the data unless they are the separator, this check is usually correct:

path <- "path/fil.csv"
# A tab anywhere in the first line is taken as a sign of a tab-delimited file
has_tab <- "\t" %in% strsplit(readLines(path, n=1)[1], split="")[[1]]
dat <- read.table(path, sep = if (has_tab) "\t" else ",")

(This handles only the two cases described: tab or comma.) Testing:

> txt <- "a\tb\tc\nd\te\tf"
> dat <- if( "\t" %in% strsplit(readLines(textConnection(txt), n=1)[1], split="")[[1]] ) { 
+      read.table(text = txt, sep="\t") }else{
+      read.table(text = txt, sep=",") }
> dat
  V1 V2 V3
1  a  b  c
2  d  e  f
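If looking at a single line feels too fragile, one alternative (not part of the original answer) is to count the fields on the first few lines with base R's `count.fields()` for each candidate separator and keep the one that splits every line into the same number of fields, more than one. `guess_sep()` is a hypothetical helper written for illustration, a sketch rather than a robust detector; like the test above, it only considers tab and comma.

```r
# Hypothetical helper (not a base R function): guess "\t" or "," for a file
# by checking which separator gives a consistent field count (> 1) on the
# first n lines. Only tab and comma are considered.
guess_sep <- function(path, candidates = c("\t", ","), n = 5) {
  lines <- readLines(path, n = n)
  for (sep in candidates) {
    con <- textConnection(lines)
    # count.fields() does not close a connection that was already open
    counts <- count.fields(con, sep = sep, quote = "\"", comment.char = "")
    close(con)
    consistent <- !anyNA(counts) && all(counts > 1) &&
      length(unique(counts)) == 1L
    if (consistent) return(sep)
  }
  "," # fall back to comma when neither candidate splits consistently
}
```

Usage would then be `dat <- read.table("path/fil.csv", sep = guess_sep("path/fil.csv"))`. Checking several lines rather than one guards against a header row that happens to contain an odd character, at the cost of reading a little more of each file.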
