
read.csv vs. read.table

In several cases I have seen read.table() fail to read a tab-delimited file (for example, the annotation table of a microarray), returning the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line xxx did not have yyy elements

read.csv() reads the same file perfectly, with no errors. I also think read.csv() is faster than read.table().

Even more: read.table() behaves very strangely on one of my files. It raises this error while reading line 100, but when I copy lines 90 to 110 and paste them just after the header of the same file, it still reports the error at line 121 (line 100 plus the 21 lines pasted at the beginning). If something is wrong with that line, why doesn't it report the error when it reads the pasted copy at the beginning? I can confirm that read.csv() reads the same file with no error.

Do you have any idea why read.table() is unable to read files that read.csv() handles without trouble? Also, is there any reason to use read.table() at all?

read.csv is a fairly thin wrapper around read.table; I would be quite surprised if you couldn't exactly replicate the behaviour of read.csv by supplying the right arguments to read.table. However, some of those arguments (such as how quotation marks or comment characters are handled) can change both the speed and the behaviour of the function.
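One plausible cause of the error in the question is quoting. read.table() treats both " and ' as quote characters by default, so an apostrophe inside a field (common in annotation text such as "5' UTR") opens a quoted field that may only close many lines later. The mismatch then surfaces at a later line than the one that caused it, which would also explain why pasting lines 90 to 110 at the top did not reproduce the error. The sketch below uses made-up data (an assumption, not the asker's file) to show the effect and the fix:

```r
# Hypothetical tab-delimited annotation rows (assumed data, not the asker's
# file). Two descriptions contain an apostrophe, as in "5' UTR".
txt <- paste("id\tdesc\tvalue",
             "1\tgene A 5' UTR\t10",
             "2\tgene B\t20",
             "3\tgene C 3' end\t30",
             "4\tgene D\t40",
             sep = "\n")

# read.table() defaults to quote = "\"'", so the apostrophe on the first data
# row opens a quoted field that only closes on the third data row. Depending
# on the data this either raises a "did not have ... elements" error or
# silently merges rows; tryCatch() keeps the example running either way.
bad <- tryCatch(read.table(text = txt, header = TRUE, sep = "\t"),
                error = function(e) conditionMessage(e))

# Treating only the double quote as a quote character (the read.csv and
# read.delim default) keeps each line as its own record.
good <- read.table(text = txt, header = TRUE, sep = "\t", quote = "\"")
nrow(good)  # 4
```

Passing quote = "" would disable quoting entirely, which is another common workaround when fields never contain the separator.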

In particular, this is the full definition of read.csv:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", 
    fill = TRUE, comment.char = "", ...) {
     read.table(file = file, header = header, sep = sep, quote = quote, 
        dec = dec, fill = fill, comment.char = comment.char, ...)
}

So, as stated, it is just read.table with a particular set of default options.
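A quick way to check this claim is to read the same text both ways and compare the results. This is a minimal sketch with hypothetical CSV content passed through the `text` argument; the read.table() call simply spells out the defaults from the definition above:

```r
# Hypothetical CSV content, supplied through the 'text' argument.
csv_text <- "id,name,score\n1,alpha,3.5\n2,\"beta, gamma\",4.0\n3,delta,NA"

via_csv   <- read.csv(text = csv_text)

# Same call, with read.csv's defaults written out explicitly.
via_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

identical(via_csv, via_table)  # TRUE
```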

As @Chase states in the comments below, the help page for read.table() says as much under Details:

read.csv and read.csv2 are identical to read.table except for the defaults. They are intended for reading 'comma separated value' files ('.csv') or (read.csv2) the variant used in countries that use a comma as decimal point and a semicolon as field separator.
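To illustrate the read.csv2 variant described in the help text, here is a small sketch on hypothetical data where ';' separates fields and ',' is the decimal mark:

```r
# Hypothetical data in the semicolon/decimal-comma CSV variant.
euro <- "id;value\n1;3,5\n2;4,25"

# read.csv2() defaults to sep = ";" and dec = ",".
df2 <- read.csv2(text = euro)
df2$value  # 3.50 4.25
```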

Don't use read.table to read tab-delimited files; use read.delim. (It is just a thin wrapper around read.table, but it sets the options to appropriate values.)
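As a sketch of what "appropriate values" means here: read.delim() uses sep = "\t", quote = "\"", fill = TRUE and comment.char = "", so apostrophes and '#' characters inside fields are read literally. The annotation rows below are hypothetical:

```r
# Hypothetical tab-delimited annotation rows; one field contains an
# apostrophe and another contains '#'.
tsv <- paste("probe\tsymbol\tnote",
             "p1\tGENE1\t5' UTR variant",
             "p2\tGENE2\tclone #12",
             "p3\tGENE3\tnone",
             sep = "\n")

# With read.delim's defaults, neither character is treated specially.
# (Plain read.table(sep = "\t") would treat ' as a quote and # as the
# start of a comment.)
ann <- read.delim(text = tsv)
ann$note
```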

read_table() does sometimes fail on tab-separated files, and setting sep='\\s+' can help, assuming the items in the table do not already contain spaces.
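In R, read.table()'s sep is a single character rather than a regular expression, but its default sep = "" already splits on any run of white space (spaces or tabs), with the same caveat that fields must not contain spaces. A minimal sketch on hypothetical input:

```r
# Hypothetical input whose fields are separated by a mix of tabs and spaces.
ws <- paste("id value", "1\t  10", "2    20", sep = "\n")

# The default sep = "" treats any run of white space as one separator.
d <- read.table(text = ws, header = TRUE)
d$value  # 10 20
```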


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM