
read.table and comments in R

I'd like to add metadata to my spreadsheet as comments and have R ignore them afterwards.

My data look like this:

v1,v2,v3,
1,5,7,
4,2,1,#possible error,

(except that the real file is much longer. The first comment actually appears well beyond the first 5 rows, which read.table examines to determine the number of columns.)

I've been trying:

read.table("data.name",header=TRUE,sep=",",stringsAsFactors=FALSE,comment.char="#")

But read.table (and, for that matter, count.fields) thinks that I have one more field than I actually do. My data frame ends up with a blank column called 'X'. I think this is because my spreadsheet program adds a comma to the end of every line (as in the example above).
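A quick way to confirm this diagnosis is to count the fields on each line with count.fields(). The sketch below assumes the three sample lines above are the whole file. Each line should report 4 fields, because the trailing comma creates an empty last field, and on the third line the comma before # is still there after the comment is stripped.

```r
# The three sample lines from the question.
lines <- c("v1,v2,v3,",
           "1,5,7,",
           "4,2,1,#possible error,")

con <- textConnection(lines)
# Count fields per line; the comment after "#" is ignored, but the
# empty field created by each trailing comma is still counted.
n_fields <- count.fields(con, sep = ",", comment.char = "#")
close(con)
print(n_fields)
```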

Using flush=TRUE has no effect, even though (according to the help file) it "[...] allows putting comments after the last field [...]".

Using colClasses=c(rep(NA,3),NULL) has no effect either.

I could just delete the column afterwards, but since this seems to be a common practice, I'd like to learn how to do it properly.
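For reference, deleting the column afterwards takes one line. This sketch reads the sample data from a string via the text= argument and then drops every column that contains only NA. It assumes no genuine data column is entirely empty; if the extra column is always named X, df$X <- NULL is simpler.

```r
df <- read.table(text = "v1,v2,v3,
1,5,7,
4,2,1,#possible error,",
                 header = TRUE, sep = ",", stringsAsFactors = FALSE,
                 comment.char = "#")

# The trailing comma produced an extra column "X" that holds only NA.
# Keep only the columns with at least one non-missing value.
keep <- colSums(!is.na(df)) > 0
df <- df[, keep, drop = FALSE]
str(df)
```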

Thanks,

Andrew

From the documentation (?read.table):

colClasses: character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.

Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.

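Note that it says to use "NULL" (a character string), not NULL. Your attempt, c(rep(NA,3),NULL), evaluates to c(NA, NA, NA), because c() silently drops NULL; that vector is then recycled across all four columns, so nothing is skipped. With the string "NULL", this works as expected: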

con <- textConnection("
v1,v2,v3,
1,5,7,
4,2,1,#possible error,
")

read.table(con, header = TRUE, sep = ",",
           stringsAsFactors = FALSE, comment.char = "#",
           colClasses = c(rep(NA, 3), "NULL"))
#   v1 v2 v3
# 1  1  5  7
# 2  4  2  1
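If the number of columns varies between files, the colClasses vector can be built from the header instead of being hard-coded. The helper read_trailing_sep() below is a hypothetical name used only for illustration. It assumes every line, including the header, ends with exactly one trailing separator and that # is used only for comments.

```r
# Hypothetical helper: read a file whose lines all end with a trailing
# separator, skipping the resulting empty last column.
read_trailing_sep <- function(path, sep = ",", ...) {
  # Fields on the header line, including the empty one created by the
  # trailing separator.
  n <- count.fields(path, sep = sep, comment.char = "#")[1]
  read.table(path, header = TRUE, sep = sep, comment.char = "#",
             colClasses = c(rep(NA, n - 1), "NULL"), ...)
}

# Example: write the sample data to a temporary file and read it back.
path <- tempfile(fileext = ".csv")
writeLines(c("v1,v2,v3,", "1,5,7,", "4,2,1,#possible error,"), path)
df <- read_trailing_sep(path, stringsAsFactors = FALSE)
print(df)
```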

Your issues with the comment character and the number of data columns are not caused by read.table() but by your spreadsheet (I'm using Excel). By default, read.table treats # as the start of a comment and ignores the rest of the line. You get an extra column because there is a trailing comma at the end of each data line, which tells read.table that another field follows. Reading your original example:

> read.table(text="v1, v2, v3,
+  1,5,7,
+  4,2,1,#possible error,", sep=",", header=TRUE)
  v1 v2 v3  X
1  1  5  7 NA
2  4  2  1 NA

The comment is ignored by default, and a fourth column is created and labeled X. You could delete this column after the fact, use the method that @flodel mentions, or remove the trailing comma before reading the file into R. In Excel, the trailing comma is added when you save a file as CSV (comma-separated values), because the comment sits in the fourth column and Excel doesn't recognize it as a comment. If you save the file as space-separated, the problem goes away (drop the sep= argument, since the default separator, "", splits on any whitespace):

> read.table(text="v1 v2 v3 
+    1 5 7 
+    4 2 1#possible error", header=TRUE)
  v1 v2 v3
1  1  5  7
2  4  2  1
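The remaining option, removing the trailing comma before parsing, can be sketched with readLines() and sub(). Comments are stripped first, because on a line such as 4,2,1,#possible error, the comma that matters sits before the comment, not at the end of the raw line. This assumes # never appears inside a quoted field; in practice raw would come from readLines("data.name").

```r
# Sample lines; in practice: raw <- readLines("data.name")
raw <- c("v1,v2,v3,",
         "1,5,7,",
         "4,2,1,#possible error,")

# 1. Drop everything from the first "#" to the end of each line.
cleaned <- sub("#.*$", "", raw)
# 2. Drop one trailing comma (plus any whitespace after it).
cleaned <- sub(",[[:space:]]*$", "", cleaned)

# Parse the cleaned lines; no extra column is created.
df <- read.table(text = cleaned, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE)
print(df)
```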
