
read.csv vs read.table - difficulty in comparing results

Original post: 2015-06-30 15:56:28 | Tags: r, csv

I have tab-separated data with a column containing addresses, and some of the addresses include commas.

I am using read.table to import the data into R, but my colleague used read.csv with sep="\t" to do the same, and we end up with different numbers of rows in the imported data frames.

Also, when I import the data into Excel, I get the same number of records as read.csv with sep="\t".

What is the most concrete way I can verify which import, and which number of records, is the correct one?

Please let me know what details I can add here to help answer the question.

1 answer

Read the help files for the two functions via ?read.table (that will show both). You'll see that read.csv is just read.table with some of the arguments set to different defaults.
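The differing defaults can also be listed programmatically rather than read off the help page. The following is a minimal sketch that uses formals() to pull the default values of the arguments discussed in this answer; the variable names args_of_interest and defaults are made up for illustration.

```r
# Arguments that are most likely to change how lines are split into rows.
args_of_interest <- c("header", "sep", "quote", "dec", "fill", "comment.char")

# Helper: return the default of one argument of a function as text.
default_as_text <- function(fun, arg) {
  paste(deparse(formals(fun)[[arg]]), collapse = "")
}

# Side-by-side table of the defaults in read.table and read.csv.
defaults <- data.frame(
  argument   = args_of_interest,
  read.table = vapply(args_of_interest,
                      function(a) default_as_text(read.table, a),
                      character(1), USE.NAMES = FALSE),
  read.csv   = vapply(args_of_interest,
                      function(a) default_as_text(read.csv, a),
                      character(1), USE.NAMES = FALSE)
)
print(defaults)
```

Any row where the two columns differ is a candidate explanation for a differing row count.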

One of those arguments is header. read.table defaults to header=FALSE, so the header line is read as a data row, while read.csv defaults to header=TRUE. In read.table with sep="\t", try also using header=TRUE.

If that doesn't work, try the following: read.table('file.txt', header=TRUE, sep="\t", quote="\"", dec=".", fill=TRUE, comment.char=""). That call should give exactly the same result as read.csv with sep="\t", because it sets every argument to the value read.csv uses. You can then revert those arguments to the read.table defaults one at a time (by not specifying them) to find out which one causes the difference between read.csv and read.table for your data (remember, more than one argument could be responsible). I can easily see how the header, sep, quote, comment.char, and fill arguments could affect the number of rows in the output. I can't imagine how dec would have this effect, but I wouldn't be surprised if it matters.
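To make the one-argument-at-a-time comparison concrete, here is a small sketch on invented data. The sample addresses and the temporary file are assumptions for illustration only, not the asker's data. It reverts quote and comment.char separately, since address fields often contain apostrophes and "#" characters, both of which read.table treats specially by default.

```r
# Hypothetical tab-separated sample with apostrophes and a '#' in the addresses.
sample_lines <- c(
  "id\taddress\tcity",
  "1\t12 O'Brien St, Unit 4\tBoston",
  "2\t99 Main St, Apt #5\tDenver",
  "3\t7 King's Rd, Floor 2\tAustin",
  "4\t1 Elm St\tSalem"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# All arguments set to the values read.csv uses (with a tab separator).
csv_like <- read.table(path, header = TRUE, sep = "\t", quote = "\"",
                       dec = ".", fill = TRUE, comment.char = "")

# Revert only quote to the read.table default, which also treats ' as a quote.
quote_default <- read.table(path, header = TRUE, sep = "\t",
                            dec = ".", fill = TRUE, comment.char = "")

# Revert only comment.char to the read.table default, "#".
comment_default <- read.table(path, header = TRUE, sep = "\t", quote = "\"",
                              dec = ".", fill = TRUE)

# Compare row counts and inspect the address column of each variant.
print(c(csv_like = nrow(csv_like),
        quote_default = nrow(quote_default),
        comment_default = nrow(comment_default)))
print(csv_like$address)
print(quote_default$address)
print(comment_default$address)
```

On data like this, the apostrophes would be expected to merge several lines into one record when quote is left at its default, and the "#" would be expected to cut a line short when comment.char is left at its default. Whether either applies to a real file depends on what the address column actually contains.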

As a rule, I expect different input to produce different output, and I consider it exceptional when different input produces the same output. The functions you're using are similar, but their differences amount to different ways of interpreting the text file, so I would expect them to yield different results. Which is "right" is not a matter of one function performing correctly and the other incorrectly; it's a matter of the user understanding what they are doing in relation to the input.
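One way to understand the input independently of either import is to inspect the raw file. The sketch below, again on an invented sample file, counts physical lines with readLines() and tab-separated fields per line with count.fields(), with quoting and comment handling switched off. It assumes that no field contains an embedded line break; if one does, physical lines and records will not match.

```r
# Hypothetical tab-separated sample file.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "id\taddress\tcity",
  "1\t12 O'Brien St, Unit 4\tBoston",
  "2\t99 Main St, Apt #5\tDenver",
  "3\t1 Elm St\tSalem"
), path)

# Number of physical data lines (total lines minus the header line).
raw_lines <- readLines(path)
n_physical <- length(raw_lines) - 1

# Fields per line when only tabs are special: no quotes, no comments.
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
print(table(n_fields))

# Lines whose field count differs from the header's are worth a manual look.
suspect <- which(n_fields != n_fields[1])
print(raw_lines[suspect])

# A strict import with the same settings should match the physical line count.
strict <- read.table(path, header = TRUE, sep = "\t",
                     quote = "", comment.char = "")
print(c(physical = n_physical, imported = nrow(strict)))
```

If every line has the same number of fields and the strict import matches the physical line count, that count is a reasonable reference against which to judge the read.table and read.csv results.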