
R: Issue with read.table detecting missing values in a tab-delimited file when calling columns

I am trying to do something very simple and am having a heck of a time with it. I have a set of data that is stored in a tab-delimited file, and this file contains missing values. When I try to call the column that has the tab-separated values, the wrong column is called. I believe this is because the first value after the header line in the third column (the one I am trying to extract) is missing. Here is an example of my data (my actual data is 36000 lines, but all formatted the same):

x   y   z   
230.125 49.875  0
230.375 49.875  0
230.625 49.875  0
261.125 49.875  0
261.375 49.875  0
251.625 48.875  4.38619211912155
251.875 48.875  3.70883572995663
252.125 48.875  3.2566264629364
252.375 48.875  3.00820730924606
254.125 48.875  7.88962166309357
254.375 48.875  8.50787222385406
254.625 48.875  8.95758244991303
254.875 48.875  9.47213044166565
255.125 48.875  9.96883320808411
255.375 48.875  10.4400730609894
255.625 48.875  10.6357674837112
255.875 48.875  9.81607600450516
274.125 48.875  0
274.375 48.875  0
274.625 48.875  0
274.875 48.875  0
275.125 48.875  0
275.375 48.875  0
275.625 48.875  0
275.875 48.875  0
276.125 48.875  0

I am trying to extract the third column and append it to another matrix to do calculations with later (this will be done for many files of the same type). That's why I have a second matrix initialized here.

Here is my code:

library(data.table)
temp <- c()
matrix_prelim <- matrix(nrow = 36000)
temp <- as.matrix(read.table("/myfilepath/example.txt", sep="\t", fill = TRUE, na.strings = "", header=TRUE))
matrix_prelim <- cbind(matrix_prelim, temp[[3]])

Then printing:

head(matrix_prelim)

yields:

      [,1]    [,2]
[1,]    NA 230.625
[2,]    NA 230.625
[3,]    NA 230.625
[4,]    NA 230.625
[5,]    NA 230.625
[6,]    NA 230.625

when what I would like is (knowing that initializing the matrix with no contents will give me a column of NAs, which is no problem):

      [,1]             [,2]
[1,]    NA                0
[2,]    NA                0
[3,]    NA                0
[4,]    NA                0
[5,]    NA                0
[6,]    NA 4.38619211912155

I have absolutely no idea what I am doing wrong. Any help would be much appreciated.

Thank you!

EDIT: I should note that I have tried changing the na.strings argument to " ", taking the na.strings argument out completely, using fread and grabbing the third column (that simply didn't work at all), and setting header = FALSE.

Though you have called library(data.table), you're not actually converting your data to data.table format. Instead, read.table returns a data.frame, which is fine; note that wrapping it in as.matrix then turns it into a matrix.
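To make the distinction concrete, here is a small sketch of how single-index subsetting behaves on each kind of object. It uses a made-up three-row sample passed through the `text` argument instead of the real file; the sample values and the name `sample_txt` are assumptions for illustration only.

```r
# Hypothetical sample standing in for the tab-delimited file.
sample_txt <- "x\ty\tz\n230.125\t49.875\t0\n230.375\t49.875\t0\n230.625\t49.875\t4.5\n"

# read.table returns a data.frame.
df <- read.table(text = sample_txt, sep = "\t", header = TRUE)
is.data.frame(df)

# as.matrix converts it to a plain numeric matrix.
m <- as.matrix(df)
is.matrix(m)

# On a data.frame, [[3]] selects the third column.
df[[3]]

# On a matrix, a single index counts elements column by column,
# so [[3]] is the third value of the first column (x), not column z.
m[[3]]

# To get the third column of a matrix, leave the row index empty.
m[, 3]
```

With the data in the question, the third value of the x column is 230.625, which is consistent with the value that showed up in the printed result.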

There is no need to initialize a matrix to store your 3rd column as a separate vector. Try something like this:

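temp <- as.matrix(read.table("/myfilepath/example.txt", sep="\t", fill = TRUE, na.strings = "", header=TRUE))
# Select the whole third column; temp[3] or temp[[3]] on a matrix returns only its third element.
matrix_prelim <- temp[, 3]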

Let me know if this works.
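Since the question mentions repeating this for many files, here is one possible sketch of the whole task in base R. It writes two small temporary files so that it can run anywhere; the file contents, the helper name `read_third_column`, and the use of `tempfile()` are assumptions for illustration, not part of the original setup. The last line also shows why a single value ends up repeated down a column: `cbind` recycles a length-one value to fill it.

```r
# Hypothetical helper: read one tab-delimited file and return its third column.
read_third_column <- function(path) {
  df <- read.table(path, sep = "\t", header = TRUE,
                   fill = TRUE, na.strings = "")
  df[[3]]  # df is a data.frame here, so [[3]] is the third column
}

# Two small stand-in files with made-up values; the first has a missing z value.
lines_a <- c("x\ty\tz",
             "230.125\t49.875\t0",
             "230.375\t49.875\t",
             "230.625\t49.875\t4.5")
lines_b <- c("x\ty\tz",
             "230.125\t49.875\t1",
             "230.375\t49.875\t2",
             "230.625\t49.875\t3")
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(lines_a, paths[1])
writeLines(lines_b, paths[2])

# One column per file; the missing value is kept as NA.
result <- sapply(paths, read_third_column, USE.NAMES = FALSE)
result

# cbind recycles a single value to match the number of rows.
recycled <- cbind(matrix(nrow = 3), 230.625)
recycled
```

This assumes every file has the same number of rows; if the lengths differ, `sapply` returns a list rather than a matrix, so a length check per file may be worth adding.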
