
Extract a portion of 1 column from data.frame/matrix

I get flummoxed by some of the simplest things. In the following code I want to extract just a portion of one column of a data.frame called 'a'. I get the right values, but the result is padded with NAs that I don't want. 'b' is the extracted column; 'c' holds the correct portion of the data but has extra NA padding at the end.

How do I best do this so that 'c' naturally ends up only 9 elements long (i.e. the 15 original elements minus the 6 I skipped)?

NumBars = 6
a = as.data.frame(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
a[,2] = c(11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)
names(a)[1] = "Data1"
names(a)[2] = "Data2"

# Use 1st column of data only

b = as.matrix(a[,1])
c = as.matrix(b[NumBars+1:length(b)])

The immediate reason you're getting NAs is that the sequence operator : takes precedence over the addition operator +, as detailed in the R Language Definition. Therefore NumBars+1:length(b) is not the same as (NumBars+1):length(b). The first adds NumBars to every element of the vector 1:length(b), while the second adds first and then builds the sequence.

ind.1 <- 1+1:3   # == 2:4
ind.2 <- (1+1):3 # == 2:3 

When you index with this longer vector, you get all the elements you want, but you are also asking for entries like b[length(b)+1], which the R Language Definition tells us return NA. That's why you have trailing NAs.

If i is positive and exceeds length(x) then the corresponding selection is NA. A negative out of bounds value for i causes an error.

b <- c(1,2,3)
b[ind.1] 
#[1] 2 3 NA
b[ind.2] 
#[1] 2 3
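Applied to the data from the question, the same precedence rule accounts for the padding. The following is a minimal sketch, assuming the column is rebuilt as an integer vector with data.frame(); the names padded and trimmed are illustrative and not part of the original code.

```r
NumBars <- 6
a <- data.frame(Data1 = 1:15, Data2 = 11:25)
b <- a$Data1

# Without parentheses the index is NumBars + (1:15), i.e. 7..21,
# which runs six positions past the end of b.
padded <- b[NumBars + 1:length(b)]

# With parentheses the index is 7..15, which stays inside b.
trimmed <- b[(NumBars + 1):length(b)]

length(padded)   # 15: nine values followed by six NAs
length(trimmed)  # 9
```

Wrapping trimmed in as.matrix(), as the question does, would then give a 9 x 1 matrix with no padding.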

From a design perspective, the other solutions listed here are good choices to help avoid this mistake.

It is often easier to think of what you want to remove from your vector/matrix. Use negative subscripts to remove items.

c = as.matrix(b[-1:-NumBars])
c
##      [,1]
## [1,]    7
## [2,]    8
## [3,]    9
## [4,]   10
## [5,]   11
## [6,]   12
## [7,]   13
## [8,]   14
## [9,]   15
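Two related ways of dropping the leading elements are sketched below. This is only an illustration, assuming NumBars is at least 1 and smaller than the number of rows; rest_vec and rest_df are hypothetical names. tail() with a negative n returns everything except the first n elements, and drop = FALSE keeps the result a one-column data.frame instead of collapsing it to a vector.

```r
NumBars <- 6
a <- data.frame(Data1 = 1:15, Data2 = 11:25)

# Plain vector: all but the first NumBars elements of the column.
rest_vec <- tail(a$Data1, -NumBars)

# One-column data.frame: drop the first NumBars rows, keep column "Data1".
rest_df <- a[-(1:NumBars), "Data1", drop = FALSE]

length(rest_vec)  # 9
nrow(rest_df)     # 9
```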

If your goal is to remove NAs from a column, you can also do something like

c <- na.omit(a[,1])

E.g.

> x
[1]  1  2  3 NA NA
> na.omit(x)
[1] 1 2 3
attr(,"na.action")
[1] 4 5
attr(,"class")
[1] "omit"

You can ignore the attributes - they are there to let you know what elements were removed.
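If the attributes get in the way, for example when comparing results, a plain vector can be obtained instead. A small sketch, with kept and plain as illustrative names:

```r
x <- c(1, 2, 3, NA, NA)

# Logical indexing keeps the non-missing values and attaches no attributes.
kept <- x[!is.na(x)]

# as.vector() strips the na.action attribute that na.omit() adds.
plain <- as.vector(na.omit(x))

kept
plain
```

Both give the three non-missing values as an ordinary numeric vector.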
