How to avoid gaps due to missing values in matplot in R?

I have a function that uses matplot to plot some data. The data structure looks like this:

test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])

So far so good. However, if the data set contains missing values, the resulting plot has gaps, and I would like to avoid them by connecting the edges of each gap.

test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1]) 

In the real situation this happens inside a function: the matrix is bigger, and the number of rows and columns and the positions of the non-overlapping missing values may change between calls, so I'd like a solution that handles this flexibly. I also need to use matlines.

I was thinking of filling in the gaps with interpolated data, but maybe there is a better solution.

I came across this exact situation today, but I didn't want to interpolate values; I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.

The gaps appear because there are NAs between consecutive values. My solution is to 'shift' the column values so that no NAs sit between them. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values need the same adjustment, so they are turned into a matrix using the same principle, with each column of the y matrix determining which x values to shift.
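As a quick standalone illustration of the shift described above, the following sketch applies it to the example vector from the text. The x positions c(10, 20, 30, 40, 50) are made up for the illustration and are not part of the original answer.

```r
# y values with a gap, as in the example above
y <- c(1, 2, NA, NA, 5)
# Hypothetical x positions, chosen only for this illustration
x <- c(10, 20, 30, 40, 50)

keep <- !is.na(y)
# Non-missing values first, then pad with NA up to the original length
y_shifted <- c(y[keep], rep(NA, sum(!keep)))
# The x values are shifted by the same mask so each point keeps its position
x_shifted <- c(x[keep], rep(NA, sum(!keep)))

y_shifted  # 1 2 5 NA NA
x_shifted  # 10 20 50 NA NA
```

Because the point (50, 5) now directly follows (20, 2), a line drawn through the shifted vectors connects them without a break.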

Here's the code for the functions:

# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool 
#   is TRUE will be 'shifted' to the front of the vector, and the back of the
#   vector will be all NA (i.e. the number of NAs in the resulting vector is
#   sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
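shift_vec_na <- function(x, bool){
  stopifnot(length(bool) == length(x))
  # Build the result in one step. Index ranges such as (n + 1):length(x)
  # count backwards when nothing is missing (e.g. 11:10) and would
  # overwrite the last value and lengthen the vector.
  return(c(x[bool], rep(NA, sum(!bool))))
}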

# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
  y2 <- apply(y, 2, function(col_i){
    return(shift_vec_na(col_i, !is.na(col_i)))
  })
  
  x2 <- apply(y, 2, function(col_i){
    return(shift_vec_na(x, !is.na(col_i)))
  })
  return(list(x = x2, y = y2))
}

Then, using the sample data:

test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])

matplot(lst$x, lst$y, type = "b")
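Since the question also asks for matlines, here is a minimal sketch of how the shifted matrices might be passed to it. The compact helper shift() is a hypothetical stand-in for shift_vec_na() so that the snippet runs on its own, and the axis labels are arbitrary.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical compact helper: kept values first, NA padding at the end
shift <- function(v, keep) c(v[keep], rep(NA, sum(!keep)))

y <- as.matrix(test[, -1])
y2 <- apply(y, 2, function(col_i) shift(col_i, !is.na(col_i)))
x2 <- apply(y, 2, function(col_i) shift(test$x, !is.na(col_i)))

# Set up an empty plot region, then draw the gap-free lines with matlines
matplot(x2, y2, type = "n", xlab = "x", ylab = "value")
matlines(x2, y2, type = "l", lty = 1)
```

Both x2 and y2 have one column per series, so matlines pairs them column by column.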

You could use the na.interpolation function from the imputeTS package (named na_interpolation in newer releases of the package):

test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])

library('imputeTS')

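# Newer imputeTS releases name this function na_interpolation();
# on older versions call na.interpolation() instead.
test <- na_interpolation(test, option = "linear")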
matplot(test[,-1])
matlines(test[,1], test[,-1])
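If adding a package is not an option, a similar linear fill can be sketched with approx() from base R. This is an alternative to the imputeTS call above, not what that package does internally, and the helper name fill_linear is hypothetical.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical helper: linearly interpolate interior NAs of y over x.
# Needs at least two non-missing points per column.
fill_linear <- function(x, y) {
  ok <- !is.na(y)
  # rule = 1 leaves leading/trailing NAs as NA instead of extrapolating
  approx(x[ok], y[ok], xout = x, rule = 1)$y
}

filled <- test
filled[-1] <- lapply(test[-1], fill_linear, x = test$x)

ymat <- as.matrix(filled[-1])
matplot(filled$x, ymat, type = "n", xlab = "x", ylab = "value")
matlines(filled$x, ymat, lty = 1)
```

Unlike the shifting approach, this changes the data by inserting interpolated points, which matters if markers are drawn (type = "b" or "p").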

I had the same issue today. In my context I was not permitted to interpolate. Here is a minimal but sufficiently general working example of what I did. I hope it helps someone:

mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
    #graphical set up of the window
    plot.new()
    plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
    mtext(text = xlab,side = 1, line = 3)
    mtext(text = ylab,side = 2, line = 3)
    mtext(text = main,side = 3, line = 0)
    axis(1L)
    axis(2L)
    #plot the data
    for(i in 1:nrow(data)){
        nin.na <- !is.na(data[i,])
        lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
    }
}

The core 'trick' is x=which(nin.na). It keeps the data points of each line aligned with the indices on the x axis. Note that each row of data is drawn as one line, with the column index as the x value.
The lines

plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)

set up the graphical part of the window. range(data, na.rm=TRUE) sizes the plot region so that it includes all data points. mtext(...) labels the axes and provides the main title. The axes themselves are drawn by the axis(...) calls.
The for-loop that follows plots the data.
The signature of mymatplot includes the ... argument so that typical plot parameters such as lty, lwd, cex, etc. can optionally be passed through. They are passed on to lines.
A last word on the choice of colors: they are up to your taste.
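The same which()-style indexing can also be applied column by column, which matches the layout of the data in the question (one series per column, with an explicit x vector). The sketch below is an adaptation and not part of the answer above; lines_skip_na is a hypothetical name, and it uses lines rather than matlines.

```r
# Hypothetical helper: draw each column of y as one line, skipping NA points
lines_skip_na <- function(x, y, ...) {
  y <- as.matrix(y)
  for (j in seq_len(ncol(y))) {
    ok <- !is.na(y[, j])
    # Only non-missing points are passed on, so the line spans the gap
    lines(x[ok], y[ok, j], col = j, ...)
  }
  invisible(NULL)
}

test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
ymat <- as.matrix(test[-1])

# Empty plot region sized to the data, then the gap-spanning lines
plot(range(test$x), range(ymat, na.rm = TRUE), type = "n",
     xlab = "x", ylab = "value")
lines_skip_na(test$x, ymat, lty = 1)
```

Extra arguments such as lty or lwd are forwarded to lines through ..., in the same way as in mymatplot.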
