How to get the point set (x, y) in a desired area in R

Question

The figure shows a plot of (x, y) pairs from an Excel file, 8760 pairs in total. I want to remove the noisy data pairs in the red-circled area and write the remaining pairs to a new Excel file. How can I do this in R?

Answer 1

Both R and Excel read and write .csv files, so you can use that format to transfer the data back and forth.
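A minimal sketch of that round trip, using only base R. The file names, the column names x and y, and the placeholder filter are assumptions for illustration; the small data frame stands in for a sheet saved from Excel as CSV.

```r
# Hypothetical file names; replace with your own paths.
in_file  <- file.path(tempdir(), "xy_input.csv")
out_file <- file.path(tempdir(), "xy_clean.csv")

# Stand-in for a sheet that was saved from Excel as CSV with columns x and y.
write.csv(data.frame(x = c(1, 2, 3), y = c(0.5, 7.9, 0.2)),
          in_file, row.names = FALSE)

# Read the data into R.
XY_in <- read.csv(in_file)

# Placeholder rule; substitute whichever noise filter you settle on.
keep <- XY_in$y < 5

# Write the remaining pairs to a new CSV that Excel can open.
write.csv(XY_in[keep, ], out_file, row.names = FALSE)
```

Setting row.names = FALSE keeps R's row index out of the file, so the output has the same two columns as the input.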

You do not provide any data so I made some junk data to produce a similar problem.

DATA

set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8)) 
XY = data.frame(x,y)

One way to identify noise points is by looking at the distance to the nearest neighbors. In dense areas, nearest neighbors are close together; in sparse areas, they are farther apart. The package dbscan provides a function, kNNdist, that returns the distance to the k nearest neighbors. For this problem I used k=6, but you may need to tune this for your data. Looking at the distribution of distances to the 6th nearest neighbor, we see that most points have 6 neighbors within a distance of 0.2.

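library(dbscan)
# In current dbscan versions, all = TRUE returns a matrix with one column per neighbor (1 to k)
XY6 = kNNdist(XY, k = 6, all = TRUE)
plot(density(XY6[,6]))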

So I will assume that points whose 6th nearest neighbor is farther away than 0.2 are noise points. Changing the color to see which points are affected, we get

TYPE = rep(1,8760)
TYPE[XY6[,6] > 0.2] = 2
plot(XY, col=TYPE)

Of course, if you wish to restrict to the non-noise points, you can use

NonNoise = XY[XY6[,6] <= 0.2,]
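To make the nearest-neighbor idea concrete without a package dependency, here is a base-R sketch. The helper kth_nn_dist is hypothetical (it is not part of dbscan), the 500-point sample is a smaller stand-in for the junk data, and the 90% cutoff is an assumption that you would normally replace with a value read off the density plot. It builds a full distance matrix, so it only suits small data sets.

```r
# Hypothetical helper: distance from each point to its k-th nearest neighbor.
kth_nn_dist <- function(pts, k) {
  d <- as.matrix(dist(pts))  # full pairwise Euclidean distances
  # Each sorted row starts with the self-distance 0, so take element k + 1.
  unname(apply(d, 1, function(row) sort(row)[k + 1]))
}

set.seed(2017)
small <- data.frame(x = runif(500, 0, 16),
                    y = c(abs(rnorm(450)), runif(50, 0, 8)))
d6 <- kth_nn_dist(small, 6)

# Assumption: treat the 10% of points with the largest distances as noise.
cutoff <- quantile(d6, 0.9)
small_clean <- small[d6 <= cutoff, ]
```

For the full 8760 points, a dedicated nearest-neighbor search such as the one in dbscan is the more practical choice, since the distance matrix grows with the square of the number of points.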

Answer 2

Using @G5W's example:

Make up data:

set.seed(2017)
x = runif(8760, 0,16)
y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8)) 
XY = data.frame(x,y)

Fit a quantile regression to the 90th percentile:

library(quantreg)
library(splines)
qq <- rq(y~ns(x,20),tau=0.9,data=XY)

Compute and draw the predicted curve:

xvec <- seq(0,16,length.out=101)
pp <- predict(qq,newdata=data.frame(x=xvec))
plot(y~x,data=XY)
lines(xvec,pp,col=2,lwd=2)

Keep only points below the predicted line:

XY2 <- subset(XY,y<predict(qq,newdata=data.frame(x)))

plot(y~x,data=XY2)
lines(xvec,pp,col=2,lwd=2)

You can make the line less wiggly by lowering the spline's degrees of freedom (fewer knots), e.g. y~ns(x,10).
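Note that a fit at tau=0.9 places roughly 10% of the points above the line by construction, whatever the true amount of noise, so tau needs tuning too. The following base-R sketch shows the same thresholding idea in a cruder form. It is a swapped-in technique, not the quantile regression above: it uses a per-bin 90th percentile instead of a smooth curve, and the 16 equal-width bins are an assumption.

```r
set.seed(2017)
x <- runif(8760, 0, 16)
y <- c(abs(rnorm(8000, 0, 1)), runif(760, 0, 8))
XY <- data.frame(x, y)

# Assumption: 16 equal-width bins along x; tune the width for real data.
bin <- cut(XY$x, breaks = seq(0, 16, by = 1), include.lowest = TRUE)

# 90th percentile of y within each bin, repeated for every row in that bin.
q90 <- ave(XY$y, bin, FUN = function(v) quantile(v, 0.9))

# Keep only the points below their bin's threshold.
XY_binned <- XY[XY$y < q90, ]

# Fraction of points kept; close to 0.9 by construction.
frac_kept <- mean(XY$y < q90)
```

The step-shaped threshold is less smooth than the spline fit, but it needs no extra packages and makes the effect of the quantile level easy to inspect.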

solution1
3 2017-11-15 17:58:22

solution2
3 ACCEPTED 2017-11-15 18:10:50
