
Create a new column conditional on distance traveled between points in R

I am trying to create a new column that is conditional on another column, a bit like a moving average or moving window, but based on the distance between points. Take for example row 2, with a CO2 of 399.935. I would like the mean of all the points within 100 m (traveled) of that point. In my example (looking at column CumDist), rows 1, 3, 4, 5 would be selected to calculate the mean. The column CumDist holds the cumulative distance traveled (multiply by 100,000 to get meters). I have 5000 points, and the width of the moving window (the number of rows in it) will obviously vary.

I tested over() from the sp package, but it's problematic if the same road is taken more than once. I looked on the web for other solutions and I did not find anything that could help me.

dput(DF)
structure(list(CO2 = c(399.9350305, 399.9350305, 399.9350305, 
400.0320031, 400.0320031, 400.0320031, 399.7718229, 399.7718229, 
399.7718229, 399.3855075, 399.3855075, 399.3855075, 399.4708139, 
399.4708139, 399.4708139, 400.0362474, 400.0362474, 400.0362474, 
399.7556753, 399.7556753), lon = c(-103.7093538, -103.709352, 
-103.7093492, -103.7093467, -103.7093455, -103.7093465, -103.7093482, 
-103.7093596, -103.7094074, -103.7094625, -103.7094966, -103.709593, 
-103.709649, -103.7096717, -103.7097349, -103.7097795, -103.709827, 
-103.7099007, -103.709924, -103.7099887), lat = c(49.46972027, 
49.46972153, 49.46971675, 49.46971533, 49.46971307, 49.4697124, 
49.46970636, 49.46968214, 49.46960921, 49.46955984, 49.46953621, 
49.46945809, 49.46938994, 49.46935281, 49.46924309, 49.46918635, 
49.46914762, 49.46912566, 49.46912407, 49.46913321),distDiff = c(0.000342016147509882, 
0.000191466419697602, 0.000569046320857002, 0.000240367540492089, 
0.000265977754839834, 0.000103953049523505, 0.000682968856240796, 
0.0028176007969857, 0.00882013898948418, 0.00678966015562509, 
0.00360774024245839, 0.011149423290729, 0.00859796340323456, 
0.00444526066124642, 0.0130344010874029, 0.00709037369666853, 
0.00551435348701512, 0.00587377717110946, 0.00169806309901329, 
0.00479849401022625), CumDist = c(0.000342016147509882, 0.000533482567207484, 
0.00110252888806449, 0.00134289642855657, 0.00160887418339641, 
0.00171282723291991, 0.00239579608916071, 0.00521339688614641, 
0.0140335358756306, 0.0208231960312557, 0.0244309362737141, 0.0355803595644431, 
0.0441783229676777, 0.0486235836289241, 0.0616579847163269, 0.0687483584129955, 
0.0742627119000106, 0.08013648907112, 0.0818345521701333, 0.0866330461803596
)), .Names = c("X12CO2_dry", "coords.x1", "coords.x2", "V1", 
"CumDist"), row.names = 2:21, class = "data.frame")

thanks, Martin

The window that belongs to the i-th row starts at row n[i] and ends at row m[i]-1. Hence the sum of the CO2 values in the i-th window is CumCO2[m[i]]-CumCO2[n[i]]. (Note that the indices into CumCO2 are shifted by 1 because of the leading 0.) Dividing this CO2 sum by the window size m[i]-n[i] gives the values meanCO2 for the new column:

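# n[i]: first row whose CumDist is no more than 0.001 (100 m) behind row i
n <- sapply( DF$CumDist,
             function(x){
               which.max( DF$CumDist >= x-0.001 )
             }
           )

# m[i]: first row more than 0.001 ahead of row i (nrow(DF)+1 if there is none)
m <- sapply( DF$CumDist,
             function(x){
               which.max( c(DF$CumDist,Inf) > x+0.001 )
             }
           )

# Cumulative CO2 sums with a leading 0, so each window sum is a difference
CumCO2 <- c( 0, cumsum(DF$X12CO2_dry) )

meanCO2 <- ( CumCO2[m] - CumCO2[n] ) / (m-n)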


> n
 [1]  1  1  1  2  3  3  5  8  9 10 11 12 13 14 15 16 17 18 19 20
> m
 [1]  4  5  7  7  8  8  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> meanCO2
 [1] 399.9350 399.9593 399.9835 399.9932 399.9606 399.9606 399.9453 399.7718 399.7718 399.3855 399.3855 399.3855 399.4708 399.4708 399.4708 400.0362
[17] 400.0362 400.0362 399.7557 399.7557
> 
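The same n and m indices can probably be obtained without the sapply() calls by using findInterval(), which relies on CumDist being sorted in non-decreasing order. What follows is only a minimal sketch on made-up toy data: cum_dist, co2 and half_width are hypothetical names, and the distances are given directly in meters rather than in the units of the question.

```r
# Hypothetical toy data: cumulative distance in meters (must be non-decreasing)
cum_dist   <- c(0, 30, 80, 120, 250, 260)
co2        <- c(400, 402, 404, 406, 410, 412)
half_width <- 100  # window reaches 100 m on either side of each point

# n[i]: index of the first point with cum_dist >= cum_dist[i] - half_width
# (left.open = TRUE counts the points strictly below the lower bound)
n <- findInterval(cum_dist - half_width, cum_dist, left.open = TRUE) + 1

# m[i]: index of the first point with cum_dist > cum_dist[i] + half_width,
# or length(cum_dist) + 1 when no such point exists
m <- findInterval(cum_dist + half_width, cum_dist) + 1

# Leading 0 so that the sum over rows n..(m-1) is cum_co2[m] - cum_co2[n]
cum_co2  <- c(0, cumsum(co2))
mean_co2 <- (cum_co2[m] - cum_co2[n]) / (m - n)

# Brute-force reference to compare against
brute <- sapply(cum_dist, function(x) mean(co2[abs(cum_dist - x) <= half_width]))
```

On the real data this would correspond to replacing cum_dist with DF$CumDist, co2 with DF$X12CO2_dry and half_width with 0.001. Because findInterval() does a binary search on the sorted vector, it avoids the full scan that which.max() performs for every row, which may matter as the number of points grows.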

Man, you beat me to it with a cleaner solution, mra68.

Here's mine using a few loops.

for (j in 1:nrow(DF)) {  # Loop over every row; row j is the centre of the window

  CO2list <- NULL  # Reset the vector of CO2 values before filling it in the inner loop

  for (i in 1:nrow(DF)) {  # Compare every row against row j

    # Keep row i if its CumDist is within 100/100000 (100 m) of CumDist[j]
    if (abs(DF$CumDist[i] - DF$CumDist[j]) <= 0.001) {
      CO2list <- c(CO2list, DF$X12CO2_dry[i])
    }

  }

  # Store the mean of the window in the column named CO2AVG
  DF$CO2AVG[j] <- mean(CO2list)

}
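For what it's worth, the two nested loops above can be collapsed into a single vapply() call with a logical mask. This is just a sketch on a small made-up data frame: DF_toy and window are hypothetical names, and the column names are borrowed from the question.

```r
# Hypothetical toy data frame using the question's column names and units
DF_toy <- data.frame(
  X12CO2_dry = c(400, 402, 404, 406, 410, 412),
  CumDist    = c(0, 30, 80, 120, 250, 260) / 1e5
)

window <- 0.001  # 100 m expressed in the units of CumDist

# For each point d, average the CO2 of all rows whose CumDist is within
# `window` of d (the point itself is always included)
DF_toy$CO2AVG <- vapply(
  DF_toy$CumDist,
  function(d) mean(DF_toy$X12CO2_dry[abs(DF_toy$CumDist - d) <= window]),
  numeric(1)
)
```

Like the loops, this still compares every row with every other row, so the work grows with the square of the number of points. It only removes the explicit bookkeeping around CO2list.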
