Spatial data with duplicates and missing points

I am analysing data from an egg survey. Data are available for different points in the North Sea, and some stations were recorded twice on different dates. The sea should be covered by 0.5 x 0.5 degree squares. I have two questions for which I have not yet found a solution:

  1. How do I replace points that share a location but have different dates with a single mean value? I know how to remove duplicates, or how to replace them with the max or min, but I couldn't find a way to calculate a mean.

  2. How do I calculate interpolated values for the missing points, based on neighbouring cells? A value should be interpolated only if at least two neighbouring points have been recorded.

I tried setting up a grid, but did not get very far because I couldn't find a way to tell R when to interpolate and when not to.

Sample data:

egg_data <- structure(list(Latitude = c(54.25, 54.25, 54.25, 54.25, 54.25, 
54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 54.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 55.25, 
55.25, 55.25, 55.25, 54.25, 54.25, 54.25, 53.25, 58.25, 57.75, 
57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 57.25, 56.75, 
56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 56.75, 
56.75, 56.75, 56.75, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 
56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 56.25, 
56.25, 56.75, 56.75, 56.75), Longitude = c(6.25, 5.25, 5.25, 
4.25, 4.25, 3.25, 3.25, 2.25, 2.25, 1.25, 1.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 1.25, 1.25, 2.25, 2.25, 3.25, 3.25, 4.25, 
4.25, 5.25, 5.25, 5.25, 5.25, 4.25, 4.25, 3.25, 3.25, 2.25, 2.25, 
1.25, 1.25, 0.25, 0.25, 0.25, 0.25, 1.25, 1.25, 0.25, 0.25, 0.25, 
0.25, 3.25, 3.25, 3.25, 2.75, 2.25, 1.75, 1.25, 0.75, 0.25, 0.25, 
0.25, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75, 
5.25, 5.75, 6.25, 5.75, 5.25, 4.75, 4.25, 3.75, 3.25, 2.25, 1.75, 
1.25, 0.75, 0.25, 0.25, 0.75, 1.25, 1.75, 1.75, 1.25, 0.75), 
    Eggs = c(9L, 6L, 4L, 20L, 57L, 14L, 35L, 18L, 4L, 1L, 3L, 
    100L, 1L, 201L, 0L, 51L, 52L, 23L, 19L, 4L, 5L, 23L, 11L, 
    18L, 7L, 7L, 14L, 6L, 3L, 4L, 20L, 13L, 19L, 5L, 16L, 23L, 
    28L, 11L, 9L, 12L, 19L, 62L, 6L, 3L, 15L, 110L, 57L, 0L, 
    14L, 3L, 3L, 8L, 94L, 62L, 7L, 19L, 511L, 59L, 283L, 308L, 
    20L, 44L, 61L, 24L, 10L, 10L, 15L, 6L, 8L, 12L, 32L, 2L, 
    5L, 10L, 21L, 4L, 1L, 19L, 3L, 4L, 4L, 17L, 51L, 108L, 1213L, 
    132L, 4L, 0L, 0L, 0L)), .Names = c("Latitude", "Longitude", 
"Eggs"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", 
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", 
"82", "83", "84", "85", "86", "87", "88", "89", "90"))

Thank you very much!!

Add a grouping variable for each location.

egg_data <- within(egg_data, Location <- paste("(", Latitude, ", ", Longitude, ")", sep = ""))

EDIT: There's no point in being fancy about the label, since we want to reverse the process shortly. A plain comma-separated key is easier to split again:

egg_data <- within(egg_data, 
  Location <- paste(Latitude, Longitude, sep = ",")
)

Then there are loads of ways of getting the mean.

means_by_location <- with(egg_data, tapply(Eggs, Location, mean))

or

library(plyr)
means_by_location2 <- ddply(egg_data, .(Location), summarise, Mean.eggs = mean(Eggs))

or

means_by_location3 <- aggregate(Eggs ~ Location, egg_data, mean)

or

means_by_location4 <- with(egg_data, by(Eggs, Location, mean))

EDIT: For the next bit, you want to have the result in a data frame, so use method 2 or 3.

Add the latitude and longitude back into your new dataset. (There are lots of ways of doing this.)

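# Split the "lat,long" key back into its two parts and store them as numbers
lat_long <- strsplit(as.character(means_by_location2$Location), ",")
means_by_location2$Latitude <- as.numeric(sapply(lat_long, function(x) x[1]))
means_by_location2$Longitude <- as.numeric(sapply(lat_long, function(x) x[2]))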

This answers your first question.
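If the string key is not needed for anything else, one possible shortcut is to group on both coordinate columns at once, which keeps them numeric and avoids splitting a label afterwards. The sketch below uses a small made-up data frame (`toy` and `means_by_cell` are hypothetical names, not part of the original data) with the same columns as `egg_data`.

```r
# Hypothetical toy data with the same columns as egg_data
toy <- data.frame(
  Latitude  = c(54.25, 54.25, 54.25, 55.25),
  Longitude = c(5.25, 5.25, 4.25, 0.25),
  Eggs      = c(6L, 4L, 20L, 0L)
)

# Group on both coordinates; duplicated stations collapse to their mean
means_by_cell <- aggregate(Eggs ~ Latitude + Longitude, data = toy, FUN = mean)

# Rename the value column to match the name used with ddply above
names(means_by_cell)[names(means_by_cell) == "Eggs"] <- "Mean.eggs"
means_by_cell
```

Replacing `toy` with `egg_data` should give one row per distinct station, with `Latitude` and `Longitude` already numeric.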


For the second question, you need to think a bit more. Take a look at a plot of eggs by location.

library(ggplot2)
(p <- ggplot(means_by_location2, aes(Longitude, Latitude, colour = log10(Mean.eggs + 1))) +
  geom_point() +
  scale_colour_gradient(low = "#FFFFFF", high = "#0000FF", space = "Lab")
)

Are you interpolating north to south, or east to west, or with all neighbouring points? There are lots of different possibilities, and they may give different answers. It's a nontrivial task to say which interpolation is best.
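To make one of those possibilities concrete, here is a minimal sketch under stated assumptions: the grid step is 0.5 degrees, "neighbouring" means the eight surrounding cells, and an empty cell is filled with the plain mean of its recorded neighbours only when at least two of them exist. None of these choices comes from the question itself beyond the two-neighbour rule, and the `cells` data frame and all variable names are hypothetical.

```r
# Hypothetical averaged data: one row per sampled 0.5-degree cell
cells <- data.frame(
  Latitude  = c(56.25, 56.25, 56.75, 57.25, 57.25),
  Longitude = c(0.25, 1.25, 0.25, 0.25, 2.25),
  Mean.eggs = c(10, 20, 30, 40, 50)
)

step <- 0.5           # assumed grid resolution in degrees
min_neighbours <- 2   # rule from the question: at least two recorded neighbours

lats <- seq(min(cells$Latitude), max(cells$Latitude), by = step)
lons <- seq(min(cells$Longitude), max(cells$Longitude), by = step)

# Matrix of observed values; NA marks cells without a record
obs <- matrix(NA_real_, nrow = length(lats), ncol = length(lons),
              dimnames = list(as.character(lats), as.character(lons)))
row_idx <- round((cells$Latitude - min(lats)) / step) + 1
col_idx <- round((cells$Longitude - min(lons)) / step) + 1
obs[cbind(row_idx, col_idx)] <- cells$Mean.eggs

# Single pass: only observed values are used as neighbours,
# so interpolated cells never feed into other interpolated cells
filled <- obs
for (r in seq_len(nrow(obs))) {
  for (k in seq_len(ncol(obs))) {
    if (!is.na(obs[r, k])) next
    rr <- max(1, r - 1):min(nrow(obs), r + 1)
    kk <- max(1, k - 1):min(ncol(obs), k + 1)
    neighbours <- obs[rr, kk]                     # the centre cell is NA here
    neighbours <- neighbours[!is.na(neighbours)]
    if (length(neighbours) >= min_neighbours) {
      filled[r, k] <- mean(neighbours)
    }
  }
}

# Back to long format: expand.grid varies Latitude fastest,
# matching the column-major order of as.vector(filled)
result <- expand.grid(Latitude = lats, Longitude = lons)
result$Eggs <- as.vector(filled)
```

Cells with fewer than two recorded neighbours stay `NA`. Restricting the neighbourhood to the four edge-sharing cells, or weighting by distance, would be equally defensible variations and may give different values.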
