
Turn raster files (4-dimensional) into a structure that allows conducting a random forest classification

My goal is to conduct a random forest classification of agricultural land-use forms (crop classification). I have several ground-truth points for all classes. Furthermore, I have 37 raster files (.tif), each having the same 12 bands and the same extent, with one file representing one date in the time series. The time series is NOT regular (the intervals between dates vary).

The following shows the files, the dates and band names, plus the first file read with terra:

> files <- list.files("C:/temp/final2",full.names = T,pattern = ".tif$",recursive = T)
> files[1:3]
[1] "C:/temp/final2/20190322T100031_20190322T100524_T33UXP.tif" "C:/temp/final2/20190324T095029_20190324T095522_T33UXP.tif"
[3] "C:/temp/final2/20190329T095031_20190329T095315_T33UXP.tif"

> dates <- as.Date(substr(basename(files),1,8),"%Y%m%d")
> band_names <- c("B02","B03","B04","B05","B08","B11","B12","NDVI","NDWI","SAVI")
 
> rast(files[1])
class       : SpatRaster 
dimensions  : 386, 695, 12  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 634500, 641450, 5342460, 5346320  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633) 
source      : 20190322T100031_20190322T100524_T33UXP.tif 
names       : B2, B3, B4, B5, B6, B7, ... 

I want to extract the value for every date and band. This should result in a dataframe with the observed variables and the respective class for each point (see below). With this dataframe I want to train a random forest model in order to predict the crop classes for each raster (resulting in a single raster layer with classes as values).

The following structure (copied from https://gdalcubes.github.io/source/tutorials/vignettes/gc03_ML_training_data.html ) is what I need as observed values; it serves as the training data for the rf model.

##    FID       time    B2  ... more bands ... and class of respective FID
## 1   16 2018-01-01 13.33 
## 2   17 2018-01-01 13.63
## 3   18 2018-01-01 13.33
## 4   19 2018-01-01 12.15
## 5   20 2018-01-01 14.73
## 6   21 2018-01-01 15.91
## 7   16 2018-01-09 12.23
## 8   17 2018-01-09 12.15
## 9   18 2018-01-09 12.07
## 10  19 2018-01-09 10.19
## 11  20 2018-01-09  9.83

I (1) read all the rasters into a list called 'cube', and (2) combined all the SpatRasters in the list into one SpatRaster:

> cube <- c()
> for (file in files){
+   ras <- rast(file)
+   cube<-c(cube,ras)
+ }
> names(cube) <- dates
> cubef <- rast(cube)
> cubef
class       : SpatRaster 
dimensions  : 386, 695, 444  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 634500, 641450, 5342460, 5346320  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633) 
sources     : 20190322T100031_20190322T100524_T33UXP.tif  (12 layers) 
              20190324T095029_20190324T095522_T33UXP.tif  (12 layers) 
              20190329T095031_20190329T095315_T33UXP.tif  (12 layers) 
              ... and 34 more source(s)
names       : 2019-03-22_1, 2019-03-22_2, 2019-03-22_3, 2019-03-22_4, 2019-03-22_5, 2019-03-22_6, ... 

When I extract the values of all the layers for the sample points, I get the following result.

> s_points <- st_read(connex,query="SELECT * FROM s_points WHERE NOT ST_IsEmpty(geom);")

> str(s_points)
Classes ‘sf’ and 'data.frame':  286 obs. of  3 variables:
 $ s_point_id: int  1 1 2 2 4 4 6 6 7 7 ...
 $ kf_klasse : chr  "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" ...
 $ geom      :sfc_POINT of length 286; first list element:  'XY' num  637052 5345218
 - attr(*, "sf_column")= chr "geom"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "s_point_id" "kf_klasse"

> s_points_coords <- st_coordinates(s_points)

> e <- terra::extract(cubef, s_points)

> str(e)
'data.frame':   286 obs. of  445 variables:
 $ ID           : num  1 1 2 2 3 3 4 4 5 5 ...
 $ 2019-03-22_1 : num  0.0789 0.0901 0.0587 0.063 0.0937 0.0901 0.0517 0.0528 0.0819 0.0882 ...
 $ 2019-03-22_2 : num  0.096 0.1056 0.0728 0.0771 0.1072 ...
 $ 2019-03-22_3 : num  0.108 0.1226 0.0734 0.0788 0.125 ...
 $ 2019-03-22_4 : num  0.1301 0.1437 0.0998 0.1017 0.1395 ...
 $ 2019-03-22_5 : num  0.166 0.174 0.157 0.151 0.156 ...
 $ 2019-03-22_6 : num  0.183 0.188 0.174 0.163 0.169 ...
 $ 2019-03-22_7 : num  0.196 0.196 0.183 0.169 0.186 ...
 $ 2019-03-22_8 : num  0.27 0.293 0.171 0.172 0.282 ...
 $ 2019-03-22_9 : num  0.236 0.269 0.138 0.142 0.252 ...
 $ 2019-03-22_10: num  0.29 0.229 0.427 0.365 0.196 ...
 $ 2019-03-22_11: num  -0.343 -0.299 -0.43 -0.374 -0.268 ...
 $ 2019-03-22_12: num  0.1353 0.1108 0.1739 0.1452 0.0928 ...
 $ 2019-03-24_1 : num  0.099 0.1088 0.0919 NA 0.1058 ...
 $ 2019-03-24_2 : num  0.111 0.115 0.11 NA 0.114 ...
 $ 2019-03-24_3 : num  0.116 0.127 0.104 NA 0.131 ...
 $ 2019-03-24_4 : num  0.145 0.154 0.147 NA 0.152 ...
 $ 2019-03-24_5 : num  0.19 0.19 0.258 NA 0.171 ...
 $ 2019-03-24_6 : num  0.208 0.21 0.294 NA 0.186 ...
 $ 2019-03-24_7 : num  0.231 0.222 0.31 NA 0.197 ...
 $ 2019-03-24_8 : num  0.318 0.341 0.281 NA 0.331 ...
 $ 2019-03-24_9 : num  0.283 0.314 0.217 NA 0.305 ...
 $ 2019-03-24_10: num  0.329 0.271 0.497 NA 0.202 ...
 $ 2019-03-24_11: num  -0.35 -0.317 -0.477 NA -0.268 ...
 $ 2019-03-24_12: num  0.1698 0.1405 0.291 NA 0.0997 ...
 $ 2019-03-29_1 : num  NA NA 0.0476 NA 0.0891 0.0847 0.0664 0.0719 NA NA ...
 $ 2019-03-29_2 : num  NA NA 0.0642 NA 0.0965 ...
 $ 2019-03-29_3 : num  NA NA 0.0607 NA 0.1196 ...
 $ 2019-03-29_4 : num  NA NA 0.0904 NA 0.1351 ...
 $ 2019-03-29_5 : num  NA NA 0.162 NA 0.149 ...
 $ 2019-03-29_6 : num  NA NA 0.18 NA 0.167 ...
 $ 2019-03-29_7 : num  NA NA 0.182 NA 0.183 ...
 $ 2019-03-29_8 : num  NA NA 0.167 NA 0.337 ...
 $ 2019-03-29_9 : num  NA NA 0.125 NA 0.311 ...
 $ 2019-03-29_10: num  NA NA 0.5 NA 0.209 ...
 $ 2019-03-29_11: num  NA NA -0.479 NA -0.309 ...
 $ 2019-03-29_12: num  NA NA 0.1955 NA 0.0971 ...
 $ 2019-04-01_1 : num  0.0616 0.0703 0.0543 0.0573 0.0733 0.0783 0.0675 0.0693 0.0557 0.0584 ...
 $ 2019-04-01_2 : num  0.0742 0.0838 0.073 0.076 0.0849 0.0872 0.0783 0.0821 0.0733 0.073 ...
 $ 2019-04-01_3 : num  0.0798 0.0945 0.066 0.0758 0.0987 ...
 $ 2019-04-01_4 : num  0.101 0.114 0.104 0.106 0.116 ...
 $ 2019-04-01_5 : num  0.144 0.143 0.205 0.188 0.129 ...
 $ 2019-04-01_6 : num  0.157 0.157 0.231 0.209 0.143 ...
 $ 2019-04-01_7 : num  0.17 0.165 0.249 0.214 0.153 ...
 $ 2019-04-01_8 : num  0.24 0.259 0.208 0.212 0.275 ...
 $ 2019-04-01_9 : num  0.207 0.232 0.152 0.168 0.256 ...
 $ 2019-04-01_10: num  0.362 0.272 0.581 0.476 0.216 ...
 $ 2019-04-01_11: num  -0.393 -0.326 -0.547 -0.475 -0.287 ...
 $ 2019-04-01_12: num  0.1449 0.1119 0.2783 0.2137 0.0871 ...
 $ 2019-04-16_1 : num  0.0639 0.0695 0.0539 0.0541 0.0767 0.081 0.0754 0.0739 0.0606 0.0621 ...
 $ 2019-04-16_2 : num  0.0733 0.0797 0.0717 0.07 0.0834 0.0862 0.0835 0.0854 0.0748 0.0785 ...
 $ 2019-04-16_3 : num  0.0832 0.0923 0.0658 0.0626 0.1042 ...
 $ 2019-04-16_4 : num  0.108 0.115 0.111 0.107 0.118 ...
 $ 2019-04-16_5 : num  0.164 0.159 0.229 0.223 0.136 ...
 $ 2019-04-16_6 : num  0.183 0.179 0.26 0.26 0.149 ...
 $ 2019-04-16_7 : num  0.202 0.198 0.284 0.275 0.166 ...
 $ 2019-04-16_8 : num  0.255 0.27 0.205 0.202 0.288 ...
 $ 2019-04-16_9 : num  0.219 0.244 0.141 0.144 0.278 ...
 $ 2019-04-16_10: num  0.416 0.364 0.623 0.63 0.23 ...
 $ 2019-04-16_11: num  -0.467 -0.426 -0.596 -0.595 -0.332 ...
 $ 2019-04-16_12: num  0.1846 0.1638 0.3228 0.3181 0.0979 ...
 $ 2019-04-18_1 : num  0.0702 0.0792 0.0636 0.063 0.0875 0.094 0.0858 0.0868 0.0662 0.0709 ...
 $ 2019-04-18_2 : num  0.0838 0.0946 0.0898 0.0872 0.101 ...
 $ 2019-04-18_3 : num  0.0908 0.1038 0.0785 0.0765 0.1206 ...
 $ 2019-04-18_4 : num  0.121 0.13 0.13 0.125 0.138 ...
 $ 2019-04-18_5 : num  0.186 0.183 0.266 0.253 0.154 ...
 $ 2019-04-18_6 : num  0.213 0.205 0.299 0.289 0.167 ...
 $ 2019-04-18_7 : num  0.221 0.214 0.312 0.297 0.186 ...
 $ 2019-04-18_8 : num  0.275 0.294 0.228 0.228 0.314 ...
 $ 2019-04-18_9 : num  0.227 0.255 0.154 0.157 0.296 ...
 $ 2019-04-18_10: num  0.418 0.346 0.598 0.59 0.214 ...
 $ 2019-04-18_11: num  -0.45 -0.387 -0.553 -0.546 -0.297 ...
 $ 2019-04-18_12: num  0.199 0.167 0.335 0.321 0.101 ...
 $ 2019-04-21_1 : num  0.0404 0.0619 0.0373 0.0351 0.0814 0.0844 0.0764 0.0801 0.0563 0.0626 ...
 $ 2019-04-21_2 : num  0.0592 0.0823 0.0614 0.0579 0.0927 0.0966 0.0933 0.0952 0.0776 0.0869 ...
 $ 2019-04-21_3 : num  0.0542 0.0873 0.048 0.0433 0.1118 ...
 $ 2019-04-21_4 : num  0.082 0.105 0.0933 0.0841 0.1279 ...
 $ 2019-04-21_5 : num  0.15 0.163 0.225 0.207 0.144 ...
 $ 2019-04-21_6 : num  0.173 0.184 0.259 0.247 0.155 ...
 $ 2019-04-21_7 : num  0.174 0.199 0.274 0.251 0.172 ...
 $ 2019-04-21_8 : num  0.192 0.237 0.168 0.156 0.291 ...
 $ 2019-04-21_9 : num  0.1352 0.1804 0.0994 0.0903 0.2674 ...
 $ 2019-04-21_10: num  0.525 0.391 0.702 0.706 0.213 ...
 $ 2019-04-21_11: num  -0.493 -0.415 -0.634 -0.625 -0.3 ...
 $ 2019-04-21_12: num  0.1954 0.174 0.3422 0.3212 0.0941 ...
 $ 2019-05-01_1 : num  0.0342 0.0435 0.0282 0.0292 0.07 0.0684 0.0722 0.0757 0.0458 0.061 ...
 $ 2019-05-01_2 : num  0.0516 0.055 0.0517 0.048 0.0781 0.0793 0.0861 0.0919 0.0613 0.0839 ...
 $ 2019-05-01_3 : num  0.0422 0.0538 0.0299 0.0325 0.0991 ...
 $ 2019-05-01_4 : num  0.0753 0.0836 0.0761 0.0755 0.1112 ...
 $ 2019-05-01_5 : num  0.182 0.177 0.247 0.235 0.124 ...
 $ 2019-05-01_6 : num  0.21 0.203 0.3 0.287 0.138 ...
 $ 2019-05-01_7 : num  0.214 0.19 0.314 0.293 0.157 ...
 $ 2019-05-01_8 : num  0.164 0.182 0.148 0.146 0.264 ...
 $ 2019-05-01_9 : num  0.0988 0.1156 0.0777 0.0763 0.235 ...
 $ 2019-05-01_10: num  0.67 0.559 0.826 0.801 0.225 ...
 $ 2019-05-01_11: num  -0.611 -0.552 -0.717 -0.719 -0.334 ...
 $ 2019-05-01_12: num  0.273 0.2196 0.4226 0.3935 0.0916 ...
 $ 2019-05-26_1 : num  0.0537 0.0633 0.0431 0.0444 0.118 ...
 $ 2019-05-26_2 : num  0.0675 0.0835 0.0611 0.0564 0.1284 ...
  [list output truncated]

What I have now is a dataframe that has a column for every band of each image (12 columns per image), which results in 37 × 12 = 444 columns. From here on, I don't know how to add the extracted values to the s_points dataframe in order to attach the ID and class name to the extracted values. A simple column bind isn't possible, because I have 444 values for every point.

My questions are:

  1. How can I combine the extracted values and the sample points?
  2. How can I train an rf model with this extracted data?
  3. Does it make more sense to use a data cube here (gdalcubes in R)? I dropped this idea, mainly because of the irregular spacing of the time series, which would cause problems with the temporal aggregation; such aggregation isn't expedient for the research question anyway. Thanks

You mention that you want a dataset with four dimensions. But how are you going to train your model and make predictions (you can only use two dimensions for that)? So it would seem to me that what you need is a three-dimensional SpatRaster, which you can make with

cube <- rast(files)

Unless you want to run a separate model for each file --- but then you should loop over the files.
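If you went that route, the per-file loop could look like this (a sketch only; `fit_one()` is a hypothetical placeholder for your own model-fitting code, and `files`/`s_points` are the objects from the question):

```r
library(terra)

# One model per date: extract the 12 band values at the sample
# points for each file and fit a model on them.
# fit_one() is a placeholder, not a real function.
models <- lapply(files, function(f) {
  v <- terra::extract(rast(f), s_points)  # one date, 12 bands
  fit_one(v)
})
```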

Here is an example (taken from ?terra::predict) showing how you might then run a random forest, or any other regression or classification model.

library(terra)
# Toy data: a 3-band example image plus a transformed copy,
# combined into a 6-layer "cube"
logo1 <- rast(system.file("ex/logo.tif", package="terra"))   
logo2 <- sqrt(logo1)
cube <- c(logo1, logo2)

names(cube) <- c("red1", "green1", "blue1", "red2", "green2", "blue2")

# x/y coordinates of presence (p) and absence (a) sample points
p <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85, 
   66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31, 
   22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
a <- matrix(c(22, 33, 64, 85, 92, 94, 59, 27, 30, 64, 60, 33, 31, 9,
   99, 67, 15, 5, 4, 30, 8, 37, 42, 27, 19, 69, 60, 73, 3, 5, 21,
   37, 52, 70, 74, 9, 13, 4, 17, 47), ncol=2)

# Response (1 = presence, 0 = absence) plus the predictor values
# extracted from the cube at those points
xy <- rbind(cbind(1, p), cbind(0, a))
e <- extract(cube, xy[,2:3])
v <- data.frame(cbind(pa=xy[,1], e))

library(randomForest)
# Fit the model on the point data, then predict over the whole raster
rfm <- randomForest(formula=pa~., data=v)
p <- predict(cube, rfm)

Perhaps you can edit your question and explain why this would not work for you. And include a toy example of how you intend to fit your model. I suppose the rasters are your predictors, but what are you predicting (your y variable)? Is it constant, or is it different for each time step (raster file)?

If the issue is that you want to distinguish between variables that have the same names at different dates, you can concatenate the names. Something like this, with a SpatRaster x:

names(x) <- paste0(names(x), "_", time(x))
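With distinct names like that, the extracted dataframe from the question could be reshaped into the long FID/time format shown there. A sketch, untested against your data, assuming tidyr/dplyr are available and the layer names have the form "<date>_<bandindex>":

```r
library(tidyr)
library(dplyr)

# e: wide dataframe from terra::extract(), columns ID, "2019-03-22_1", ...
long <- e %>%
  pivot_longer(-ID, names_to = c("time", "band"), names_sep = "_") %>%
  pivot_wider(names_from = band, values_from = value) %>%
  # attach the class label from the sample points via the row index
  left_join(tibble(ID = seq_len(nrow(s_points)),
                   class = s_points$kf_klasse),
            by = "ID")
```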

If you want to write a single netCDF file, you could do

sds <- rast(files)
writeCDF(sds, "test.nc")
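If the acquisition dates should travel with the file, you could set the time stamps before writing (a sketch; it assumes `dates` holds one date per file and that each file contributes 12 layers in file order):

```r
library(terra)

sds <- rast(files)
time(sds) <- rep(dates, each = 12)  # one date per 12-band block
writeCDF(sds, "test.nc", overwrite = TRUE)
```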
