Summarize the proportions of a categorical variable and assign the dominant category to each group

Question

I have a set of sample plots from a map with polygon features (A, B, and C) and a forest type raster (Cold, Warm, and Hot).

Plot    Polygon Forest
1       A       Cold
2       A       Cold
3       A       Cold
4       A       Warm
5       B       Cold
6       B       Cold
7       C       Cold
8       C       Warm
9       C       Hot
10      C       Hot

I want to summarize the proportion of each forest type by polygon and determine the dominant forest type in each polygon. For example:

Polygon Cold    Warm    Hot   Forest_dominant
A       0.75    0.25    0     Cold
B       1       0       0     Cold
C       0.25    0.25    0.5   Hot

Answer 1

This is a bit convoluted, but perhaps:

library(tidyverse)

df <- structure(list(
  Plot = 1:10,
  Polygon = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Forest = c("Cold", "Cold", "Cold", "Warm", "Cold", "Cold", "Cold", "Warm", "Hot", "Hot")
), class = "data.frame", row.names = c(NA, -10L))

df %>%
  group_by(Polygon, Forest) %>%
  summarise(n = n()) %>%
  mutate(n = n / sum(n)) %>%
  group_by(Polygon) %>%
  arrange(Polygon, -n) %>%
  mutate(Forest_dominant = first(Forest)) %>%
  pivot_wider(names_from = Forest, values_from = n, values_fill = 0) %>%
  relocate(Forest_dominant, .after = last_col())
#> `summarise()` has grouped output by 'Polygon'. You can override using the `.groups` argument.
#> # A tibble: 3 × 5
#> # Groups:   Polygon [3]
#>   Polygon  Cold  Warm   Hot Forest_dominant
#>   <chr>   <dbl> <dbl> <dbl> <chr>          
#> 1 A        0.75  0.25   0   Cold           
#> 2 B        1     0      0   Cold           
#> 3 C        0.25  0.25   0.5 Hot

Created on 2021-12-22 by the reprex package (v2.0.1)
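
The `summarise()` message in the output above reports that the result is still grouped by `Polygon`. As a minimal sketch, assuming dplyr 1.0.0 or later, passing `.groups = "drop_last"` asks for that same grouping explicitly, which should suppress the message. The object name `props` is only illustrative and is not part of the answer.

```r
library(dplyr, warn.conflicts = FALSE)

# Sample data from the question
df <- data.frame(
  Polygon = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Forest  = c("Cold", "Cold", "Cold", "Warm", "Cold", "Cold", "Cold", "Warm", "Hot", "Hot")
)

props <- df %>%
  group_by(Polygon, Forest) %>%
  # Drop only the last grouping level (Forest), so the result stays grouped by Polygon
  summarise(n = n(), .groups = "drop_last") %>%
  # Within each polygon, convert counts to proportions
  mutate(prop = n / sum(n)) %>%
  ungroup()

props
```

Stating the grouping explicitly also makes it clearer to a reader that `sum(n)` in the following `mutate()` is computed per polygon.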

Answer 2

We can first compute the proportion within each group, then use pivot_wider:

library(dplyr)
library(tidyr)

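df %>%
  group_by(Polygon, Forest) %>% 
  summarise(n = n(), .groups = "drop_last") %>%   # result stays grouped by Polygon
  mutate(proportion = n / sum(n),
         # name of the forest type with the largest share in this polygon
         Forest_dominant = Forest[which.max(proportion)]) %>% 
  select(-n) %>%                                  # counts are no longer needed
  pivot_wider(
    names_from = Forest,
    values_from = proportion,
    values_fill = 0
  )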

   Polygon Forest_dominant  Cold  Warm   Hot
  <chr>   <chr>           <dbl> <dbl> <dbl>
1 A       Cold             0.75  0.25   0  
2 B       Cold             1     0      0  
3 C       Hot              0.25  0.25   0.5

Answer 3

A base R option

reshape(
  unique(
    transform(
      df,
      prop = ave(Forest, Polygon, FUN = function(x) table(x)[x] / length(x)),
      Forest_dominant = ave(Forest, Polygon, FUN = function(x) names(which.max(table(x))))
    )
  ),
  direction = "wide",
  idvar = c("Polygon", "Forest_dominant"),
  timevar = "Forest"
)

gives

  Polygon Forest_dominant prop.Cold prop.Warm prop.Hot
1       A            Cold      0.75      0.25     <NA>
5       B            Cold         1      <NA>     <NA>
7       C             Hot      0.25      0.25      0.5

Or a data.table option

library(data.table)

dcast(
  setDT(df)[
    ,
    .(cnt = .N), .(Polygon, Forest)
  ][
    ,
    `:=`(prop = proportions(cnt), Forest_dominant = Forest[which.max(cnt)]),
    Polygon
  ],
  Polygon + Forest_dominant ~ Forest,
  value.var = "prop",
  fill = 0
)

gives

   Polygon Forest_dominant Cold Hot Warm
1:       A            Cold 0.75 0.0 0.25
2:       B            Cold 1.00 0.0 0.00
3:       C             Hot 0.25 0.5 0.25

Data

> dput(df)
structure(list(Polygon = c("A", "A", "A", "A", "B", "B", "C", 
"C", "C", "C"), Forest = c("Cold", "Cold", "Cold", "Warm", "Cold",
"Cold", "Cold", "Warm", "Hot", "Hot")), row.names = c(NA, -10L
), class = "data.frame")

Answer 4

Using proportions and which.max.

with(dat, {
  p <- unclass(proportions(table(Polygon, Forest), margin=1))
  cbind.data.frame(p, Forest_dominant=colnames(p)[apply(p, 1, which.max)])
})
#   Cold Hot Warm Forest_dominant
# A 0.75 0.0 0.25            Cold
# B 1.00 0.0 0.00            Cold
# C 0.25 0.5 0.25             Hot

If you need "Polygons" as a column, include Polygon=rownames(p) in the cbind.data.frame call.
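
A minimal sketch of that variant, assuming R 4.0.0 or later (for `proportions`) and the question's data stored in `dat`. The name `res` and the final row-name reset are illustrative additions, not part of the answer.

```r
# Sample data from the question
dat <- data.frame(
  Polygon = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Forest  = c("Cold", "Cold", "Cold", "Warm", "Cold", "Cold", "Cold", "Warm", "Hot", "Hot")
)

res <- with(dat, {
  # Row-wise proportions of each forest type within each polygon
  p <- unclass(proportions(table(Polygon, Forest), margin = 1))
  # Put the polygon names in a regular column, then append the dominant type
  cbind.data.frame(
    Polygon = rownames(p),
    p,
    Forest_dominant = colnames(p)[apply(p, 1, which.max)]
  )
})
rownames(res) <- NULL  # the row names duplicate the new Polygon column

res
```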

Answer 5

library(dplyr, warn.conflicts = FALSE)

df %>% 
  group_by(Polygon) %>% 
  summarise({
    prop.table(table(Forest)) %>% 
      as.list %>% as_tibble
  }) %>% 
  mutate(
    across(-1, coalesce, 0),
    Forest_dominant = across(-1) %>% {names(.)[max.col(.)]}
  )
#> # A tibble: 3 × 5
#>   Polygon  Cold  Warm   Hot Forest_dominant
#>   <chr>   <dbl> <dbl> <dbl> <chr>          
#> 1 A        0.75  0.25   0   Cold           
#> 2 B        1     0      0   Cold           
#> 3 C        0.25  0.25   0.5 Hot

Created on 2021-12-21 by the reprex package (v2.0.1)
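
One point to be aware of with `max.col`: its default is `ties.method = "random"`, so when two forest types share the largest proportion in a polygon, the reported dominant type can change between runs. The sketch below uses a hypothetical proportion matrix (row `D` is invented to create a tie and is not in the question's data) to show how `ties.method = "first"` makes the choice reproducible.

```r
# Hypothetical proportions: row "C" is from the question, row "D" is an invented tie
p <- rbind(
  C = c(Cold = 0.25, Warm = 0.25, Hot = 0.5),
  D = c(Cold = 0.5,  Warm = 0.5,  Hot = 0)
)

# "first" always selects the left-most maximum in each row
dominant <- colnames(p)[max.col(p, ties.method = "first")]
dominant
```

Whether the first, last, or a random tied category should count as dominant is a modelling decision; `which.max`, used in the other answers, also returns the first maximum.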

Question description

5 answers

Answer 1
2 2021-12-21 23:27:44

Answer 2
2 2021-12-21 23:30:06

Answer 3
1 2021-12-21 23:30:41

Data

Answer 4
1 2021-12-22 09:02:04

Answer 5
0 2021-12-22 01:30:44
