Add a unique ID column based on the values in two other columns (lat, long)

Question

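This question has been asked before, but I am looking for a more complete answer with a slightly modified output.

I have a dataset with Lat and Long values in separate columns, and I want to create a unique ID for each unique combination of Lat and Long.

I will borrow a sample dataset from an earlier post that asks the same question (Add ID column by group), but I need a slightly different solution.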

d <- read.table(text='LAT LONG
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4180 
13.5330 -15.4170 
13.5330 -15.4170 
13.5330 -15.4170 
13.5340 -14.9350 
13.5340 -14.9350 
13.5340 -15.9170 
13.3670 -14.6190', header=TRUE)

The solution given there was:

d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))

#       LAT    LONG Cluster_ID
# 1  13.533 -15.418          2
# 2  13.533 -15.418          2
# 3  13.533 -15.418          2
# 4  13.533 -15.418          2
# 5  13.533 -15.417          3
# 6  13.533 -15.417          3
# 7  13.533 -15.417          3
# 8  13.534 -14.935          4
# 9  13.534 -14.935          4
# 10 13.534 -15.917          1
# 11 13.367 -14.619          5

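But how can I make the interaction command preserve the row order, so that the first Cluster_ID above is 1 (the full vector for the last column would then be 1,1,1,1,2,2,2,3,3,4,5 rather than 2,2,2,2,3,3,3,4,4,1,5)? It is not clear to me how the order of the new factor (which is then converted to numeric) is determined.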

I have also been trying to find an equivalent way of doing this with group_by in dplyr, but I can't figure out how to output the tibble as a data frame (older solutions on SO seem to use deprecated dplyr commands).

Thanks!

Answer 1

We can use match to number each LAT/LONG pair by the position of its first appearance:

transform(d, Cluster_ID = match(paste0(LAT, LONG), unique(paste0(LAT, LONG))))
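One thing to watch with the pasted key: paste0 joins the two values without a separator, so two different pairs could in principle produce the same string. In this dataset the minus sign of LONG happens to keep the keys apart. A minimal sketch of the same match idea with an explicit separator follows; the helper name first_seen_id is made up for illustration and is not part of the original answer.

```r
# Sample data from the question
d <- read.table(text = 'LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header = TRUE)

# Without a separator, different pairs can collapse to the same key
paste0(1.1, 23) == paste0(1.12, 3)   # TRUE: both give "1.123"

# Hypothetical helper: number each pair by the order of first appearance
first_seen_id <- function(lat, long) {
  key <- paste(lat, long, sep = "_")  # explicit separator keeps keys distinct
  match(key, unique(key))
}

d$Cluster_ID <- first_seen_id(d$LAT, d$LONG)
d$Cluster_ID   # 1 1 1 1 2 2 2 3 3 4 5
```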

Or convert 'LAT' and 'LONG' to sequences of first-appearance positions, and then apply interaction:

transform(d, Cluster_ID = as.integer(interaction(match(LAT, 
  unique(LAT)),  match(LONG, unique(LONG)), drop=TRUE, lex.order = FALSE)))
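As for how the factor order in the question comes about: interaction first turns each column into a factor with sorted levels, and with the default lex.order = FALSE the combined levels are arranged so that the first factor varies fastest. The IDs therefore follow the sorted LONG values (then LAT) rather than the row order. The sketch below inspects those levels on the sample data and, as one possible alternative, re-levels the factor by first appearance.

```r
# Sample data from the question
d <- read.table(text = 'LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header = TRUE)

f <- interaction(d$LAT, d$LONG, drop = TRUE)
levels(f)                 # combined levels, ordered by sorted LONG, then LAT
sorted_ids <- as.integer(f)
sorted_ids                # 2 2 2 2 3 3 3 4 4 1 5, as in the question

# Re-level so that levels are in order of first appearance in the data
g <- factor(f, levels = as.character(unique(f)))
appearance_ids <- as.integer(g)
appearance_ids            # 1 1 1 1 2 2 2 3 3 4 5
```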

Answer 2

A data.table option using .GRP, which numbers the groups in order of first appearance:

> library(data.table)
> setDT(d)[, Cluster_ID := .GRP, .(LAT, LONG)][]
       LAT    LONG Cluster_ID
 1: 13.533 -15.418          1
 2: 13.533 -15.418          1
 3: 13.533 -15.418          1
 4: 13.533 -15.418          1
 5: 13.533 -15.417          2
 6: 13.533 -15.417          2
 7: 13.533 -15.417          2
 8: 13.534 -14.935          3
 9: 13.534 -14.935          3
10: 13.534 -15.917          4
11: 13.367 -14.619          5

Or rleid (thanks to @akrun's comment):

> setDT(d)[, Cluster_ID := rleid(LAT, LONG)][]
       LAT    LONG Cluster_ID
 1: 13.533 -15.418          1
 2: 13.533 -15.418          1
 3: 13.533 -15.418          1
 4: 13.533 -15.418          1
 5: 13.533 -15.417          2
 6: 13.533 -15.417          2
 7: 13.533 -15.417          2
 8: 13.534 -14.935          3
 9: 13.534 -14.935          3
10: 13.534 -15.917          4
11: 13.367 -14.619          5

Or a base R option using ave + cumsum:

transform(
  d,
  Cluster_ID = cumsum(
    ave(1:nrow(d),
      LAT,
      LONG,
      FUN = seq_along
    ) == 1
  )
)
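Note that the ave + cumsum option counts how many distinct pairs have been seen so far, and rleid numbers runs of consecutive identical rows. Both give one ID per pair on the sample data because identical rows sit next to each other. A small sketch with made-up data (not from the question) shows what can happen when a pair reappears later, compared with first-appearance numbering via match:

```r
# Hypothetical data: the pair (1, 10) reappears after a different pair
x <- data.frame(LAT = c(1, 2, 1), LONG = c(10, 20, 10))

# Running count of first occurrences (the ave + cumsum idea)
first_flag <- ave(seq_len(nrow(x)), x$LAT, x$LONG, FUN = seq_along) == 1
run_count <- cumsum(first_flag)

# First-appearance numbering via match on a separated key
key <- paste(x$LAT, x$LONG, sep = "_")
by_match <- match(key, unique(key))

run_count   # 1 2 2: row 3 ends up sharing an ID with row 2
by_match    # 1 2 1: row 3 gets the same ID as row 1
```

If the data may not be sorted by coordinate pair, the match or .GRP variants are the safer choice.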

Answer 1
2 accepted 2021-05-26 20:33:10

Answer 2
1 2021-05-26 20:36:06
