Add unique ID column based on values in two other columns (lat, long)
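This question has been asked before, but I am looking for a more complete answer with a slightly modified output.
I have a dataset with Lat and Long values in separate columns, and I would like to create a unique ID for each unique combination of Lat and Long.
I'll borrow an example dataset from an older post that asks the same question ("Add ID column by group"), but I need a slightly different solution.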
d <- read.table(text='LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header=TRUE)
The solution given there was:
d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))
# LAT LONG Cluster_ID
# 1 13.533 -15.418 2
# 2 13.533 -15.418 2
# 3 13.533 -15.418 2
# 4 13.533 -15.418 2
# 5 13.533 -15.417 3
# 6 13.533 -15.417 3
# 7 13.533 -15.417 3
# 8 13.534 -14.935 4
# 9 13.534 -14.935 4
# 10 13.534 -15.917 1
# 11 13.367 -14.619 5
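But how can I get the interaction command to preserve the order of appearance, so that the first Cluster_ID above is 1 (the full vector for the final column would be 1,1,1,1,2,2,2,3,3,4,5)? It is not clear to me how the order of the new factor (which is then converted to numeric) is determined.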
I have also been trying to find an equivalent way of doing this using group_by in dplyr, but I can't figure out how to output the tibble as a data frame (older solutions on SO seem to use deprecated dplyr commands).
Thanks!
We can use match:
transform(d, Cluster_ID = match(paste0(LAT, LONG), unique(paste0(LAT, LONG))))
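One caveat with the key built by paste0: without a separator, two different coordinate pairs can in principle collapse into the same string. With the example data the minus sign of LONG happens to act as a separator, but that is not guaranteed for positive longitudes. The sketch below is a hedged variant, not part of the original answer; the helper name first_seen_id and the "_" separator are assumptions made for illustration.

```r
# Hypothetical helper (not from the original answer):
# IDs numbered in order of first appearance of each (lat, long) pair.
first_seen_id <- function(lat, long) {
  key <- paste(lat, long, sep = "_")  # explicit separator keeps the two fields apart
  match(key, unique(key))             # position of each key among the distinct keys
}

# Two distinct points whose separator-free keys coincide
lat  <- c(1.1, 1.12)
long <- c(21, 1)

paste0(lat, long)         # both elements are "1.121"
first_seen_id(lat, long)  # 1 2
```

Used inside transform, this would read transform(d, Cluster_ID = first_seen_id(LAT, LONG)).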
Or convert "LAT" and "LONG" to sequences of first-appearance indices, and then apply interaction:
transform(d, Cluster_ID = as.integer(interaction(match(LAT,
unique(LAT)), match(LONG, unique(LONG)), drop=TRUE, lex.order = FALSE)))
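As for why the original call numbered the groups 2, 3, 4, 1, 5: interaction first turns each numeric column into a factor whose levels are sorted by value, and with the default lex.order = FALSE the levels of the first factor vary fastest. The combined levels are therefore ordered by LONG first and then by LAT, not by appearance. The following base R sketch rebuilds the example data (using data.frame and rep instead of read.table, purely for brevity) and shows one way to renumber by first appearance.

```r
# Same 11 rows as the example dataset in the question
d <- data.frame(
  LAT  = c(rep(13.533, 7), rep(13.534, 3), 13.367),
  LONG = c(rep(-15.418, 4), rep(-15.417, 3), rep(-14.935, 2), -15.917, -14.619)
)

key <- interaction(d$LAT, d$LONG, drop = TRUE)

# Levels are sorted (smallest LONG first), so the smallest LONG, -15.917, gets code 1
levels(key)
sorted_id <- as.integer(key)   # 2 2 2 2 3 3 3 4 4 1 5, as in the question

# Renumber by order of first appearance instead
key_chr <- as.character(key)
first_seen_id <- match(key_chr, unique(key_chr))   # 1 1 1 1 2 2 2 3 3 4 5
```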
A data.table option using .GRP:
> setDT(d)[, Cluster_ID := .GRP, .(LAT, LONG)][]
LAT LONG Cluster_ID
1: 13.533 -15.418 1
2: 13.533 -15.418 1
3: 13.533 -15.418 1
4: 13.533 -15.418 1
5: 13.533 -15.417 2
6: 13.533 -15.417 2
7: 13.533 -15.417 2
8: 13.534 -14.935 3
9: 13.534 -14.935 3
10: 13.534 -15.917 4
11: 13.367 -14.619 5
Or rleid (thanks to @akrun's comment):
> setDT(d)[, Cluster_ID := rleid(LAT, LONG)][]
LAT LONG Cluster_ID
1: 13.533 -15.418 1
2: 13.533 -15.418 1
3: 13.533 -15.418 1
4: 13.533 -15.418 1
5: 13.533 -15.417 2
6: 13.533 -15.417 2
7: 13.533 -15.417 2
8: 13.534 -14.935 3
9: 13.534 -14.935 3
10: 13.534 -15.917 4
11: 13.367 -14.619 5
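The two data.table calls agree here because every LAT/LONG combination sits in one contiguous block. They differ when a combination reappears later: .GRP numbers groups, while rleid numbers runs of consecutive equal rows. The sketch below uses a small made-up three-row table (d3 and the column names by_group and by_run are assumptions for illustration) and also shows setDF as one way to get a plain data frame back, since setDT converts the object by reference.

```r
library(data.table)

# Made-up data: the first coordinate pair reappears in row 3
d3 <- data.frame(
  LAT  = c(13.533, 13.534, 13.533),
  LONG = c(-15.418, -14.935, -15.418)
)

setDT(d3)                                    # convert to data.table by reference
d3[, by_group := .GRP, by = .(LAT, LONG)]    # 1 2 1: one ID per distinct pair
d3[, by_run   := rleid(LAT, LONG)]           # 1 2 3: new ID whenever the pair changes

setDF(d3)                                    # back to a plain data.frame, by reference
class(d3)
```

If the data are sorted or grouped as in the question, either call works; for unsorted data, .GRP is the one that matches "one ID per unique combination".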
Or a base R option using ave + cumsum:
transform(
d,
Cluster_ID = cumsum(
ave(1:nrow(d),
LAT,
LONG,
FUN = seq_along
) == 1
)
)
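To unpack the ave + cumsum idea: ave numbers the rows within each LAT/LONG group, the comparison with 1 flags the first row of each group, and cumsum counts those flags. The sketch below walks through the steps on the example data (rebuilt with data.frame for brevity) and then on a small made-up table, x, to show that the result relies on each group's rows being adjacent.

```r
# Same 11 rows as the example dataset in the question
d <- data.frame(
  LAT  = c(rep(13.533, 7), rep(13.534, 3), 13.367),
  LONG = c(rep(-15.418, 4), rep(-15.417, 3), rep(-14.935, 2), -15.917, -14.619)
)

# Step 1: row counter within each (LAT, LONG) group
pos_in_group <- ave(seq_len(nrow(d)), d$LAT, d$LONG, FUN = seq_along)
# Step 2: TRUE on the first row of every group
first_in_group <- pos_in_group == 1
# Step 3: running count of group starts
ids_d <- cumsum(first_in_group)   # 1 1 1 1 2 2 2 3 3 4 5

# Made-up data where the first pair reappears in row 3
x <- data.frame(LAT = c(13.533, 13.534, 13.533), LONG = c(-15.418, -14.935, -15.418))
ids_x <- cumsum(ave(seq_len(nrow(x)), x$LAT, x$LONG, FUN = seq_along) == 1)
ids_x                             # 1 2 2: row 3 inherits the ID of row 2
```

When a combination can reappear further down, the match-based approach is the safer choice, since it looks each row up among the distinct keys rather than counting group starts.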