Add unique ID column based on values in two other columns (lat, long)
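This question has been asked before, but I am looking for a more complete answer and a slightly modified output.
I have a dataset with Lat and Long values in separate columns, and I would like to create a unique ID for each unique combination of Lat and Long.
I'll borrow an example dataset from an older post that asks the same question (Add ID column by group), but I need a slightly different solution.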
d <- read.table(text='LAT LONG
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4180
13.5330 -15.4170
13.5330 -15.4170
13.5330 -15.4170
13.5340 -14.9350
13.5340 -14.9350
13.5340 -15.9170
13.3670 -14.6190', header=TRUE)
The solution given there was:
d <- transform(d, Cluster_ID = as.numeric(interaction(LAT, LONG, drop=TRUE)))
# LAT LONG Cluster_ID
# 1 13.533 -15.418 2
# 2 13.533 -15.418 2
# 3 13.533 -15.418 2
# 4 13.533 -15.418 2
# 5 13.533 -15.417 3
# 6 13.533 -15.417 3
# 7 13.533 -15.417 3
# 8 13.534 -14.935 4
# 9 13.534 -14.935 4
# 10 13.534 -15.917 1
# 11 13.367 -14.619 5
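But how can I get the interaction command to preserve the row order, so that the first Cluster_ID above is 1 (the complete vector for the last column would be 1,1,1,1,2,2,2,3,3,4,5 instead of the 2,2,2,2,3,3,3,4,4,1,5 shown above)? It is not clear to me how the order of the new factor (which is then converted to numeric) is determined.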
I have also been trying to find an equivalent way of doing this using group_by in dplyr, but I can't figure out how to output the tibble as a data frame (older solutions on SO seem to use deprecated dplyr commands).
Thanks!
We can use match:
transform(d, Cluster_ID = match(paste0(LAT, LONG), unique(paste0(LAT, LONG))))
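One caveat with the paste0 key: without a separator, different pairs can in principle collapse to the same string (for example, paste0(1, 23) and paste0(12, 3) both give "123"). That doesn't happen with this data, since the minus sign of LONG acts as a separator, but a minimal sketch with an explicit separator could look like the following. The "_" separator and the data.frame() reconstruction of the example data are my own choices here, not part of the original answer.

```r
# Rebuild the example data from the question (same 11 rows, same order)
d <- data.frame(
  LAT  = rep(c(13.533, 13.534, 13.367), times = c(7, 3, 1)),
  LONG = rep(c(-15.418, -15.417, -14.935, -15.917, -14.619),
             times = c(4, 3, 2, 1, 1))
)

# Assumption: "_" never appears in the formatted numbers, so each key is unambiguous
key <- paste(d$LAT, d$LONG, sep = "_")

# match() returns the position of each key among the unique keys,
# and unique() keeps keys in order of first appearance
d$Cluster_ID <- match(key, unique(key))

d$Cluster_ID
# 1 1 1 1 2 2 2 3 3 4 5
```

Because the ID is just a position in unique(key), the numbering follows the order in which each LAT/LONG pair first shows up in the data.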
Or convert 'LAT' and 'LONG' to sequences of first-appearance positions and then apply interaction:
transform(d, Cluster_ID = as.integer(interaction(match(LAT,
unique(LAT)), match(LONG, unique(LONG)), drop=TRUE, lex.order = FALSE)))
A data.table option using .GRP:
> setDT(d)[, Cluster_ID := .GRP, .(LAT, LONG)][]
LAT LONG Cluster_ID
1: 13.533 -15.418 1
2: 13.533 -15.418 1
3: 13.533 -15.418 1
4: 13.533 -15.418 1
5: 13.533 -15.417 2
6: 13.533 -15.417 2
7: 13.533 -15.417 2
8: 13.534 -14.935 3
9: 13.534 -14.935 3
10: 13.534 -15.917 4
11: 13.367 -14.619 5
Or rleid (thanks to @akrun's comment):
> setDT(d)[, Cluster_ID := rleid(LAT, LONG)][]
LAT LONG Cluster_ID
1: 13.533 -15.418 1
2: 13.533 -15.418 1
3: 13.533 -15.418 1
4: 13.533 -15.418 1
5: 13.533 -15.417 2
6: 13.533 -15.417 2
7: 13.533 -15.417 2
8: 13.534 -14.935 3
9: 13.534 -14.935 3
10: 13.534 -15.917 4
11: 13.367 -14.619 5
Or a base R option using ave + cumsum:
transform(
d,
Cluster_ID = cumsum(
ave(1:nrow(d),
LAT,
LONG,
FUN = seq_along
) == 1
)
)
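Note that the ave + cumsum approach (and rleid, which numbers runs of consecutive identical rows) relies on rows of the same LAT/LONG pair sitting next to each other, as they do in the example data. If a pair can reappear later in the data, the IDs will no longer be one per pair. A tiny sketch with made-up data (d2 is hypothetical, only there to illustrate the point):

```r
# Hypothetical data where the pair (1, 10) appears in rows 1 and 3, not adjacent
d2 <- data.frame(LAT = c(1, 2, 1), LONG = c(10, 20, 10))

# TRUE for the first row of each LAT/LONG pair, FALSE for later repeats
first_seen <- ave(seq_len(nrow(d2)), d2$LAT, d2$LONG, FUN = seq_along) == 1

# Running count of "first seen" rows: row 3 inherits the ID of row 2
cumsum(first_seen)
# 1 2 2

# A match-based ID does not depend on adjacency
key <- paste(d2$LAT, d2$LONG, sep = "_")
match(key, unique(key))
# 1 2 1
```

So for data that is not already grouped, the match or .GRP variants are probably the safer choice, or the data can be sorted by LAT and LONG first.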