
Assign id by cluster in R

I have a vector like this:

var1=c("A","A","B"," "," ","C","A","","A")

How can I create an id vector that marks which non-blank elements are adjacent to each other? Something like:

id1=c(1,1,1,0,0,2,2,0,3)

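So I want to assign an id to each cluster of consecutive non-blank values, with 0 for the blank entries. Is there a way to do this in R?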

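Here is one option with rle. We remove the leading/trailing spaces with trimws, convert to a logical vector based on whether each element is a non-empty string (nzchar), and take the run-length encoding (rle). Then we change the elements of 'values' in the 'rl' list that are TRUE to a sequence, and replicate 'values' by 'lengths'.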

rl <- rle(nzchar(trimws(var1)))
rl$values[rl$values] <- seq_along(rl$values[rl$values])
rep(rl$values, rl$lengths)
#[1] 1 1 1 0 0 2 2 0 3
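
To make those steps easier to follow, here is a sketch that exposes each intermediate value. The names is_filled and run_ids are introduced only for illustration; the input is the var1 from the question.

```r
var1 <- c("A", "A", "B", " ", " ", "C", "A", "", "A")

# TRUE where the element is non-blank after trimming whitespace
is_filled <- nzchar(trimws(var1))

# Run-length encoding: one entry per run of identical values
rl <- rle(is_filled)
rl$lengths  # 3 2 2 1 1
rl$values   # TRUE FALSE TRUE FALSE TRUE

# Number the TRUE runs 1, 2, 3, ...; the FALSE runs are coerced to 0
run_ids <- rl$values
run_ids[rl$values] <- seq_len(sum(rl$values))
run_ids     # 1 0 2 0 3

# Expand each run id back to the length of its run
id1 <- rep(run_ids, rl$lengths)
id1         # 1 1 1 0 0 2 2 0 3
```

Assigning integers into the logical vector is what turns the remaining FALSE entries into 0, so no separate replacement step is needed.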

Data

var1=c("A","A","B"," "," ","C","A","","A")

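We can take the cumsum of a diff on var1 to generate a sequence that numbers the clusters (with the empty strings still counted as part of the preceding cluster), and then replace the empty-string positions with 0: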

replace(cumsum(c(TRUE, diff(var1 != "") == 1)), var1 == "", 0)

which gives:

# [1] 1 1 1 0 0 2 2 0 3

for the input below (note that the blank entries here are empty strings, not single spaces):

var1=c("A","A","B","","","C","A","","A")
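
The one-liner can be unpacked as follows. This is only an illustrative breakdown; the names filled, steps, starts and running are not part of the original answer.

```r
var1 <- c("A", "A", "B", "", "", "C", "A", "", "A")

filled  <- var1 != ""           # TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE
steps   <- diff(filled)         # 0 0 -1 0 1 0 -1 1  (+1 marks empty -> non-empty)
starts  <- c(TRUE, steps == 1)  # TRUE at the first element of each cluster
running <- cumsum(starts)       # 1 1 1 1 1 2 2 2 3

# Blank positions still carry the id of the preceding cluster; zero them out
id1 <- replace(running, !filled, 0)
id1                             # 1 1 1 0 0 2 2 0 3
```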

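This assumes var1 does not start with an empty string. To generalize to that case, we can test the first element of var1 and use the result of that condition as the initial value: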

replace(cumsum(c(var1[1] != "", diff(var1 != "") == 1)), var1 == "", 0)

which gives:

# [1] 0 1 1 1 0 0 2 2 0 3

for:

var1=c("", "A","A","B","","","C","A","","A")
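
The cumsum approach compares against "" directly, so a whitespace-only entry such as " " (as in the question's original vector) would be treated as non-blank. A possible way to cover both cases is to combine it with trimws and nzchar, as in the sketch below. The function name cluster_id is hypothetical, and the choice to treat whitespace-only strings as blank is an assumption taken from the question's expected output.

```r
# cluster_id is a hypothetical helper name, not a base R function.
# Assumption: whitespace-only strings count as blank, like "".
cluster_id <- function(x) {
  filled <- nzchar(trimws(x))
  if (length(filled) == 0L) return(integer(0))
  # A cluster starts at position 1 if it is non-blank,
  # and wherever a blank is followed by a non-blank
  starts <- c(filled[1], diff(filled) == 1)
  replace(cumsum(starts), !filled, 0L)
}

cluster_id(c("A", "A", "B", " ", " ", "C", "A", "", "A"))
# 1 1 1 0 0 2 2 0 3
cluster_id(c("", "A", "A", "B", "", "", "C", "A", "", "A"))
# 0 1 1 1 0 0 2 2 0 3
```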

