R - How to change elements in a matrix starting from a specific point
I have a data frame M that looks like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 0.4212778 0.6874073 0.1551896 Cluster_1
[2,] 0.6874073 0.5610995 0.1779030 Cluster_1
[3,] 0.1551896 0.1779030 0.9515304 Cluster_1
[4,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
[5,] 0.4675764 0.5407295 0.7942978 Cluster_1
[6,] 0.4675764 0.5407295 0.7942978 Cluster_1
[7,] 0.4675764 0.5407295 0.7942978 Cluster_2
[8,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
[9,] 0.4675764 0.5407295 0.7942978 Cluster_2
[10,] 0.4675764 0.5407295 0.7942978 Cluster_2
[11,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[12,] 0.4675764 0.5407295 0.7942978 Cluster_2
[13,] 0.4675764 0.5407295 0.7942978 Cluster_2
[14,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[16,] 0.4675764 0.5407295 0.7942978 Cluster_4
[17,] 0.4675764 0.5407295 0.7942978 Cluster_4
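I want to assign the same name (column 5) to the matrix elements in the range that starts at an element flagged with "_A" and runs until the next change of name. In this case, the result I want is as follows: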
[,1] [,2] [,3] [,4] [,5]
[1,] 0.4212778 0.6874073 0.1551896 Cluster_1
[2,] 0.6874073 0.5610995 0.1779030 Cluster_1
[3,] 0.1551896 0.1779030 0.9515304 Cluster_1
[4,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
[5,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
[6,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
[7,] 0.4675764 0.5407295 0.7942978 Cluster_2
[8,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
[9,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
[10,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
[11,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[12,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[13,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[14,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[16,] 0.4675764 0.5407295 0.7942978 Cluster_4
[17,] 0.4675764 0.5407295 0.7942978 Cluster_4
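How can I do this quickly while avoiding a for loop (which I could code myself)? I have looked at many similar posts, but none of them gave me what I want. Thanks!
You can try the base R solution below, which uses ave + gsub: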
M <- within(M,V5 <- ave(V5,
gsub("(Cluster_\\d+).*","\\1",V5),
FUN = function(x) ave(x,
cumsum(grepl("_A",x)),
FUN = function(q) head(q,1))))
which gives
> M
V1 V2 V3 V4 V5
1 [1,] 0.4212778 0.6874073 0.1551896 Cluster_1
2 [2,] 0.6874073 0.5610995 0.1779030 Cluster_1
3 [3,] 0.1551896 0.1779030 0.9515304 Cluster_1
4 [4,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
5 [5,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
6 [6,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
7 [7,] 0.4675764 0.5407295 0.7942978 Cluster_2
8 [8,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
9 [9,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
10 [10,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
11 [11,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
12 [12,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
13 [13,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
14 [14,] 0.4675764 0.5407295 0.7942978 Cluster_3
15 [15,] 0.4675764 0.5407295 0.7942978 Cluster_3
16 [15,] 0.4675764 0.5407295 0.7942978 Cluster_3
17 [16,] 0.4675764 0.5407295 0.7942978 Cluster_4
18 [17,] 0.4675764 0.5407295 0.7942978 Cluster_4
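To make the nested ave call easier to follow, here is a minimal sketch that splits the same idea into three named steps. The variable names (v5, parent, run, key, filled) are made up for illustration and are not part of the answer above; the logic is intended to match it, assuming every label starts with "Cluster_<number>".

```r
# Labels from column V5 of the example data
v5 <- c("Cluster_1", "Cluster_1", "Cluster_1", "Cluster_1_A", "Cluster_1",
        "Cluster_1", "Cluster_2", "Cluster_2_A", "Cluster_2", "Cluster_2",
        "Cluster_2_1_A", "Cluster_2", "Cluster_2", "Cluster_3", "Cluster_3",
        "Cluster_3", "Cluster_4", "Cluster_4")

# Step 1: keep only "Cluster_<number>" so every row knows its parent cluster
parent <- gsub("(Cluster_\\d+).*", "\\1", v5)

# Step 2: inside each parent, count the "_A" labels seen so far;
# the counter goes up by one at every flagged row
run <- ave(as.integer(grepl("_A", v5)), parent, FUN = cumsum)

# Step 3: each (parent, run) pair is one block; fill it with its first label
key <- paste(parent, run)
filled <- ave(v5, key, FUN = function(x) x[1])

print(data.frame(v5, parent, run, filled))
```

Rows with run equal to 0 keep the plain cluster name, and each later run takes the name of the flagged row that opened it.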
Data
M <- structure(list(V1 = c("[1,]", "[2,]", "[3,]", "[4,]", "[5,]",
"[6,]", "[7,]", "[8,]", "[9,]", "[10,]", "[11,]", "[12,]", "[13,]",
"[14,]", "[15,]", "[15,]", "[16,]", "[17,]"), V2 = c(0.4212778,
0.6874073, 0.1551896, 0.4675764, 0.4675764, 0.4675764, 0.4675764,
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764,
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764), V3 = c(0.6874073,
0.5610995, 0.177903, 0.5407295, 0.5407295, 0.5407295, 0.5407295,
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295,
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295), V4 = c(0.1551896,
0.177903, 0.9515304, 0.7942978, 0.7942978, 0.7942978, 0.7942978,
0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978,
0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978), V5 = c("Cluster_1",
"Cluster_1", "Cluster_1", "Cluster_1_A", "Cluster_1", "Cluster_1",
"Cluster_2", "Cluster_2_A", "Cluster_2", "Cluster_2", "Cluster_2_1_A",
"Cluster_2", "Cluster_2", "Cluster_3", "Cluster_3", "Cluster_3",
"Cluster_4", "Cluster_4")), class = "data.frame", row.names = c(NA,
-18L))
You can try something like this, after converting your (matrix-looking) object to a data.frame:
df = structure(list(value1 = c(0.4212778, 0.6874073, 0.1551896, 0.4675764,
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764,
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764,
0.4675764, 0.4675764), value2 = c(0.6874073, 0.5610995, 0.177903,
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295,
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295,
0.5407295, 0.5407295, 0.5407295), value3 = c(0.1551896, 0.177903,
0.9515304, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978,
0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978,
0.7942978, 0.7942978, 0.7942978, 0.7942978), cluster = structure(c(1L,
1L, 1L, 2L, 1L, 1L, 3L, 5L, 3L, 3L, 4L, 3L, 3L, 6L, 6L, 6L, 7L,
7L), .Label = c("Cluster_1", "Cluster_1_A", "Cluster_2", "Cluster_2_1_A",
"Cluster_2_A", "Cluster_3", "Cluster_4"), class = "factor")), class = "data.frame", row.names = c(NA,
-18L))
head(df)
value1 value2 value3 cluster
1 0.4212778 0.6874073 0.1551896 Cluster_1
2 0.6874073 0.5610995 0.1779030 Cluster_1
3 0.1551896 0.1779030 0.9515304 Cluster_1
4 0.4675764 0.5407295 0.7942978 Cluster_1_A
5 0.4675764 0.5407295 0.7942978 Cluster_1
6 0.4675764 0.5407295 0.7942978 Cluster_1
If we do the following:
df$id = as.numeric(factor(df$cluster))
head(df,10)
value1 value2 value3 cluster id
1 0.4212778 0.6874073 0.1551896 Cluster_1 1
2 0.6874073 0.5610995 0.1779030 Cluster_1 1
3 0.1551896 0.1779030 0.9515304 Cluster_1 1
4 0.4675764 0.5407295 0.7942978 Cluster_1_A 2
5 0.4675764 0.5407295 0.7942978 Cluster_1 1
6 0.4675764 0.5407295 0.7942978 Cluster_1 1
7 0.4675764 0.5407295 0.7942978 Cluster_2 3
8 0.4675764 0.5407295 0.7942978 Cluster_2_A 5
9 0.4675764 0.5407295 0.7942978 Cluster_2 3
10 0.4675764 0.5407295 0.7942978 Cluster_2 3
You can see that each increase, i.e. from 1 to 2 at rows 3:4 or from 3 to 5 at rows 7:8, marks a place where we want to split. So we do:
df$grp = cumsum(c(0,diff(df$id)>0))
value1 value2 value3 cluster id grp
1 0.4212778 0.6874073 0.1551896 Cluster_1 1 0
2 0.6874073 0.5610995 0.1779030 Cluster_1 1 0
3 0.1551896 0.1779030 0.9515304 Cluster_1 1 0
4 0.4675764 0.5407295 0.7942978 Cluster_1_A 2 1
5 0.4675764 0.5407295 0.7942978 Cluster_1 1 1
6 0.4675764 0.5407295 0.7942978 Cluster_1 1 1
7 0.4675764 0.5407295 0.7942978 Cluster_2 3 2
8 0.4675764 0.5407295 0.7942978 Cluster_2_A 5 3
9 0.4675764 0.5407295 0.7942978 Cluster_2 3 3
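The diff/cumsum step can be checked on its own with a small vector. This is only an illustrative sketch: id below is typed in by hand to match the first ten values of the id column shown earlier.

```r
# First ten id values from the table above
id <- c(1, 1, 1, 2, 1, 1, 3, 5, 3, 3)

# diff() gives the change between neighbouring rows; only increases count
step_up <- diff(id) > 0

# Prepend 0 for the first row, then accumulate: each increase opens a new group
grp <- cumsum(c(0, step_up))

print(grp)
# 0 0 0 1 1 1 2 3 3 3
```

A drop in id (for example from 2 back to 1 at rows 4:5) does not open a new group, which is why the rows after a flagged label stay in the same group as that label.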
Your new ID is then simply:
df$new = unlist(
tapply(as.character(df$cluster),
df$grp,
function(i)rep(i[1],length(i))))
value1 value2 value3 cluster id grp new
1 0.4212778 0.6874073 0.1551896 Cluster_1 1 0 Cluster_1
2 0.6874073 0.5610995 0.1779030 Cluster_1 1 0 Cluster_1
3 0.1551896 0.1779030 0.9515304 Cluster_1 1 0 Cluster_1
4 0.4675764 0.5407295 0.7942978 Cluster_1_A 2 1 Cluster_1_A
5 0.4675764 0.5407295 0.7942978 Cluster_1 1 1 Cluster_1_A
6 0.4675764 0.5407295 0.7942978 Cluster_1 1 1 Cluster_1_A
7 0.4675764 0.5407295 0.7942978 Cluster_2 3 2 Cluster_2
8 0.4675764 0.5407295 0.7942978 Cluster_2_A 5 3 Cluster_2_A
9 0.4675764 0.5407295 0.7942978 Cluster_2 3 3 Cluster_2_A
10 0.4675764 0.5407295 0.7942978 Cluster_2 3 3 Cluster_2_A
11 0.4675764 0.5407295 0.7942978 Cluster_2_1_A 4 4 Cluster_2_1_A
12 0.4675764 0.5407295 0.7942978 Cluster_2 3 4 Cluster_2_1_A
13 0.4675764 0.5407295 0.7942978 Cluster_2 3 4 Cluster_2_1_A
14 0.4675764 0.5407295 0.7942978 Cluster_3 6 5 Cluster_3
15 0.4675764 0.5407295 0.7942978 Cluster_3 6 5 Cluster_3
16 0.4675764 0.5407295 0.7942978 Cluster_3 6 5 Cluster_3
17 0.4675764 0.5407295 0.7942978 Cluster_4 7 6 Cluster_4
18 0.4675764 0.5407295 0.7942978 Cluster_4 7 6 Cluster_4
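Note that the diff(df$id) > 0 rule relies on the alphabetical order of the factor levels: in this data every flagged label sorts after its parent, and the parents appear in increasing order. With other names (for example Cluster_10 following Cluster_9) that may no longer hold. As a hedged alternative, the sketch below finds the break points directly from the labels. It assumes that every label starts with "Cluster_<number>" and that flagged labels end in "_A"; the names cluster, parent, starts, grp and new_label are chosen for illustration only.

```r
# Cluster labels of the example data, as plain character strings
cluster <- c("Cluster_1", "Cluster_1", "Cluster_1", "Cluster_1_A", "Cluster_1",
             "Cluster_1", "Cluster_2", "Cluster_2_A", "Cluster_2", "Cluster_2",
             "Cluster_2_1_A", "Cluster_2", "Cluster_2", "Cluster_3", "Cluster_3",
             "Cluster_3", "Cluster_4", "Cluster_4")

# Parent cluster of each row, e.g. "Cluster_2_1_A" -> "Cluster_2"
parent <- sub("^(Cluster_\\d+).*", "\\1", cluster)

# A new block starts at a flagged row, or where the parent cluster changes
# (the first row always starts a block)
starts <- grepl("_A$", cluster) | c(TRUE, parent[-1] != parent[-length(parent)])

# Running block number, then the label found at the first row of each block
grp <- cumsum(starts)
new_label <- cluster[match(grp, grp)]

print(data.frame(cluster, grp, new_label))
```

For a character matrix like the one in the question, the same vector could be written back with something like M[, 5] <- new_label.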