
R - How to change elements in a matrix starting from a specific point

I have a data frame M that looks like this:

 [,1]      [,2]      [,3]       [,4]     [,5]
 [1,] 0.4212778 0.6874073 0.1551896 Cluster_1
 [2,] 0.6874073 0.5610995 0.1779030 Cluster_1
 [3,] 0.1551896 0.1779030 0.9515304 Cluster_1
 [4,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
 [5,] 0.4675764 0.5407295 0.7942978 Cluster_1
 [6,] 0.4675764 0.5407295 0.7942978 Cluster_1
 [7,] 0.4675764 0.5407295 0.7942978 Cluster_2
 [8,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
 [9,] 0.4675764 0.5407295 0.7942978 Cluster_2
[10,] 0.4675764 0.5407295 0.7942978 Cluster_2
[11,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[12,] 0.4675764 0.5407295 0.7942978 Cluster_2
[13,] 0.4675764 0.5407295 0.7942978 Cluster_2
[14,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[16,] 0.4675764 0.5407295 0.7942978 Cluster_4
[17,] 0.4675764 0.5407295 0.7942978 Cluster_4

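I want to assign the same name (column 5) to every element in the range that starts at an element flagged with "_A" and runs until the next name change. In this case:

  • from M[4,5] to M[6,5]
  • from M[8,5] to M[10,5]
  • from M[11,5] to M[13,5]

The result I want is: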

 [,1]      [,2]      [,3]       [,4]     [,5]
 [1,] 0.4212778 0.6874073 0.1551896 Cluster_1
 [2,] 0.6874073 0.5610995 0.1779030 Cluster_1
 [3,] 0.1551896 0.1779030 0.9515304 Cluster_1
 [4,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
 [5,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
 [6,] 0.4675764 0.5407295 0.7942978 Cluster_1_A
 [7,] 0.4675764 0.5407295 0.7942978 Cluster_2
 [8,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
 [9,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
[10,] 0.4675764 0.5407295 0.7942978 Cluster_2_A
[11,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[12,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[13,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
[14,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[15,] 0.4675764 0.5407295 0.7942978 Cluster_3
[16,] 0.4675764 0.5407295 0.7942978 Cluster_4
[17,] 0.4675764 0.5407295 0.7942978 Cluster_4

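How can I do this quickly while avoiding a for loop (which I could write myself)? I have looked at many similar posts, but none of them gives me what I need. Thanks!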

You can try the base R solution below, which combines ave and gsub:

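# Outer ave: group rows by base cluster name (e.g. "Cluster_2_1_A" -> "Cluster_2").
# Inner ave: within each cluster, start a new run at every "_A" label
# and repeat the first label of that run.
M <- within(M,V5 <- ave(V5,
                          gsub("(Cluster_\\d+).*","\\1",V5), 
                          FUN = function(x) ave(x,
                                                cumsum(grepl("_A",x)),
                                                FUN = function(q) head(q,1))))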

which gives

> M
      V1        V2        V3        V4            V5
1   [1,] 0.4212778 0.6874073 0.1551896     Cluster_1
2   [2,] 0.6874073 0.5610995 0.1779030     Cluster_1
3   [3,] 0.1551896 0.1779030 0.9515304     Cluster_1
4   [4,] 0.4675764 0.5407295 0.7942978   Cluster_1_A
5   [5,] 0.4675764 0.5407295 0.7942978   Cluster_1_A
6   [6,] 0.4675764 0.5407295 0.7942978   Cluster_1_A
7   [7,] 0.4675764 0.5407295 0.7942978     Cluster_2
8   [8,] 0.4675764 0.5407295 0.7942978   Cluster_2_A
9   [9,] 0.4675764 0.5407295 0.7942978   Cluster_2_A
10 [10,] 0.4675764 0.5407295 0.7942978   Cluster_2_A
11 [11,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
12 [12,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
13 [13,] 0.4675764 0.5407295 0.7942978 Cluster_2_1_A
14 [14,] 0.4675764 0.5407295 0.7942978     Cluster_3
15 [15,] 0.4675764 0.5407295 0.7942978     Cluster_3
16 [15,] 0.4675764 0.5407295 0.7942978     Cluster_3
17 [16,] 0.4675764 0.5407295 0.7942978     Cluster_4
18 [17,] 0.4675764 0.5407295 0.7942978     Cluster_4
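
To make the nested call easier to follow, here is a minimal sketch that unpacks it into separate steps on a short illustrative vector. The names `v5`, `base`, `run`, `key` and `filled` are made up for this example, and the single pasted key is a simplification of the nested `ave` in the answer, not the author's exact code.

```r
# Illustrative labels only, following the same pattern as column V5 above
v5 <- c("Cluster_1", "Cluster_1_A", "Cluster_1",
        "Cluster_2", "Cluster_2_A", "Cluster_2",
        "Cluster_2_1_A", "Cluster_2", "Cluster_3")

# Step 1: reduce each label to its base cluster,
# e.g. "Cluster_2_1_A" becomes "Cluster_2"
base <- gsub("(Cluster_\\d+).*", "\\1", v5)

# Step 2: within each base cluster, count the "_A" labels seen so far,
# so every "_A" label opens a new run
run <- ave(as.integer(grepl("_A", v5)), base, FUN = cumsum)

# Step 3: give every element the first label of its (base, run) group
key <- paste(base, run)
filled <- ave(v5, key, FUN = function(q) head(q, 1))
filled
```

Rows that come before the first "_A" label of a cluster have `run` equal to 0, so they keep their original name.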

Data

M <- structure(list(V1 = c("[1,]", "[2,]", "[3,]", "[4,]", "[5,]", 
"[6,]", "[7,]", "[8,]", "[9,]", "[10,]", "[11,]", "[12,]", "[13,]", 
"[14,]", "[15,]", "[15,]", "[16,]", "[17,]"), V2 = c(0.4212778, 
0.6874073, 0.1551896, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764), V3 = c(0.6874073, 
0.5610995, 0.177903, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295), V4 = c(0.1551896, 
0.177903, 0.9515304, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 
0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 
0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978), V5 = c("Cluster_1", 
"Cluster_1", "Cluster_1", "Cluster_1_A", "Cluster_1", "Cluster_1", 
"Cluster_2", "Cluster_2_A", "Cluster_2", "Cluster_2", "Cluster_2_1_A", 
"Cluster_2", "Cluster_2", "Cluster_3", "Cluster_3", "Cluster_3", 
"Cluster_4", "Cluster_4")), class = "data.frame", row.names = c(NA, 
-18L))

You can try something like this, after converting your (matrix-looking) object to a data.frame:

df = structure(list(value1 = c(0.4212778, 0.6874073, 0.1551896, 0.4675764, 
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 
0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 0.4675764, 
0.4675764, 0.4675764), value2 = c(0.6874073, 0.5610995, 0.177903, 
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 
0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 0.5407295, 
0.5407295, 0.5407295, 0.5407295), value3 = c(0.1551896, 0.177903, 
0.9515304, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 
0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 0.7942978, 
0.7942978, 0.7942978, 0.7942978, 0.7942978), cluster = structure(c(1L, 
1L, 1L, 2L, 1L, 1L, 3L, 5L, 3L, 3L, 4L, 3L, 3L, 6L, 6L, 6L, 7L, 
7L), .Label = c("Cluster_1", "Cluster_1_A", "Cluster_2", "Cluster_2_1_A", 
"Cluster_2_A", "Cluster_3", "Cluster_4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-18L))

head(df)

     value1    value2    value3     cluster
1 0.4212778 0.6874073 0.1551896   Cluster_1
2 0.6874073 0.5610995 0.1779030   Cluster_1
3 0.1551896 0.1779030 0.9515304   Cluster_1
4 0.4675764 0.5407295 0.7942978 Cluster_1_A
5 0.4675764 0.5407295 0.7942978   Cluster_1
6 0.4675764 0.5407295 0.7942978   Cluster_1

If we do the following:

df$id = as.numeric(factor(df$cluster))

head(df,10)
      value1    value2    value3     cluster id
1  0.4212778 0.6874073 0.1551896   Cluster_1  1
2  0.6874073 0.5610995 0.1779030   Cluster_1  1
3  0.1551896 0.1779030 0.9515304   Cluster_1  1
4  0.4675764 0.5407295 0.7942978 Cluster_1_A  2
5  0.4675764 0.5407295 0.7942978   Cluster_1  1
6  0.4675764 0.5407295 0.7942978   Cluster_1  1
7  0.4675764 0.5407295 0.7942978   Cluster_2  3
8  0.4675764 0.5407295 0.7942978 Cluster_2_A  5
9  0.4675764 0.5407295 0.7942978   Cluster_2  3
10 0.4675764 0.5407295 0.7942978   Cluster_2  3

You can see that every time the id increases, i.e. from 1 to 2 at rows 3:4 or from 3 to 5 at rows 7:8, that is where we want to split the rows. So we do:

df$grp = cumsum(c(0,diff(df$id)>0))

      value1    value2    value3     cluster id grp
1  0.4212778 0.6874073 0.1551896   Cluster_1  1   0
2  0.6874073 0.5610995 0.1779030   Cluster_1  1   0
3  0.1551896 0.1779030 0.9515304   Cluster_1  1   0
4  0.4675764 0.5407295 0.7942978 Cluster_1_A  2   1
5  0.4675764 0.5407295 0.7942978   Cluster_1  1   1
6  0.4675764 0.5407295 0.7942978   Cluster_1  1   1
7  0.4675764 0.5407295 0.7942978   Cluster_2  3   2
8  0.4675764 0.5407295 0.7942978 Cluster_2_A  5   3
9  0.4675764 0.5407295 0.7942978   Cluster_2  3   3
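
The grouping step can be checked in isolation on the ids of the first ten rows. This is only a sketch; `step_up` is a name introduced here for illustration.

```r
# ids of the first ten rows, copied from the table above
id <- c(1, 1, 1, 2, 1, 1, 3, 5, 3, 3)

# TRUE wherever the id rises compared with the previous row
step_up <- diff(id) > 0

# Prepend 0 for the first row, then count the rises cumulatively
grp <- cumsum(c(0, step_up))
grp
# 0 0 0 1 1 1 2 3 3 3
```

A drop in id, such as going from Cluster_1_A (2) back to Cluster_1 (1), does not open a new group, which is what keeps those rows together with the preceding "_A" row. Note that this relies on the order of the factor levels: with the levels shown in the data above, Cluster_2_1_A (4) comes before Cluster_2_A (5), so a Cluster_2_A row followed immediately by a Cluster_2_1_A row would not be split.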

Your new ID is then simply:

df$new = unlist(
tapply(as.character(df$cluster),
df$grp,
function(i)rep(i[1],length(i))))

      value1    value2    value3       cluster id grp           new
1  0.4212778 0.6874073 0.1551896     Cluster_1  1   0     Cluster_1
2  0.6874073 0.5610995 0.1779030     Cluster_1  1   0     Cluster_1
3  0.1551896 0.1779030 0.9515304     Cluster_1  1   0     Cluster_1
4  0.4675764 0.5407295 0.7942978   Cluster_1_A  2   1   Cluster_1_A
5  0.4675764 0.5407295 0.7942978     Cluster_1  1   1   Cluster_1_A
6  0.4675764 0.5407295 0.7942978     Cluster_1  1   1   Cluster_1_A
7  0.4675764 0.5407295 0.7942978     Cluster_2  3   2     Cluster_2
8  0.4675764 0.5407295 0.7942978   Cluster_2_A  5   3   Cluster_2_A
9  0.4675764 0.5407295 0.7942978     Cluster_2  3   3   Cluster_2_A
10 0.4675764 0.5407295 0.7942978     Cluster_2  3   3   Cluster_2_A
11 0.4675764 0.5407295 0.7942978 Cluster_2_1_A  4   4 Cluster_2_1_A
12 0.4675764 0.5407295 0.7942978     Cluster_2  3   4 Cluster_2_1_A
13 0.4675764 0.5407295 0.7942978     Cluster_2  3   4 Cluster_2_1_A
14 0.4675764 0.5407295 0.7942978     Cluster_3  6   5     Cluster_3
15 0.4675764 0.5407295 0.7942978     Cluster_3  6   5     Cluster_3
16 0.4675764 0.5407295 0.7942978     Cluster_3  6   5     Cluster_3
17 0.4675764 0.5407295 0.7942978     Cluster_4  7   6     Cluster_4
18 0.4675764 0.5407295 0.7942978     Cluster_4  7   6     Cluster_4
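
The three steps above could be wrapped into one function, as in the sketch below. `carry_first_label` is a hypothetical name, and `ave` is swapped in for the `tapply` plus `unlist` combination; it writes the values back by position, so it does not depend on the groups coming out in row order.

```r
# Hypothetical helper wrapping the id / grp / new steps from above
carry_first_label <- function(labels) {
  labels <- as.character(labels)
  id  <- as.numeric(factor(labels))      # level codes in sorted order
  grp <- cumsum(c(0, diff(id) > 0))      # new group at every increase
  ave(labels, grp, FUN = function(q) head(q, 1))
}

# Same labels as the cluster column above
cluster <- c("Cluster_1", "Cluster_1", "Cluster_1", "Cluster_1_A",
             "Cluster_1", "Cluster_1", "Cluster_2", "Cluster_2_A",
             "Cluster_2", "Cluster_2", "Cluster_2_1_A", "Cluster_2",
             "Cluster_2", "Cluster_3", "Cluster_3", "Cluster_3",
             "Cluster_4", "Cluster_4")

new <- carry_first_label(cluster)
new
```

If the data really is a character matrix rather than a data.frame, the same call should apply to the column directly, e.g. `M[, 5] <- carry_first_label(M[, 5])`, assuming column 5 holds the labels.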
