
colnames and rownames with pairwise distance matrix outputs

EDIT:

I am trying to collect these values/columns/rows:


**The numbers have changed slightly.**

I am trying to extract the pairwise results from the following matrix:

           ID1_2001   ID2_2001   ID3_2001   ID1_2000   ID2_2000
ID2_2001 0.96747537                                            
ID3_2001 0.96850817 0.67983338                                 
ID1_2000 0.11324889 0.97507292 0.97586446                      
ID2_2000 1.00000000 0.75336751 0.83321843 1.00000000           
ID3_2000 1.00000000 0.76556229 0.81577353 1.00000000 0.05728332

That is, the values 0.1132489, 0.7533675 and 0.8157735.

Thanks to another user on this site, I know of the function proxy::dist(m[1:3,], m[4:6,], pairwise=TRUE, method="cosine"), which gives me just these results: 0.1132489 0.7533675 0.8157735.

However, I would also like the row and column names that each result comes from. So 0.1132489 would be assigned to ID1_2000_ID1_2001, 0.7533675 to ID2_2000_ID2_2001, and finally 0.81577353 to ID3_2000_ID3_2001. But I cannot put this distance matrix into a data frame to access/extract the row names and column names.
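A minimal base R sketch of pulling specific cells out of the full distance object by name. It assumes the labels are stored on the object (as they are in the `data` object below): `as.matrix()` on a "dist" object returns a square matrix whose row and column names are those labels, and indexing that matrix with a two-column character matrix picks each (row, column) cell by name. The matrix `full` and the vectors `from`/`to` are made-up stand-ins, not the question's data.

```r
# Toy symmetric distance matrix standing in for the question's cosine distances
labs <- c("ID1_2001", "ID2_2001", "ID1_2000", "ID2_2000")
full <- matrix(c(0,   0.9, 0.1, 0.8,
                 0.9, 0,   0.7, 0.2,
                 0.1, 0.7, 0,   0.6,
                 0.8, 0.2, 0.6, 0),
               nrow = 4, dimnames = list(labs, labs))
d <- as.dist(full)            # a "dist" object, like proxy::dist(m) returns

dm <- as.matrix(d)            # square matrix; dimnames come from the labels
from <- c("ID1_2000", "ID2_2000")
to   <- c("ID1_2001", "ID2_2001")
# A two-column character matrix indexes cells by (row name, column name)
vals <- dm[cbind(from, to)]
names(vals) <- paste(from, to, sep = "_")
vals
```

For the question's object this would be `as.matrix(data)` in place of `dm`; note it computes all pairs first, so it does not save the computation that a pairwise call would.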

Ideally I would run only proxy::dist(m[1:3,], m[4:6,], pairwise=TRUE, method="cosine") and obtain the pairwise results along with their row and column names (saving on computation time).

How can I replace m[1:3,] with "groups", i.e. take the 2001 group and then the 2000 group? I hope to scale this up to more years/IDs, and I cannot hand-count the rows (1:3 and 4:6) for every year/ID.

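library(tidyr)  # tidyr also provides the %>% pipe
# split row names such as "ID1_2001" into an id column and a year column
x <- m %>%
  data.frame() %>%
  tibble::rownames_to_column("rownames") %>%
  separate(rownames, c("id", "year"), "_")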
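One way to avoid hand-counted row ranges is to build the two row groups from the row names themselves. This is a sketch under stated assumptions: row names look exactly like `<id>_<year>` with a single underscore, and cosine distance means 1 minus cosine similarity (which appears consistent with the values in the question). `pairwise_cosine_dist()` and `paired_year_dist()` are hypothetical helpers, and `m_toy` is a made-up matrix, not the question's `m`.

```r
# Vectorised cosine distance between row i of A and row i of B
pairwise_cosine_dist <- function(A, B) {
  1 - rowSums(A * B) / sqrt(rowSums(A^2) * rowSums(B^2))
}

# Hypothetical helper: pairs rows by id across two years and names each
# result "<id>_<year_a>_<id>_<year_b>"
paired_year_dist <- function(m, year_a, year_b, sep = "_") {
  parts <- do.call(rbind, strsplit(rownames(m), sep, fixed = TRUE))
  ids   <- parts[, 1]
  years <- parts[, 2]
  # ids present in both years; ids missing from either year are skipped
  common <- intersect(ids[years == year_a], ids[years == year_b])
  rows_a <- paste(common, year_a, sep = sep)
  rows_b <- paste(common, year_b, sep = sep)
  d <- pairwise_cosine_dist(m[rows_a, , drop = FALSE],
                            m[rows_b, , drop = FALSE])
  names(d) <- paste(rows_a, rows_b, sep = sep)
  d
}

# Toy term-count matrix (not the question's data)
m_toy <- rbind(ID1_2001 = c(1, 0, 1),
               ID2_2001 = c(0, 1, 0),
               ID1_2000 = c(1, 0, 1),
               ID2_2000 = c(1, 1, 0))
paired_year_dist(m_toy, "2000", "2001")
```

If the proxy package is preferred, the `pairwise_cosine_dist()` call could be swapped for `proxy::dist(m[rows_a, , drop = FALSE], m[rows_b, , drop = FALSE], pairwise = TRUE, method = "cosine")` as in the question, with the names attached the same way. For more years, the helper can be called once per pair of consecutive years taken from `sort(unique(years))`.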

Other:

dist.matrix = proxy::dist(m, pairwise = TRUE, method = "cosine") 
proxy::dist(m[1:3,], m[4:6,], pairwise=TRUE, method="cosine")

Data:

data <- structure(c(0.96747537487273, 0.968508167135111, 0.113248890901578, 
1, 1, 0.67983337671352, 0.97507292188601, 0.753367507803825, 
0.765562291938692, 0.975864460398726, 0.833218430412641, 0.815773525411265, 
1, 1, 0.0572833227621783), Size = 6L, Labels = c("ID1_2001", 
"ID2_2001", "ID3_2001", "ID1_2000", "ID2_2000", "ID3_2000"), class = "dist", Diag = FALSE, Upper = FALSE, method = "cosine", call = quote(proxy::dist(x = m, 
    method = "cosine", pairwise = TRUE)))

Data 2 (m):

m <- structure(c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 2, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 1, 3, 3, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 8, 0, 
0, 12, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 1, 0, 
1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 
0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 
0, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 2, 2, 0, 
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 
3, 4, 0, 1, 3, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 
0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 3, 0, 0, 3, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 3, 0, 0, 2, 2, 0, 0, 0, 0, 
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 0, 2, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 
2, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 4, 2, 0, 1, 1, 0, 
1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 2, 0, 0, 0, 
0, 0, 0, 1, 1), .Dim = c(6L, 196L), .Dimnames = list(Docs = c("ID1_2001", 
"ID2_2001", "ID3_2001", "ID1_2000", "ID2_2000", "ID3_2000"), 
    Terms = c("-field", "(22-yard)", "(doubles).", "(either", 
    "(known", "(singles)", "(specifically", "20-metre", "able", 
    "across", "activity", "adjudicated", "aided", "although", 
    "american", "appears", "appears.", "around", "association", 
    "australian", "badminton", "bails", "bails,", "balanced", 
    "ball", "bat--ball", "bat,", "batting", "beach", "bowled", 
    "bowled,", "bowling", "called", "can", "canadian", "casual", 
    "catching", "centre", "certain", "codes", "common", "commonly", 
    "communicate", "comprising", "context", "cord", "countries", 
    "countries);", "court", "court.", "covered", "cricket", "degrees", 
    "degrees,", "different", "dislodges", "dismiss", "dismissal", 
    "dismissed,", "doubles", "each", "either", "eleven", "end,", 
    "ends", "family", "felt", "field", "fielding", "football", 
    "football);", "football.[1][2]", "football;", "football12", 
    "form", "formal", "forms", "gaelic", "gain", "game", "games", 
    "goal", "goal.", "gridiron", "ground.", "half", "hit", "hits", 
    "hollow", "include", "individually", "indoor", "information.", 
    "innings", "international", "involve", "involve,", "kicking", 
    "known", "landing", "larger", "league", "maneuver", "match", 
    "match's", "matches.", "may", "means", "net", "object", "often", 
    "one", "opponent", "opponent's", "opposing", "opposite", 
    "outdoor", "per", "pitch", "places", "play", "played", "player", 
    "players", "point,", "points", "popular", "prevent", "racket", 
    "racquet", "racquets", "record", "rectangular", "refer", 
    "referee", "regional", "return", "return.", "roles.", "rubber", 
    "rugby", "rules", "runs", "score", "scored", "scorers", "scores", 
    "shuttlecock", "side", "sides", "single", "singles", "soccer", 
    "specifically", "sport", "sports", "statistical", "strike", 
    "striking", "strung", "stumps", "stumps.", "swap", "team", 
    "teams", "ten", "tennis", "the", "these", "they", "third", 
    "three", "tries", "two", "umpire", "umpires,", "unable", 
    "understood", "union", "union);", "unqualified", "unqualified,", 
    "uses", "using", "valid", "variations", "varying", "way", 
    "when", "whichever", "wicket", "will", "will.", "within", 
    "word", "yard")))

EDIT:

I found this workaround to put it into a data frame. I am not sure how efficient it will be on a large matrix:

x <- data.matrix(dist.matrix)
x <- as.data.frame(x)
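A sketch of going straight from a "dist" object to a long table with correctly paired names, keeping each unordered pair once. Base `dist()` with its default Euclidean metric is used only as a stand-in so the block runs without proxy; `m_toy`, `idx` and `long` are illustrative names, not from the question.

```r
# Toy stand-in for `dist.matrix` (a "dist" object with labels)
m_toy <- rbind(ID1_2001 = c(1, 0, 1),
               ID2_2001 = c(0, 1, 0),
               ID1_2000 = c(1, 1, 1))
d <- dist(m_toy)          # Euclidean, for illustration only

dm <- as.matrix(d)        # full square matrix; names come from the labels
# Cells strictly below the diagonal, as (row, col) index pairs
idx <- which(lower.tri(dm), arr.ind = TRUE)
long <- data.frame(row   = rownames(dm)[idx[, "row"]],
                   col   = colnames(dm)[idx[, "col"]],
                   value = dm[idx])
long
```

A "dist" object stores the lower triangle column by column, which is the same order `which(lower.tri(...))` returns, so `value = as.vector(d)` should give the same column. The full n x n matrix is still materialised here, so memory grows with the square of the number of documents.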

EDIT 2:

> data.frame(rownames(dist.matrix), colnames(dist.matrix), as.vector(dist.matrix))
   rownames.dist.matrix. colnames.dist.matrix. as.vector.dist.matrix.
1               ID1_2001              ID2_2001             0.97192896
2               ID1_2001              ID2_2001             0.97288923
3               ID1_2001              ID2_2001             0.01505221
4               ID1_2001              ID2_2001             1.00000000
5               ID1_2001              ID2_2001             1.00000000
6               ID1_2001              ID2_2001             0.69527190
7               ID1_2001              ID2_2001             0.97565046
8               ID1_2001              ID2_2001             0.75908178
9               ID1_2001              ID2_2001             0.77099402
10              ID1_2001              ID2_2001             0.97648342
11              ID1_2001              ID2_2001             0.77840308
12              ID1_2001              ID2_2001             0.76921180
13              ID1_2001              ID2_2001             1.00000000
14              ID1_2001              ID2_2001             1.00000000
15              ID1_2001              ID2_2001             0.05728332

EDIT 3:

I run the following:

dist.matrix = as.matrix(dist.matrix)


df <- data.frame(row   = rownames(dist.matrix), 
                 col   = colnames(dist.matrix), 
                 value = as.vector(dist.matrix))

Which gives me the following output:

     row      col      value
1  ID1_2001 ID1_2001 0.00000000
2  ID2_2001 ID2_2001 0.97192896
3  ID3_2001 ID3_2001 0.97288923
4  ID1_2000 ID1_2000 0.01505221
5  ID2_2000 ID2_2000 1.00000000
6  ID3_2000 ID3_2000 1.00000000
7  ID1_2001 ID1_2001 0.97192896
8  ID2_2001 ID2_2001 0.00000000
9  ID3_2001 ID3_2001 0.69527190
10 ID1_2000 ID1_2000 0.97565046
11 ID2_2000 ID2_2000 0.75908178
12 ID3_2000 ID3_2000 0.77099402
13 ID1_2001 ID1_2001 0.97288923
14 ID2_2001 ID2_2001 0.69527190
15 ID3_2001 ID3_2001 0.00000000
16 ID1_2000 ID1_2000 0.97648342
17 ID2_2000 ID2_2000 0.77840308
18 ID3_2000 ID3_2000 0.76921180
19 ID1_2001 ID1_2001 0.01505221
20 ID2_2001 ID2_2001 0.97565046
21 ID3_2001 ID3_2001 0.97648342
22 ID1_2000 ID1_2000 0.00000000
23 ID2_2000 ID2_2000 1.00000000
24 ID3_2000 ID3_2000 1.00000000
25 ID1_2001 ID1_2001 1.00000000
26 ID2_2001 ID2_2001 0.75908178
27 ID3_2001 ID3_2001 0.77840308
28 ID1_2000 ID1_2000 1.00000000
29 ID2_2000 ID2_2000 0.00000000
30 ID3_2000 ID3_2000 0.05728332
31 ID1_2001 ID1_2001 1.00000000
32 ID2_2001 ID2_2001 0.77099402
33 ID3_2001 ID3_2001 0.76921180
34 ID1_2000 ID1_2000 1.00000000
35 ID2_2000 ID2_2000 0.05728332
36 ID3_2000 ID3_2000 0.00000000

EDIT 4:

x <- data.matrix(dist.matrix)
x <- as.data.frame(x)

library(tibble)
library(tidyr)
library(reshape2)  # provides melt()
y <- x %>%
  rownames_to_column("row") %>%
  separate(row, c("id_row", "year_row"), "_")


z <- melt(y)
z

w <- z %>%
  separate(variable, c("id_col", "year_col"), "_")

w

Which seems to give:

> head(w)
  id_row year_row id_col year_col      value
1    ID1     2001    ID1     2001 0.00000000
2    ID2     2001    ID1     2001 0.97192896
3    ID3     2001    ID1     2001 0.97288923
4    ID1     2000    ID1     2001 0.01505221
5    ID2     2000    ID1     2001 1.00000000
6    ID3     2000    ID1     2001 1.00000000
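From a long table shaped like `w`, the same-ID, cross-year values can be picked out with a filter and then named. A minimal sketch, assuming the year columns are character (as `separate()` leaves them); the toy rows copy a few cells from the first printed matrix in the question.

```r
# Toy long table shaped like `w` (cells copied from the first matrix above)
w <- data.frame(id_row   = c("ID1", "ID2", "ID2", "ID2"),
                year_row = c("2000", "2000", "2000", "2001"),
                id_col   = c("ID1", "ID1", "ID2", "ID2"),
                year_col = c("2001", "2001", "2001", "2001"),
                value    = c(0.11324889, 1, 0.75336751, 0),
                stringsAsFactors = FALSE)

# Keep only cells comparing the same id in 2000 (row) against 2001 (column)
pairs <- subset(w, id_row == id_col & year_row == "2000" & year_col == "2001")
pairs$name <- paste(pairs$id_row, pairs$year_row,
                    pairs$id_col, pairs$year_col, sep = "_")
pairs[, c("name", "value")]
```

To cover any pair of consecutive years instead of hard-coding them, the year condition could be replaced by `year_row == as.character(as.integer(year_col) - 1L)`.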

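Just put the row names and column names in a data frame alongside the data itself. "Unraveling" the matrix with as.vector() takes care of the values. Because R stores a matrix column by column, the row names have to be repeated once per column and each column name repeated once per row; plain recycling of colnames() would pair row i with column i, which is why row and col come out identical in EDIT 3: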

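# example data
mat <- matrix(1:100, 10, 10)
rownames(mat) <- paste0("row", 1:10)
colnames(mat) <- paste0("col", 1:10)

# what you want: as.vector() reads the matrix column by column, so the
# row names cycle fastest and each column name repeats nrow(mat) times
# (for the question, use mat <- as.matrix(dist.matrix))
df <- data.frame(row   = rep(rownames(mat), times = ncol(mat)),
                 col   = rep(colnames(mat), each  = nrow(mat)),
                 value = as.vector(mat))

# take a look at the result
head(df)
#    row  col value
# 1 row1 col1     1
# 2 row2 col1     2
# 3 row3 col1     3
# 4 row4 col1     4
# 5 row5 col1     5
# 6 row6 col1     6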
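An equivalent way to build the name columns is base `expand.grid()`, which varies its first argument fastest, the same order in which `as.vector()` reads a matrix. The small `mat` below is illustrative only, and the `stopifnot()` line checks that every value sits in the cell its labels name.

```r
mat <- matrix(1:6, 2, 3,
              dimnames = list(paste0("row", 1:2), paste0("col", 1:3)))

# First argument (row) varies fastest, matching column-major storage
df <- expand.grid(row = rownames(mat), col = colnames(mat),
                  stringsAsFactors = FALSE)
df$value <- as.vector(mat)

# Sanity check: look each value up again by its (row, col) names
stopifnot(all(df$value == mat[cbind(df$row, df$col)]))
df
```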
