R - How to change values in one Matrix based on elements in another Matrix
I have the following covariance matrix in R:
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5
AB-2600 NA 7.18 NA NA NA NA
AB-3500 -1.79 NA 5.4 NA -4.63 NA
AC-0100 3.65 NA NA 4.22 9.8 NA
AD-0100 -17.96 NA -4.63 9.8 5.9 NA
AF-0200 -26.5 NA NA NA NA 4.28
Each row and column corresponds to a football player (e.g., AB-2000). The diagonal entry (AB-2000, AB-2000) gives the variance of that player's performance, and an off-diagonal entry such as (AB-2000, AF-0200) gives the covariance of the two players' performances.
Currently, the matrix shows all covariance values. However, not all of them matter. The only ones that matter are those where the two players play in the same game that week, that is, where they have the same game ID (GID).
The following table shows the GID for each PLAYER in a given week:
GID PLAYER
3467 AB-2000
3460 AB-2600
3463 AB-3500
3467 AC-0100
3458 AD-0100
3461 AF-0200
How do I keep only the values in the covariance matrix where the two players have the same GID (for instance, players AB-2000 and AC-0100)?
Thanks for the help!
I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple of solutions; pick your poison. The first relies on a for loop over the rows, which could be slow on large matrices and could be further optimized if you knew for sure that your matrix was symmetric.
cm <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5
AB-2600 NA 7.18 NA NA NA NA
AB-3500 -1.79 NA 5.4 NA -4.63 NA
AC-0100 3.65 NA NA 4.22 9.8 NA
AD-0100 -17.96 NA -4.63 9.8 5.9 NA
AF-0200 -26.5 NA NA NA NA 4.28
")
p <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
GID PLAYER
3467 AB-2000
3460 AB-2600
3463 AB-3500
3467 AC-0100
3458 AD-0100
3461 AF-0200
")
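m_t2 <- cm
## read.table() mangles the hyphenated column names (AB-2000 becomes AB.2000), so restore them:
names(m_t2) <- row.names(m_t2)
## Look up the GID for each row player and each column player:
row_names <- p$GID[match(row.names(m_t2), p$PLAYER)]
col_names <- p$GID[match(names(m_t2), p$PLAYER)]
## In each row, blank out every entry whose column GID differs from the row GID:
for (i in seq_len(nrow(m_t2))) {
  m_t2[i, col_names != row_names[i]] <- NA
}
m_t2 <- as.matrix(m_t2)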
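If the loop turns out to be a bottleneck, the same filter can probably be written without one. The following is a minimal base R sketch, not part of the original answer: it builds a logical "same game" mask with outer() and blanks everything outside it. The names cov_mat, games, same_game and filtered are made up for illustration, and check.names = FALSE is an assumption used here so that the hyphenated player IDs survive read.table() unchanged.

```r
# Rebuild the example data; check.names = FALSE keeps names like "AB-2000" intact.
cov_mat <- as.matrix(read.table(header = TRUE, check.names = FALSE, text = "
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5
AB-2600 NA 7.18 NA NA NA NA
AB-3500 -1.79 NA 5.4 NA -4.63 NA
AC-0100 3.65 NA NA 4.22 9.8 NA
AD-0100 -17.96 NA -4.63 9.8 5.9 NA
AF-0200 -26.5 NA NA NA NA 4.28
"))

games <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
GID PLAYER
3467 AB-2000
3460 AB-2600
3463 AB-3500
3467 AC-0100
3458 AD-0100
3461 AF-0200
")

# Look up the GID of every row player and every column player by name.
row_gid <- games$GID[match(rownames(cov_mat), games$PLAYER)]
col_gid <- games$GID[match(colnames(cov_mat), games$PLAYER)]

# same_game[i, j] is TRUE when row player i and column player j share a GID.
same_game <- outer(row_gid, col_gid, "==")

# Keep entries inside the mask and set everything else to NA.
filtered <- cov_mat
filtered[!same_game] <- NA
```

Looking up the row and column GIDs separately means the mask should stay correct even if the rows and columns are not in the same order.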
Alternatively, this solution relies on the tidyr and dplyr packages, but it should be quite efficient for very large datasets:
m <- cm
names(m) <- row.names(m)
m$row_names <- row.names(m)
library(tidyr)
library(dplyr)
d <- m %>%
  ## Reshape to long format: one row per (row player, column player) pair
  gather(col_names, "cv", -row_names, convert = TRUE) %>%
  ## Attach the GID of the row player
  left_join(p, by = c("row_names" = "PLAYER")) %>%
  mutate(GID_row = GID) %>%
  select(-GID) %>%
  ## Attach the GID of the column player
  left_join(p, by = c("col_names" = "PLAYER")) %>%
  mutate(GID_col = GID) %>%
  ## Keep the covariance only when both players share a GID
  mutate(new_cv = ifelse((GID_row == GID_col), cv, NA)) %>%
  select(row_names, col_names, new_cv) %>%
  ## Reshape back to wide format
  spread(col_names, new_cv)
m_t <- as.matrix(d[, -1])
row.names(m_t) <- d[["row_names"]]
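For what it's worth, gather() and spread() have been superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above under that API might look like the sketch below. It is not from the original answer: the names cov_df, games, long, wide and filtered are illustrative, and it assumes tidyr >= 1.0.0 and dplyr are installed.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data; check.names = FALSE keeps names like "AB-2000" intact.
cov_df <- read.table(header = TRUE, check.names = FALSE, text = "
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5
AB-2600 NA 7.18 NA NA NA NA
AB-3500 -1.79 NA 5.4 NA -4.63 NA
AC-0100 3.65 NA NA 4.22 9.8 NA
AD-0100 -17.96 NA -4.63 9.8 5.9 NA
AF-0200 -26.5 NA NA NA NA 4.28
")

games <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
GID PLAYER
3467 AB-2000
3460 AB-2600
3463 AB-3500
3467 AC-0100
3458 AD-0100
3461 AF-0200
")

long <- cov_df %>%
  # Carry the row names along as an ordinary column.
  mutate(row_player = rownames(cov_df)) %>%
  # One row per (row player, column player) pair.
  pivot_longer(-row_player, names_to = "col_player", values_to = "cv") %>%
  # Attach the GID of each side of the pair.
  left_join(rename(games, GID_row = GID), by = c("row_player" = "PLAYER")) %>%
  left_join(rename(games, GID_col = GID), by = c("col_player" = "PLAYER")) %>%
  # Keep the covariance only when both players share a GID.
  mutate(cv = ifelse(GID_row == GID_col, cv, NA_real_))

wide <- long %>%
  select(row_player, col_player, cv) %>%
  pivot_wider(names_from = col_player, values_from = cv)

# Convert back to a plain numeric matrix with player IDs as row names.
filtered <- as.matrix(wide[, -1])
rownames(filtered) <- wide$row_player
```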
The solution in either case looks like this:
> m_t
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.50 NA NA 3.65 NA NA
AB-2600 NA 7.18 NA NA NA NA
AB-3500 NA NA 5.4 NA NA NA
AC-0100 3.65 NA NA 4.22 NA NA
AD-0100 NA NA NA NA 5.9 NA
AF-0200 NA NA NA NA NA 4.28