
R - How to change values in one Matrix based on elements in another Matrix

I have the following covariance matrix in R:

        AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000    6.50      NA   -1.80    3.65  -17.96  -26.50
AB-2600      NA    7.18      NA      NA      NA      NA
AB-3500   -1.79      NA    5.40      NA   -4.63      NA
AC-0100    3.65      NA      NA    4.22    9.80      NA
AD-0100  -17.96      NA   -4.63    9.80    5.90      NA
AF-0200  -26.50      NA      NA      NA      NA    4.28

Each column and row corresponds to a football player (e.g., AB-2000). So the intersection of AB-2000 and AB-2000 gives the variance of that player's performance. An entry like AB-2000, AF-0200 gives the covariance of the two players' performances.

Currently, the matrix shows all covariance values. However, not all covariance values matter. In fact, the only ones that matter are those where the two players are playing in the same game that week (in this case, have the same game ID (GID)).

The following table shows the GID for each PLAYER in a given week:

GID PLAYER
3467    AB-2000
3460    AB-2600
3463    AB-3500
3467    AC-0100
3458    AD-0100
3461    AF-0200

How do I keep only the values in the covariance matrix where the two players have the same GID (for instance, players AB-2000 and AC-0100)?

Thanks for the help!

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple of solutions, pick your poison. The first relies on a for loop over the rows, which could be slow for large matrices and could be further optimized if you knew for sure your matrix was symmetric.

m <- read.table(header=T, stringsAsFactors=F, text="
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA  -1.8    3.65    -17.96  -26.5
AB-2600 NA  7.18    NA  NA  NA  NA
AB-3500 -1.79   NA  5.4 NA  -4.63   NA
AC-0100 3.65    NA  NA  4.22    9.8 NA
AD-0100 -17.96  NA  -4.63   9.8 5.9 NA
AF-0200 -26.5   NA  NA  NA  NA  4.28
")

p <- read.table(header=T, stringsAsFactors=F, text="
GID PLAYER
3467    AB-2000
3460    AB-2600
3463    AB-3500
3467    AC-0100
3458    AD-0100
3461    AF-0200
")

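m_t2 <- m  # work on a copy of the covariance data frame read in above
##  read.table() turned the hyphens in the column names into dots; restore them:
names(m_t2) <- row.names(m_t2)

##  Look up the GID for each row and column player
##  (match() does not depend on the row order of p):
row_names <- p$GID[match(row.names(m_t2), p$PLAYER)]
col_names <- p$GID[match(names(m_t2), p$PLAYER)]
##  In each row, blank out the columns whose GID differs from the row's GID:
for (i in seq_len(nrow(m_t2))) {
  m_t2[i, col_names != row_names[i]] <- NA
}

m_t2 <- as.matrix(m_t2)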
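
The row loop can also be avoided entirely by building a logical "same game" mask in one step. The following is only a sketch of that idea, not part of the original answer. It assumes every player in the matrix appears exactly once in the GID table, and the names cov_mat, gid and same_game are made up for illustration. The example data is repeated so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained.
# check.names = FALSE keeps the hyphens in the column names.
m <- read.table(header = TRUE, check.names = FALSE, text = "
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5
AB-2600 NA 7.18 NA NA NA NA
AB-3500 -1.79 NA 5.4 NA -4.63 NA
AC-0100 3.65 NA NA 4.22 9.8 NA
AD-0100 -17.96 NA -4.63 9.8 5.9 NA
AF-0200 -26.5 NA NA NA NA 4.28
")

p <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
GID PLAYER
3467 AB-2000
3460 AB-2600
3463 AB-3500
3467 AC-0100
3458 AD-0100
3461 AF-0200
")

cov_mat <- as.matrix(m)

# GID of each player, in the order the players appear in the matrix
# (assumption: the matrix has identical row and column order).
gid <- p$GID[match(rownames(cov_mat), p$PLAYER)]

# TRUE where the row player and the column player share a game.
same_game <- outer(gid, gid, "==")

# Blank out every pair of players from different games.
cov_mat[!same_game] <- NA
cov_mat
```

Because the mask is computed for the whole matrix at once, this version does not rely on the matrix being symmetric, and the diagonal survives automatically since each player shares a GID with themselves.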

Alternatively, this solution relies on the tidyr and dplyr packages, but it should be quite efficient for very large datasets:

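##  Start from the data frame m read in above.
##  Restore the hyphenated column names and move the row names into a column:
names(m) <- row.names(m)
m$row_names <- row.names(m)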

library(tidyr)
library(dplyr)

d <- m %>% 
  gather(col_names, "cv", -row_names, convert=T) %>% 
  left_join(p, by = c("row_names" = "PLAYER")) %>% 
  mutate(GID_row = GID) %>% 
  select(-GID) %>% 
  left_join(p, by=c("col_names" = "PLAYER")) %>% 
  mutate(GID_col = GID) %>% 
  mutate(new_cv = ifelse((GID_row == GID_col), cv, NA)) %>%
  select(row_names, col_names, new_cv) %>% 
  spread(col_names, new_cv)

m_t <- as.matrix(d[,-1])
row.names(m_t) <- d[["row_names"]]
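
In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following sketch shows the same reshape-join-reshape idea with those functions. It is an illustration under assumptions, not the original answer's code: it assumes tidyr 1.0 or later, and the names row_player, col_player, long, wide and m_new are hypothetical. The example data is repeated so the block runs on its own.

```r
library(tidyr)
library(dplyr)

# Rebuild the example data; check.names = FALSE keeps the hyphenated names.
m <- read.table(header = TRUE, check.names = FALSE, text = "
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5
AB-2600 NA 7.18 NA NA NA NA
AB-3500 -1.79 NA 5.4 NA -4.63 NA
AC-0100 3.65 NA NA 4.22 9.8 NA
AD-0100 -17.96 NA -4.63 9.8 5.9 NA
AF-0200 -26.5 NA NA NA NA 4.28
")

p <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
GID PLAYER
3467 AB-2000
3460 AB-2600
3463 AB-3500
3467 AC-0100
3458 AD-0100
3461 AF-0200
")

# One row per (row player, column player) pair, with both players' GIDs attached.
long <- m %>%
  mutate(row_player = rownames(m)) %>%
  pivot_longer(-row_player, names_to = "col_player", values_to = "cv") %>%
  left_join(rename(p, GID_row = GID), by = c("row_player" = "PLAYER")) %>%
  left_join(rename(p, GID_col = GID), by = c("col_player" = "PLAYER")) %>%
  # Keep the covariance only when both players are in the same game.
  mutate(cv = ifelse(GID_row == GID_col, cv, NA_real_))

# Back to one column per player.
wide <- long %>%
  select(row_player, col_player, cv) %>%
  pivot_wider(names_from = col_player, values_from = cv)

m_new <- as.matrix(wide[, -1])
rownames(m_new) <- wide$row_player
m_new
```

Renaming GID inside each join avoids the intermediate mutate() and select() steps, at the cost of being slightly less explicit about where each column comes from.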

The result in either case looks like this:

> m_t
        AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000    6.50      NA      NA    3.65      NA      NA
AB-2600      NA    7.18      NA      NA      NA      NA
AB-3500      NA      NA     5.4      NA      NA      NA
AC-0100    3.65      NA      NA    4.22      NA      NA
AD-0100      NA      NA      NA      NA     5.9      NA
AF-0200      NA      NA      NA      NA      NA    4.28
