
R - How to change values in one Matrix based on elements in another Matrix

I have the following covariance matrix in R:

        AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000     6.5      NA    -1.8    3.65  -17.96   -26.5
AB-2600      NA    7.18      NA      NA      NA      NA
AB-3500   -1.79      NA     5.4      NA   -4.63      NA
AC-0100    3.65      NA      NA    4.22     9.8      NA
AD-0100  -17.96      NA   -4.63     9.8     5.9      NA
AF-0200   -26.5      NA      NA      NA      NA    4.28

Each column and row corresponds to a football player (e.g., AB-2000). So the intersection of AB-2000 and AB-2000 gives the variance of that player's performance, while an off-diagonal entry such as AB-2000, AF-0200 gives the covariance of the two players' performances.

Currently, the matrix shows all covariance values. However, not all of them matter. In fact, the only ones that matter are those where the two players are playing in the same game that week (that is, they have the same game ID (GID)).

The following table shows the GID for each PLAYER in a given week:

GID PLAYER
3467    AB-2000
3460    AB-2600
3463    AB-3500
3467    AC-0100
3458    AD-0100
3461    AF-0200

How do I go about keeping only the values in the covariance matrix when the two players have the same GID (for instance, players AB-2000 and AC-0100)?

Thanks for the help!

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple of solutions, pick your poison. The first relies on a for loop over the rows, which could be slow for large matrices; it could be optimized further if you knew for sure your matrix was symmetric.

# Covariance matrix (row names are taken from the first column)
cm <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA  -1.8    3.65    -17.96  -26.5
AB-2600 NA  7.18    NA  NA  NA  NA
AB-3500 -1.79   NA  5.4 NA  -4.63   NA
AC-0100 3.65    NA  NA  4.22    9.8 NA
AD-0100 -17.96  NA  -4.63   9.8 5.9 NA
AF-0200 -26.5   NA  NA  NA  NA  4.28
")

p <- read.table(header=T, stringsAsFactors=F, text="
GID PLAYER
3467    AB-2000
3460    AB-2600
3463    AB-3500
3467    AC-0100
3458    AD-0100
3461    AF-0200
")

m_t2 <- cm
names(m_t2) <- row.names(m_t2)

##  Look up the GID for each row and column player:
row_gids <- p$GID[match(row.names(m_t2), p$PLAYER)]
col_gids <- p$GID[match(names(m_t2), p$PLAYER)]
##  Blank out entries where the two players are in different games:
for (i in seq_len(nrow(m_t2))) {
  m_t2[i, col_gids != row_gids[i]] <- NA
}

m_t2 <- as.matrix(m_t2)
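
For what it's worth, the same masking can also be done in base R without a loop. The sketch below is not part of the code above: it assumes every player in the matrix appears exactly once in the GID table, and the names cov_mat, games, same_game and masked are made up for illustration. It rebuilds a three-player subset of the data so that it runs on its own.

```r
# Small subset of the question's data, rebuilt so the example is self-contained.
players <- c("AB-2000", "AB-3500", "AC-0100")
cov_mat <- matrix(
  c( 6.50, -1.80, 3.65,
    -1.79,  5.40,   NA,
     3.65,    NA, 4.22),
  nrow = 3, byrow = TRUE,
  dimnames = list(players, players)
)
games <- data.frame(
  GID    = c(3467, 3463, 3467),
  PLAYER = players,
  stringsAsFactors = FALSE
)

# Look up each player's GID in the matrix's own row and column order.
# Assumption: every player is listed exactly once in `games`.
row_gid <- games$GID[match(rownames(cov_mat), games$PLAYER)]
col_gid <- games$GID[match(colnames(cov_mat), games$PLAYER)]

# TRUE where the row player and the column player share a game.
same_game <- outer(row_gid, col_gid, "==")

# Blank out every entry whose two players are in different games.
masked <- cov_mat
masked[!same_game] <- NA
```

Because outer() builds the whole comparison table in one call, no loop over rows is needed, and the diagonal is kept automatically since a player always shares a GID with themselves.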

Alternatively, this solution relies on the tidyr and dplyr packages, but it should be quite efficient for very large datasets:

m <- cm
names(m) <- row.names(m)
m$row_names <- row.names(m)

library(tidyr)
library(dplyr)

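d <- m %>%
  # Reshape to long format: one row per (row player, column player) pair
  gather(col_names, "cv", -row_names, convert=TRUE) %>%
  # Attach the GID of the row player
  left_join(p, by = c("row_names" = "PLAYER")) %>%
  mutate(GID_row = GID) %>%
  select(-GID) %>%
  # Attach the GID of the column player
  left_join(p, by=c("col_names" = "PLAYER")) %>%
  mutate(GID_col = GID) %>%
  # Keep the covariance only when both players share a GID
  mutate(new_cv = ifelse((GID_row == GID_col), cv, NA)) %>%
  select(row_names, col_names, new_cv) %>%
  # Reshape back to wide format
  spread(col_names, new_cv)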

m_t <- as.matrix(d[,-1])
row.names(m_t) <- d[["row_names"]]

The result in either case looks like this:

> m_t
        AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000    6.50      NA      NA    3.65      NA      NA
AB-2600      NA    7.18      NA      NA      NA      NA
AB-3500      NA      NA     5.4      NA      NA      NA
AC-0100    3.65      NA      NA    4.22      NA      NA
AD-0100      NA      NA      NA      NA     5.9      NA
AF-0200      NA      NA      NA      NA      NA    4.28
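
One way to sanity-check a result like this is to confirm that every value that survived belongs to two players with the same GID. The snippet below is only a sketch: it hard-codes a three-player slice of the output above, and the names res, gid, kept and all_same_game are hypothetical.

```r
# Three-player slice of the masked result shown above.
players <- c("AB-2000", "AB-3500", "AC-0100")
res <- matrix(
  c(6.50,  NA, 3.65,
      NA, 5.4,   NA,
    3.65,  NA, 4.22),
  nrow = 3, byrow = TRUE,
  dimnames = list(players, players)
)

# GID of each player, taken from the question's table.
gid <- c("AB-2000" = 3467, "AB-3500" = 3463, "AC-0100" = 3467)

# Row/column positions of every value that survived the masking.
kept <- which(!is.na(res), arr.ind = TRUE)

# Each surviving value should pair two players with the same GID.
row_gid <- gid[rownames(res)[kept[, "row"]]]
col_gid <- gid[colnames(res)[kept[, "col"]]]
all_same_game <- all(row_gid == col_gid)
```

If all_same_game comes back FALSE, some cross-game covariance slipped through the masking step.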
