
How to replace a certain value in one data.table with the values of another data.table of the same dimensions

Given two data.tables:

library(data.table)

dt1 <- data.table(id = c(1,-99,2,2,-99), a = c(2,1,-99,-99,3), b = c(5,3,3,2,5), c = c(-99,-99,-99,2,5))
dt2 <- data.table(id = c(2,3,1,4,3), a = c(6,4,3,2,6), b = c(3,7,8,8,3), c = c(2,2,4,3,2))

> dt1
    id   a b   c
1:   1   2 5 -99
2: -99   1 3 -99
3:   2 -99 3 -99
4:   2 -99 2   2
5: -99   3 5   5

> dt2
   id a b c
1:  2 6 3 2
2:  3 4 7 2
3:  1 3 8 4
4:  4 2 8 3
5:  3 6 3 2

How can I replace each -99 in dt1 with the value at the same position in dt2?

The desired result is dt3:

> dt3
   id a b c
1:  1 2 5 2
2:  3 1 3 2
3:  2 3 3 4
4:  2 2 2 2
5:  3 3 5 5

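You can convert both tables to data.frames and index them with a logical matrix:

dt3 <- as.data.frame(dt1)
df2 <- as.data.frame(dt2)            # separate name, so dt2 itself stays a data.table
dt3[dt3 == -99] <- df2[dt3 == -99]   # copy df2's values into the -99 positions
dt3                                  # a data.frame; call setDT(dt3) for a data.table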

#   id a b c
# 1  1 2 5 2
# 2  3 1 3 2
# 3  2 3 3 4
# 4  2 2 2 2
# 5  3 3 5 5

If your data is all of the same type (as in your example), converting both tables to matrices is a lot faster and more transparent:

dt1a <- as.matrix(dt1)  ## convert to matrix
dt2a <- as.matrix(dt2)

# logical matrix of the same shape, TRUE wherever dt1 holds -99
missing_idx <- dt1a == -99
dt1a[missing_idx] <- dt2a[missing_idx]  ## overwrite only the flagged entries
dt3 <- as.data.table(dt1a)              ## convert the result back to a data.table

This is a vectorized operation, so it should be fast.

Note: If you do this, make sure the two tables match exactly in shape and in the order of their rows and columns. If they don't, you need to join on the relevant keys and pick the correct columns.
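
When the rows are not in the same order, a keyed update join is one way to do the replacement. The following is only a minimal sketch: the tables x and y, the key column k, and the value column v are hypothetical and not part of the question's data. Inside the join, v is x's column and i.v is the matching value from y.

```r
library(data.table)

# Hypothetical tables sharing a key column `k`; the rows are in different order
x <- data.table(k = c("a", "b", "c"), v = c(1, -99, 3))
y <- data.table(k = c("c", "b", "a"), v = c(30, 20, 10))

# Update join: match rows on `k`, then take y's value (i.v) only where x holds -99
x[y, on = "k", v := fifelse(v == -99, i.v, v)]
x
```

Only the row with key "b" changes here, because it is the only one holding -99 in x.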

EDIT: The conversion to matrix may be unnecessary. See kath's answer for a more terse solution.

This simple trick works efficiently:

dt1 <- as.matrix(dt1)
dt2 <- as.matrix(dt2)

index.replace <- dt1 == -99
dt1[index.replace] <- dt2[index.replace]

# assign the results, otherwise dt1 and dt2 remain matrices
dt1 <- as.data.table(dt1)
dt2 <- as.data.table(dt2)

A simple way is to convert both tables to data.frames with setDF(), use data.frame subsetting, and restore them to data.tables at the end.

# Change to data.frame
setDF(dt1)
setDF(dt2)

# Perform assignment 
dt1[dt1==-99] = dt2[dt1==-99]

# Restore back to data.table    
setDT(dt1)
setDT(dt2)

dt1
#   id a b c
# 1  1 2 5 2
# 2  3 1 3 2
# 3  2 3 3 4
# 4  2 2 2 2
# 5  3 3 5 5

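A simple element-wise loop should also work. On a data.table, dt1[i, j] with a loop variable j does not select column j the way it does on a data.frame, so read each cell with [[ and write it with set():

for (i in seq_len(nrow(dt1))) {
  for (j in seq_len(ncol(dt1))) {
    # replace the cell in row i, column j if it holds the placeholder
    if (dt1[[j]][i] == -99) set(dt1, i = i, j = j, value = dt2[[j]][i])
  }
}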
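
A column-wise variant of the loop avoids visiting every cell individually. This is a minimal sketch, assuming dt1 and dt2 as defined in the question; dt3 and rows are names chosen here for illustration. It uses data.table's set() to update the flagged rows of each column in place.

```r
library(data.table)

dt1 <- data.table(id = c(1,-99,2,2,-99), a = c(2,1,-99,-99,3), b = c(5,3,3,2,5), c = c(-99,-99,-99,2,5))
dt2 <- data.table(id = c(2,3,1,4,3), a = c(6,4,3,2,6), b = c(3,7,8,8,3), c = c(2,2,4,3,2))

dt3 <- copy(dt1)  # work on a copy so dt1 stays untouched
for (j in seq_along(dt3)) {
  rows <- which(dt3[[j]] == -99)  # rows of column j that hold the placeholder
  if (length(rows)) set(dt3, i = rows, j = j, value = dt2[[j]][rows])
}
dt3
```

Because the columns are never converted to a matrix, this sketch should also cope with columns of different types, as long as each column has the same type in both tables.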

