How to combine 2 dataframes on a condition?

I have 2 dataframes.

df1 -

     T1    T2    T3    T4
ID1  0     1.3   -1.5   0
ID2  0.05  0.3    0    -0.004

df2 -

     Value1    Value2    Value3    Value4    
T1   0         0         1          0    
T2   0         1         0          0 
T3   1         0         0          1 
T4   0         1         1          1 

Now, I want the following result:

     Value1    Value2    Value3    Value4 
ID1  1          1         0         1
ID2  0          2         2         1 

In the final result, I want to merge df1 and df2.

For example, in the ID1 row of df1 the first value is zero, so we ignore T1 in df2. The next value, 1.3, is not zero, so we take the T2 row from df2 and add it to the output row for ID1. The same applies to every cell: each output row is the sum of the df2 rows whose corresponding df1 cell is nonzero.

See the data below. The one thing all solutions here require is that the row names are in a column of the data, so start with:

df1$ID <- rownames(df1)
df2$Tnum <- rownames(df2)

base R (with reshape2 for reshaping)

# library(reshape2) # melt, dcast
df1m <- reshape2::melt(df1, id="ID", variable.name = "Tnum")
df2m <- reshape2::melt(df2, id="Tnum")
dfcomb <- merge(subset(df1m, abs(value) > 0), df2m, by = "Tnum", all = TRUE)
dfcomb2 <- aggregate(dfcomb$value.y, by = dfcomb[c("ID", "variable")], FUN = sum)
# aggregate() names the summed column "x"; pass it explicitly to avoid the guessing message
reshape2::dcast(dfcomb2, ID ~ variable, value.var = "x")
#    ID Value1 Value2 Value3 Value4
# 1 ID1      1      1      0      1
# 2 ID2      0      2      2      1

tidyverse

library(dplyr)
library(tidyr) # pivot_longer, pivot_wider
left_join(
  pivot_longer(df1, -ID, names_to = "Tnum"),
  pivot_longer(df2, -Tnum),
  by = "Tnum"
) %>%
  filter(abs(value.x) > 0) %>%
  group_by(ID, name) %>%
  summarize(value = sum(value.y), .groups = "drop") %>%
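  # name the arguments: newer tidyr versions do not accept id_cols positionally
  pivot_wider(id_cols = ID, names_from = name, values_from = value)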
# # A tibble: 2 x 5
#   ID    Value1 Value2 Value3 Value4
#   <chr>  <int>  <int>  <int>  <int>
# 1 ID1        1      1      0      1
# 2 ID2        0      2      2      1

data.table

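library(data.table)
# DT1/DT2 are data.table copies of df1/df2, after the ID and Tnum columns were added above
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)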
tmp <- merge(
  melt(DT1, id.vars = "ID", variable.name = "Tnum")[ abs(value) > 0 ],
  melt(DT2, id.vars = "Tnum"),
  by = "Tnum", allow.cartesian = TRUE
)[, .(value = sum(value.y)), by = .(ID, variable) ]
dcast(tmp, ID ~ variable)
#        ID Value1 Value2 Value3 Value4
#    <char>  <int>  <int>  <int>  <int>
# 1:    ID1      1      1      0      1
# 2:    ID2      0      2      2      1

Data

df1 <- structure(list(T1 = c(0, 0.05), T2 = c(1.3, 0.3), T3 = c(-1.5, 0), T4 = c(0, -0.004)), class = "data.frame", row.names = c("ID1", "ID2"))
df2 <- structure(list(Value1 = c(0L, 0L, 1L, 0L), Value2 = c(0L, 1L, 0L, 1L), Value3 = c(1L, 0L, 0L, 1L), Value4 = c(0L, 0L, 1L, 1L)), class = "data.frame", row.names = c("T1", "T2", "T3", "T4"))
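
As an alternative sketch that is not part of the solutions above: the rule described in the question (a nonzero cell in df1 selects the matching row of df2, and the selected rows are summed) can also be read as a matrix product of a 0/1 indicator matrix with df2. This assumes the original data frames, before the ID and Tnum columns are added, and that the column names of df1 match the row names of df2. The names `mask` and `res` are only illustrative.

```r
# Original data, with row names only (no ID / Tnum columns)
df1 <- data.frame(T1 = c(0, 0.05), T2 = c(1.3, 0.3), T3 = c(-1.5, 0),
                  T4 = c(0, -0.004), row.names = c("ID1", "ID2"))
df2 <- data.frame(Value1 = c(0L, 0L, 1L, 0L), Value2 = c(0L, 1L, 0L, 1L),
                  Value3 = c(1L, 0L, 0L, 1L), Value4 = c(0L, 0L, 1L, 1L),
                  row.names = c("T1", "T2", "T3", "T4"))

# 1 where df1 is nonzero, 0 otherwise; columns are ordered to match df2's rows
mask <- (as.matrix(df1[, rownames(df2)]) != 0) * 1

# Each output row is the sum of the df2 rows selected by the mask
res <- mask %*% as.matrix(df2)
res
#     Value1 Value2 Value3 Value4
# ID1      1      1      0      1
# ID2      0      2      2      1
```

The result is a numeric matrix that keeps the row names of df1 and the column names of df2; wrap it in as.data.frame() if a data frame is needed.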
