Add/match rows with NA to matrix based on missing unique IDs

I am using a panel data set and intend to model it as a dynamic affiliation network using SAOMs. The data is unfortunately very messy and a pain to deal with.

I have managed to create adjacency matrices for each panel wave. However, over time the panel grew in size and some people left. I need every matrix to have the same number of rows, in the same order according to the unique IDs, which are visible when inspecting the objects in R. All "added IDs" should show 10s across the whole row.

Here is a reproducible example that should make the issue clear and also shows what I am aiming for. I assume this can be solved by smart use of the merge() function, but I could not get it to work:

wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2, dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2, dimnames = list(c("1","4","8","9"), c("group1","group2")))

wave1_c <- matrix(c(0,0,1,1,10,0,1,1,0,0,10,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))
wave2_c <- matrix(c(0,10,1,10,1,0,1,10,0,10,1,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))

Thanks in advance. The numbers in the matrices are arbitrary except for the 10s.

Solution in base R using data frames and merge.

Merge with an outer join.

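## Outer join on row names; columns 2:3 hold the columns of the first argument.
m1 <- merge(wave1, wave2, by = 'row.names', all = TRUE, suffixes = c("", ".y"))
m2 <- merge(wave2, wave1, by = 'row.names', all = TRUE, suffixes = c("", ".y"))
## merge() moves the IDs into the Row.names column, so restore them as row names.
dwave1_c <- m1[2:3]; rownames(dwave1_c) <- m1$Row.names
dwave2_c <- m2[2:3]; rownames(dwave2_c) <- m2$Row.names
dwave1_c[is.na(dwave1_c)] <- 10
dwave2_c[is.na(dwave2_c)] <- 10

as.matrix(dwave1_c)
as.matrix(dwave2_c)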

Update.

both <- merge(wave1, wave2, by = 'row.names', all = TRUE)

Output.

   Row.names group1.x group2.x group1.y group2.y
 1         1        0        1        0        1
 2         2        0        1       NA       NA
 3         4        1        0        1        0
 4         5        1        1       NA       NA
 5         8       NA       NA        1        1
 6         9        0        1        0        1

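dwave1_c <- both[,2:3]; colnames(dwave1_c) <- colnames(wave1)
dwave2_c <- both[,4:5]; colnames(dwave2_c) <- colnames(wave2)
## Restore the IDs as row names (merge() stores them in the Row.names column).
rownames(dwave1_c) <- both$Row.names
rownames(dwave2_c) <- both$Row.names
dwave1_c[is.na(dwave1_c)] <- 10
dwave2_c[is.na(dwave2_c)] <- 10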

Show result.

as.matrix(dwave1_c)
as.matrix(dwave2_c)
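
A caveat on row order: row names are character strings, so sorting on them is lexicographic rather than numeric. With the single-digit IDs of the question this makes no difference, but with IDs of mixed length it can. The following is a minimal sketch, assuming the IDs are numeric strings; the toy matrices w1 and w2 and the names both_num and ids_sorted are made up for illustration and are not part of the question.

```r
# Toy waves with a two-digit ID (hypothetical data, not from the question).
w1 <- matrix(c(0, 1, 1, 0), nrow = 2,
             dimnames = list(c("2", "10"), c("group1", "group2")))
w2 <- matrix(c(1, 1), nrow = 1,
             dimnames = list("3", c("group1", "group2")))

# Outer join on row names, as in the answer above.
both_num <- merge(w1, w2, by = "row.names", all = TRUE)

# Reorder by the numeric value of the IDs instead of their character order.
id_values <- as.numeric(as.character(both_num$Row.names))
both_num <- both_num[order(id_values), ]
ids_sorted <- as.character(both_num$Row.names)
ids_sorted
```

The same order(as.numeric(...)) idea can replace order(row.names(...)) in the other solutions if the IDs are numeric.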

First try.

## Convert matrix to dataframe.
df1 <- as.data.frame(wave1)
df2 <- as.data.frame(wave2)

## Merge df1 and df2 by row name.
m_df1_df2 <- merge(df1, df2, by = 'row.names', all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names

# Rows not in df1, but in df2,
# rows not in df2, but in df1
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")] # not in df1, in df2
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")] # not in df2, in df1

## Same column names.   
colnames(not1_2) <- colnames(df1)
colnames(not2_1) <- colnames(df2)

## append
df1_c <- rbind(df1, not1_2)
df2_c <- rbind(df2, not2_1)

## order by row name
df1_c <- df1_c[order(row.names(df1_c)), ]
df2_c <- df2_c[order(row.names(df2_c)), ]

## replace NA by 10
df1_c[is.na(df1_c)] <- 10
df2_c[is.na(df2_c)] <- 10
as.matrix(df1_c)
as.matrix(df2_c)

Converting wave1 and wave2 to data frames in my first attempt is redundant and can be omitted, although this comes at the expense of implicit coercions.

## merge wave1 and wave2 by row name.
m_df1_df2 <- merge(wave1, wave2, by = 0, all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names

# rows not in set 1, but in set 2,
# rows not in set 2, but in set 1.
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")]
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")]

## Same column names. 
colnames(not1_2) <- colnames(wave1)
colnames(not2_1) <- colnames(wave2)

## append.
wave1_c <- rbind(wave1, not1_2)
wave2_c <- rbind(wave2, not2_1)

## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]

## replace NA by 10.
wave1_c[is.na(wave1_c)] <- 10
wave2_c[is.na(wave2_c)] <- 10

## show result (rbind() with a data frame returns a data frame, so convert back).
as.matrix(wave1_c)
as.matrix(wave2_c)
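
The implicit coercion mentioned above can be made visible with a small check. This is only a sketch: wave1 is copied from the question, while extra, combined and combined_m are names invented here. When one argument of rbind() is a data frame, the data frame method is used, so the matrix is silently converted.

```r
# wave1 as defined in the question.
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))

# A one-row data frame standing in for a missing ID (hypothetical example row).
extra <- data.frame(group1 = NA, group2 = NA, row.names = "8")

# Matrix + data frame: the result is a data frame, not a matrix.
combined <- rbind(wave1, extra)
class(combined)

# Convert back explicitly if a matrix is needed downstream.
combined_m <- as.matrix(combined)
class(combined_m)
```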

Solution using setdiff.

## rownames not in set 1, but in set 2,
## rownames not in set 2, but in set 1.
rn_not2_1 <- setdiff(rownames(wave1), rownames(wave2))
rn_not1_2 <- setdiff(rownames(wave2), rownames(wave1))

## missing rows to add.
add_to_1 <- wave2[rn_not1_2,,drop=FALSE]
add_to_2 <- wave1[rn_not2_1,,drop=FALSE]
add_to_1[,] <- 10
add_to_2[,] <- 10

## append.
wave1_c <- rbind(wave1, add_to_1)
wave2_c <- rbind(wave2, add_to_2)

## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]

## show result.
wave1_c
wave2_c
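
With more than two waves, the pairwise approach gets repetitive. One possible generalisation is sketched below, assuming all waves share the same columns and the IDs are numeric strings. The helper pad_waves() and its fill argument are hypothetical names introduced here, not part of the answers above. It builds the union of all IDs once and fills each wave into a matrix of 10s.

```r
# Hypothetical helper: pad every wave in a list to the union of all IDs,
# filling the rows of absent IDs with `fill`.
pad_waves <- function(waves, fill = 10) {
  all_ids <- unique(unlist(lapply(waves, rownames), use.names = FALSE))
  # Assumption: IDs are numeric strings, so sort them by numeric value.
  all_ids <- all_ids[order(as.numeric(all_ids))]
  lapply(waves, function(w) {
    # Start from a matrix of fill values, then copy in the observed rows.
    out <- matrix(fill, nrow = length(all_ids), ncol = ncol(w),
                  dimnames = list(all_ids, colnames(w)))
    out[rownames(w), ] <- w
    out
  })
}

# Waves as defined in the question.
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2,
                dimnames = list(c("1","4","8","9"), c("group1","group2")))

padded <- pad_waves(list(wave1 = wave1, wave2 = wave2))
padded$wave1
padded$wave2
```

Because the result is built by matrix indexing, it stays a numeric matrix throughout and no conversion to or from data frames is involved.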
