Add/match rows with NA to matrix based on missing unique IDs
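I am working with a panel data set and intend to model it as a dynamic affiliation network using SAOM. Unfortunately, the data is very messy and painful to work with.
I have managed to create an adjacency matrix for each panel wave. However, the group changes over time: new people join and others leave. I need every matrix to have the same number of rows in the same order, based on the unique IDs that appear when inspecting the objects in R. All "added IDs" should show 10 across the entire row.
Here is a reproducible example that should make the problem clear and show what I am aiming for. I think this could be solved with clever use of the merge() function, but I could not get it to work: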
## Input: one matrix per wave; the row names are the unique IDs.
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2, dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2, dimnames = list(c("1","4","8","9"), c("group1","group2")))
## Desired output: both matrices contain all six IDs; rows for added IDs are filled with 10.
wave1_c <- matrix(c(0,0,1,1,10,0,1,1,0,0,10,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))
wave2_c <- matrix(c(0,10,1,10,1,0,1,10,0,10,1,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))
Thanks in advance. Apart from the 10s, the numbers in the matrices are arbitrary.
A solution in base R using data frames and merge.
Merge with an outer join.
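## Outer join on the row names; columns of the first argument keep their names.
m12 <- merge(wave1, wave2, by = 'row.names', all = TRUE, suffixes = c("", ".y"))
m21 <- merge(wave2, wave1, by = 'row.names', all = TRUE, suffixes = c("", ".y"))
## Keep the columns of the first argument and restore the IDs as row names.
dwave1_c <- m12[2:3]; rownames(dwave1_c) <- m12$Row.names
dwave2_c <- m21[2:3]; rownames(dwave2_c) <- m21$Row.names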
dwave1_c[is.na(dwave1_c)] <- 10
dwave2_c[is.na(dwave2_c)] <- 10
as.matrix(dwave1_c)
as.matrix(dwave2_c)
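The all = TRUE argument in the merge() calls above is what turns the merge into an outer join. As a small sketch for comparison, using the matrices from the question, the default (inner) join keeps only the IDs present in both waves, while the outer join keeps every ID from either wave and fills the gaps with NA. The names inner and outer are chosen here for illustration only.

```r
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2,
                dimnames = list(c("1","4","8","9"), c("group1","group2")))

## Inner join (default): only IDs found in both waves.
inner <- merge(wave1, wave2, by = "row.names")
## Outer join: all IDs from either wave, missing cells become NA.
outer <- merge(wave1, wave2, by = "row.names", all = TRUE)

as.character(inner$Row.names)  # "1" "4" "9"
as.character(outer$Row.names)  # "1" "2" "4" "5" "8" "9"
```

The NA cells of the outer join are exactly the cells that later get replaced by 10.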
Update: merge once, then split the result by columns.
both <- merge(wave1, wave2, by = 'row.names', all = TRUE)
Output.
  Row.names group1.x group2.x group1.y group2.y
1         1        0        1        0        1
2         2        0        1       NA       NA
3         4        1        0        1        0
4         5        1        1       NA       NA
5         8       NA       NA        1        1
6         9        0        1        0        1
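## Columns 2:3 belong to wave1, columns 4:5 to wave2; restore column names and IDs.
dwave1_c <- both[,2:3]; colnames(dwave1_c) <- colnames(wave1); rownames(dwave1_c) <- both$Row.names
dwave2_c <- both[,4:5]; colnames(dwave2_c) <- colnames(wave2); rownames(dwave2_c) <- both$Row.names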
dwave1_c[is.na(dwave1_c)] <- 10
dwave2_c[is.na(dwave2_c)] <- 10
Show the result.
as.matrix(dwave1_c)
as.matrix(dwave2_c)
First attempt.
## Convert matrix to dataframe.
df1 <- as.data.frame(wave1)
df2 <- as.data.frame(wave2)
## Merge df1 and df2 by row name.
m_df1_df2 <- merge(df1, df2, by = 'row.names', all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names
# Rows not in df1, but in df2,
# rows not in df2, but in df1
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")] # not in df1, in df2
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")] # not in df2, in df1
## Same column names.
colnames(not1_2) <- colnames(df1)
colnames(not2_1) <- colnames(df2)
## append
df1_c <- rbind(df1, not1_2)
df2_c <- rbind(df2, not2_1)
## order by row name
df1_c <- df1_c[order(row.names(df1_c)), ]
df2_c <- df2_c[order(row.names(df2_c)), ]
## replace NA by 10
df1_c[is.na(df1_c)] <- 10
df2_c[is.na(df2_c)] <- 10
as.matrix(df1_c)
as.matrix(df2_c)
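One caveat about the "order by row name" step: row names are character strings, so order() sorts them lexicographically. With the single-digit IDs in the example this coincides with numeric order, but an ID such as "10" would sort before "2". A minimal sketch, assuming every ID is a numeric string (the ids vector is made up for illustration and is not from the question's data):

```r
## Hypothetical IDs, not taken from the question.
ids <- c("1", "2", "10", "9")

## Lexicographic order, which is what order(row.names(...)) produces.
lexicographic <- ids[order(ids)]             # "1" "10" "2" "9"

## Numeric order, assuming every ID converts cleanly with as.integer().
numeric_order <- ids[order(as.integer(ids))] # "1" "2" "9" "10"
```

Applied to the code above, this would read df1_c[order(as.integer(row.names(df1_c))), ]. Both waves stay aligned with either rule as long as the same rule is used for each; the difference only matters if the rows must be in numeric order.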
Converting wave1 and wave2 to data frames in my first attempt was redundant and can be omitted, but at the cost of implicit coercion.
## merge wave1 and wave2 by row name.
m_df1_df2 <- merge(wave1, wave2, by = 0, all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names
# rows not in set 1, but in set 2,
# rows not in set 2, but in set 1.
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")]
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")]
## Same column names.
colnames(not1_2) <- colnames(wave1)
colnames(not2_1) <- colnames(wave2)
## append.
wave1_c <- rbind(wave1, not1_2)
wave2_c <- rbind(wave2, not2_1)
## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]
## replace NA by 10.
wave1_c[is.na(wave1_c)] <- 10
wave2_c[is.na(wave2_c)] <- 10
## show result.
wave1_c
wave2_c
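The implicit coercion can be made visible. When rbind() receives a matrix and a data frame, it dispatches to the data frame method, so the result is a data frame rather than a matrix. A small sketch with made-up objects (m, d, combined and result are illustrative names), including the conversion back with as.matrix():

```r
## Illustrative objects, not taken from the question.
m <- matrix(c(0, 1, 1, 0), nrow = 2,
            dimnames = list(c("1", "2"), c("group1", "group2")))
d <- data.frame(group1 = NA_real_, group2 = NA_real_, row.names = "8")

## rbind() of a matrix and a data frame returns a data frame.
combined <- rbind(m, d)
class(combined)

## Fill the NAs, then convert back to a numeric matrix.
combined[is.na(combined)] <- 10
result <- as.matrix(combined)
```

If a matrix is needed downstream, the same as.matrix() call can be applied to wave1_c and wave2_c from the version above.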
Solution using setdiff.
## rownames not in set 1, but in set 2,
## rownames not in set 2, but in set 1.
rn_not2_1 <- setdiff(rownames(wave1), rownames(wave2))
rn_not1_2 <- setdiff(rownames(wave2), rownames(wave1))
## missing rows to add.
add_to_1 <- wave2[rn_not1_2,,drop=FALSE]
add_to_2 <- wave1[rn_not2_1,,drop=FALSE]
add_to_1[,] <- 10
add_to_2[,] <- 10
## append.
wave1_c <- rbind(wave1, add_to_1)
wave2_c <- rbind(wave2, add_to_2)
## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]
## show result.
wave1_c
wave2_c
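The same idea extends to more than two waves. The following is a minimal sketch and not part of the original answers: it collects the union of all row names and copies each wave into a matrix pre-filled with 10. The helper name pad_waves and its fill argument are hypothetical, and the sketch assumes that the row names are numeric IDs and that each wave has named columns.

```r
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2,
                dimnames = list(c("1","4","8","9"), c("group1","group2")))

## Hypothetical helper: pad every wave to the union of all IDs.
pad_waves <- function(waves, fill = 10) {
  ## Union of all IDs, sorted numerically (assumes numeric IDs).
  ids <- unique(unlist(lapply(waves, rownames), use.names = FALSE))
  ids <- ids[order(as.numeric(ids))]
  lapply(waves, function(w) {
    ## Start from a matrix filled with the placeholder value.
    out <- matrix(fill, nrow = length(ids), ncol = ncol(w),
                  dimnames = list(ids, colnames(w)))
    ## Copy the rows that exist in this wave, matched by ID.
    out[rownames(w), ] <- w
    out
  })
}

padded <- pad_waves(list(wave1 = wave1, wave2 = wave2))
padded$wave1
padded$wave2
```

Because the rows are matched by name rather than appended and re-sorted, every padded matrix comes out with the same IDs in the same order, and the results stay plain matrices.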