How to create an adjacency matrix with a loop within a data frame in R
I have a data frame with characteristics of cases and one column that indicates their category (V4). Additionally, I added all the possible categories as columns.
V1 V2 V3 V4 A B C D E F
X  X  X  A
X  X  X  B
X  X  X  B
X  X  X  C
Now I want a "1" in every row where the value of V4 matches the column name, e.g.:
V1 V2 V3 V4 A B C D E F
X  X  X  A  1
X  X  X  B    1
X  X  X  B    1
X  X  X  C      1
My solution was to loop with the if function:
> for (k in 1:dim(label_com) [1]) {
+ if (label_com$cat [k] == colnames(label_com) [k]){
+ colnames(label_com) [k] <- gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE, fixed = TRUE, useBytes = FALSE)
+ }
+ }
Error in if (label_com$cat[k] == colnames(label_com)[k]) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE, :
argument 'replacement' has length > 1 and only the first element will be used
I have never worked with loops and now wonder how to solve this problem. There are multiple things needed:
if: when V4 == column name
loop function: to do this for all V4 values against all column names
I appreciate your help.
PS: my apologies if the question is wrongly asked or misformatted; it's my first question on the forum.
Here is a framework for inserting the 1's:
m <- matrix(0, nrow = 4, ncol = 10)
df <- as.data.frame(m) # create dummy dataframe with example dimensions
LETTERS # built-in constant in base R (the 26 upper-case letters)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
V4 <- c('A', 'B', 'B', 'C') # df$V4 would replace this
offset <- sapply(V4, function(x) which(x == LETTERS)) # determine the offset indexes for V4
offset <- offset + 4 # add the first 4 columns of the df to the offset
offset
A B B C
5 6 6 7
for (i in 1:4){ # add the 1's to the dataframe
df[i, offset[i]] <- 1
}
df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 0 0 0 1 0 0 0 0 0
2 0 0 0 0 0 1 0 0 0 0
3 0 0 0 0 0 1 0 0 0 0
4 0 0 0 0 0 0 1 0 0 0
You could replace the hard-coded index values (the 4 rows and the offset of 4 columns) with values computed from the data frame to generalize.
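As a rough sketch of that generalization: the column position of each category can be looked up by name with match(), and the row count taken from the data frame itself. The example data and the category set A to F below are assumptions made for illustration, not the asker's real data.

```r
# Hypothetical example data mirroring the question
df <- data.frame(V1 = "X", V2 = "X", V3 = "X",
                 V4 = c("A", "B", "B", "C"),
                 stringsAsFactors = FALSE)
categories <- LETTERS[1:6]        # assumed full set of categories, A to F
df[categories] <- 0               # add one zero-filled column per category

# Column position of each row's category, looked up by name instead of "+ 4"
offset <- match(df$V4, names(df))

for (i in seq_len(nrow(df))) {    # seq_len() also copes with zero rows
  df[i, offset[i]] <- 1
}
df
```

This relies on every value of V4 appearing among the column names; a value without a matching column gives NA in offset and would need separate handling.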
I think you can get by without a loop here.
First, make sure your column V4 is a factor:
df$V4 <- as.factor(df$V4)
df$V4
[1] A B B C
Levels: A B C
Then, you can use model.matrix to create a model/design matrix based on this factor:
newdf <- cbind(df, model.matrix(~ V4 + 0, data = df))
newdf
V1 V2 V3 V4 V4A V4B V4C
1 X X X A 1 0 0
2 X X X B 0 1 0
3 X X X B 0 1 0
4 X X X C 0 0 1
In model.matrix, you provide the data frame df as the data source, and a formula. Including + 0 in the formula means the model won't include an intercept column of all 1s. This keeps all of the factor levels in the model (rather than absorbing the first level into the intercept).
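To see the effect of + 0, here is a small comparison on a made-up factor (the data frame d is only for illustration):

```r
d <- data.frame(V4 = factor(c("A", "B", "B", "C")))   # hypothetical data

with_intercept    <- model.matrix(~ V4, data = d)      # level A becomes the baseline
without_intercept <- model.matrix(~ V4 + 0, data = d)  # one indicator column per level

colnames(with_intercept)     # "(Intercept)" "V4B" "V4C"
colnames(without_intercept)  # "V4A" "V4B" "V4C"
```

Without + 0 there is no column for "A", because that level is represented by the intercept.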
If you want to remove the "V4" prefix from the new column names, so that they read "A", "B", and "C" in this example, you could then do:
idx <- match(paste0("V4", levels(df$V4)), names(newdf))
names(newdf)[idx] <- levels(df$V4)
newdf
V1 V2 V3 V4 A B C
1 X X X A 1 0 0
2 X X X B 0 1 0
3 X X X B 0 1 0
4 X X X C 0 0 1
This builds an index of the columns whose names are the levels of V4 preceded by the string "V4", and replaces those names with just the levels themselves.
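The question also has columns for categories that do not occur in the data (D, E and F). One possible way to get those too is to declare the factor levels explicitly; this sketch assumes the full category set is A to F and uses made-up example data:

```r
# Hypothetical example data mirroring the question
df <- data.frame(V1 = "X", V2 = "X", V3 = "X",
                 V4 = c("A", "B", "B", "C"),
                 stringsAsFactors = FALSE)

all_levels <- LETTERS[1:6]                    # assumed full set of categories
df$V4 <- factor(df$V4, levels = all_levels)   # unused levels D, E, F are kept

ind <- model.matrix(~ V4 + 0, data = df)      # one column per level, in level order
colnames(ind) <- all_levels                   # drop the "V4" prefix
newdf <- cbind(df, ind)
newdf
```

Note that model.matrix drops rows with a missing V4 by default, so with NA values the row counts would no longer line up for cbind.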
My colleague came up with an alternative using loops:
for (i in 1:dim(label_com)[1]) {
  for (j in 1:dim(label_com)[2]) {
    if (label_com$cat[i] == colnames(label_com)[j]) {
      label_com[i, j] <- "1"
    }
  }
}
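For comparison, the same row-by-column comparison can probably be done in one step with outer(). This is only a sketch: the small label_com below is a made-up stand-in, the category set A to F is assumed, and the indicator columns are created fresh rather than filled into existing ones.

```r
# Hypothetical stand-in for label_com, with the category in column "cat"
label_com <- data.frame(V1 = "X", cat = c("A", "B", "B", "C"),
                        stringsAsFactors = FALSE)
categories <- LETTERS[1:6]                      # assumed category set

# TRUE where a row's category equals a column name (rows x categories)
hit <- outer(label_com$cat, categories, "==")
dimnames(hit) <- list(NULL, categories)

result <- cbind(label_com, hit + 0L)            # logical -> integer 0/1 columns
result
```

Unlike the if inside the loop, a missing value in cat does not stop with an error here; it shows up as NA in that row's indicator columns.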