How to create an adjacency matrix with a loop within a dataframe in R

I have a dataframe with characteristics of cases and one column that indicates their category (V4). Additionally, I added all the possible categories as columns.

V1  V2  V3  V4  A  B  C  D  E  F
X   X   X   A   
X   X   X   B
X   X   X   B
X   X   X   C

Now I want a "1" in each row, in the column whose name matches that row's V4 value. E.g.:

V1  V2  V3  V4  A  B  C  D  E  F
X   X   X   A   1 
X   X   X   B      1
X   X   X   B      1
X   X   X   C         1

My solution was to loop with an if statement:

> for (k in 1:dim(label_com) [1]) {
+   if (label_com$cat [k] == colnames(label_com) [k]){
+     colnames(label_com) [k] <- gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE, fixed = TRUE, useBytes = FALSE)
+   }
+ }
Error in if (label_com$cat[k] == colnames(label_com)[k]) { : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used

I have never worked with loops and now wonder how to solve this problem. Several things are needed:

  1. There needs to be an if that checks whether V4 == column name
  2. If true, it needs to put a "1" in that row, in the column with the matching name
  3. It needs a loop to do this for all V4 values against all column names

I appreciate your help.

PS: my apologies if the question is badly asked or badly formatted; it's my first question on the forum...

Here is a framework for inserting the 1's:

m <- matrix(0, nrow = 4, ncol = 10)
df <- as.data.frame(m)  # create dummy dataframe with example dimensions
LETTERS # base R function
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
V4 <- c('A', 'B', 'B', 'C') # df$V4 would replace this

offset <- sapply(V4, function(x) which(x == LETTERS)) # determine the offset indexes for V4
offset <- offset + 4 # add the first 4 columns of the df to the offset
offset
A B B C 
5 6 6 7 

for (i in 1:4){   # add the 1's to the dataframe
      df[i, offset[i]] <- 1
}

df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  0  0  0  0  1  0  0  0  0   0
2  0  0  0  0  0  1  0  0  0   0
3  0  0  0  0  0  1  0  0  0   0
4  0  0  0  0  0  0  1  0  0   0

To generalize, you could replace the hard-coded index values (the 4 rows and the column offset of 4) with values derived from the data.
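One way that generalization might look is sketched below. It assumes the category columns already exist in the data frame and are named exactly after the values in V4, and that every V4 value has a matching column. The data frame and the `categories` vector are made-up stand-ins for the real data.

```r
# Hypothetical stand-in for the real data: three feature columns, the
# category column V4, and one zero-filled column per possible category.
df <- data.frame(V1 = "X", V2 = "X", V3 = "X",
                 V4 = c("A", "B", "B", "C"),
                 stringsAsFactors = FALSE)
categories <- LETTERS[1:6]   # assumed full set of categories
df[categories] <- 0

# Look up each row's target column by name instead of using a fixed offset.
col_idx <- match(df$V4, names(df))

# Use the actual row count instead of a hard-coded 1:4.
for (i in seq_len(nrow(df))) {
  df[i, col_idx[i]] <- 1
}

df
```

Because the columns are found by name, the same code keeps working if columns are added before V4 or if the number of rows changes.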

I think you can get by without a loop here.

First, make sure your column V4 is a factor:

df$V4 <- as.factor(df$V4)
df$V4

[1] A B B C
Levels: A B C

Then, you can use model.matrix to create a model/design matrix based on this factor:

newdf <- cbind(df, model.matrix(~ V4 + 0, data = df))
newdf

  V1 V2 V3 V4 V4A V4B V4C
1  X  X  X  A   1   0   0
2  X  X  X  B   0   1   0
3  X  X  X  B   0   1   0
4  X  X  X  C   0   0   1

In model.matrix, you provide the data frame df as the data source, along with a formula. Including + 0 in the formula means the model won't include an intercept column of all 1s. This keeps all of the factor levels in the model (instead of absorbing the first level into the intercept).
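To see what + 0 changes, it may help to compare the column names of the two design matrices side by side. This is a minimal sketch on a made-up four-row data frame that mirrors the example; only V4 is included because the other columns play no role here.

```r
# Hypothetical four-row example mirroring the data above.
df <- data.frame(V4 = factor(c("A", "B", "B", "C")))

# Default formula: the first level (A) is absorbed into the intercept.
with_intercept <- model.matrix(~ V4, data = df)

# With + 0: no intercept, so every level gets its own indicator column.
no_intercept <- model.matrix(~ V4 + 0, data = df)

colnames(with_intercept)  # "(Intercept)" "V4B" "V4C"
colnames(no_intercept)    # "V4A" "V4B" "V4C"
```

In the version without the intercept, each row contains exactly one 1, which is the layout the question asks for.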

If you want to remove the "V4" prefix from the column names so that they read "A", "B", and "C" in this example, you could then do:

idx <- match(paste0("V4", levels(df$V4)), names(newdf))
names(newdf)[idx] <- levels(df$V4)
newdf

  V1 V2 V3 V4 A B C
1  X  X  X  A 1 0 0
2  X  X  X  B 0 1 0
3  X  X  X  B 0 1 0
4  X  X  X  C 0 0 1

This builds an index of the columns whose names are the levels of V4 prefixed with the string "V4", and replaces those names with just the levels themselves.

My colleague came up with an alternative that uses loops:

for (i in 1:dim(label_com)[1]) {
  for (j in 1:dim(label_com)[2]) {
    if (label_com$cat[i] == colnames(label_com)[j]) {
      label_com[i, j] <- "1"
    }
  }
}
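A nested loop like this stops with "missing value where TRUE/FALSE needed" as soon as a comparison yields NA, for example when a row has no category, because if() cannot branch on NA. The sketch below shows one possible guard using isTRUE(). The data frame is invented for illustration, and it writes a numeric 1 instead of the string "1" so the category columns stay numeric.

```r
# Hypothetical data: `cat` holds the category; the third row has none.
label_com <- data.frame(cat = c("A", "B", NA, "C"),
                        A = 0, B = 0, C = 0,
                        stringsAsFactors = FALSE)

for (i in seq_len(nrow(label_com))) {
  for (j in seq_len(ncol(label_com))) {
    # isTRUE() treats an NA comparison as "no match" instead of an error.
    if (isTRUE(label_com$cat[i] == colnames(label_com)[j])) {
      label_com[i, j] <- 1
    }
  }
}

label_com
```

The row with the missing category is left as all zeros, and seq_len() also keeps the loops from running at all on an empty data frame.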
