
How to create an adjacency matrix with a loop within a dataframe in R

I have a dataframe with characteristics of cases and one column that indicates their category (V4). Additionally, I added all the possible categories as columns.

V1  V2  V3  V4  A  B  C  D  E  F
X   X   X   A   
X   X   X   B
X   X   X   B
X   X   X   C

Now I want a "1" in every row where V4 matches the column name, e.g.:

V1  V2  V3  V4  A  B  C  D  E  F
X   X   X   A   1 
X   X   X   B      1
X   X   X   B      1
X   X   X   C         1

My attempt was a loop with an if statement:

> for (k in 1:dim(label_com) [1]) {
+   if (label_com$cat [k] == colnames(label_com) [k]){
+     colnames(label_com) [k] <- gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE, fixed = TRUE, useBytes = FALSE)
+   }
+ }
Error in if (label_com$cat[k] == colnames(label_com)[k]) { : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used

I have never worked with loops and now wonder how to solve this problem. Several things are needed:

  1. An if condition that tests whether V4 == column name
  2. If true, it needs to put a "1" in the row of V4 and the column with the matching name
  3. A loop that does this for all V4 values against all column names

I appreciate your help.

PS: my apologies if the question is badly worded or misformatted; it's my first question on the forum...

Here is a framework for inserting the 1's:

m <- matrix(0, nrow = 4, ncol = 10)
df <- as.data.frame(m)  # create dummy dataframe with example dimensions
LETTERS # built-in constant in base R (the 26 upper-case letters)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
V4 <- c('A', 'B', 'B', 'C') # df$V4 would replace this

offset <- sapply(V4, function(x) which(x == LETTERS)) # determine the offset indexes for V4
offset <- offset + 4 # add the first 4 columns of the df to the offset
offset
A B B C 
5 6 6 7 

for (i in 1:4) {   # add the 1's to the dataframe
  df[i, offset[i]] <- 1
}

df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  0  0  0  0  1  0  0  0  0   0
2  0  0  0  0  0  1  0  0  0   0
3  0  0  0  0  0  1  0  0  0   0
4  0  0  0  0  0  0  1  0  0   0

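To generalize, replace the hard-coded values (the 4 rows in the loop and the offset of 4 columns) with values derived from the data, such as nrow(df) and the position of the first category column.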
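
As a rough sketch of that generalization, the column positions can be looked up by name instead of being computed from a fixed offset. The data frame below is a hypothetical stand-in for the one in the question, and the category set A to F is an assumption taken from the question's header row:

```r
# Hypothetical example data mirroring the layout in the question
df <- data.frame(V1 = "X", V2 = "X", V3 = "X",
                 V4 = c("A", "B", "B", "C"),
                 stringsAsFactors = FALSE)

categories <- LETTERS[1:6]   # assumed category columns A..F
df[categories] <- 0          # add one zero-filled column per category

# Column position of each row's category, found by name rather than by offset
col_idx <- match(df$V4, names(df))

for (i in seq_len(nrow(df))) {
  # skip rows whose category has no matching column (match() gives NA there)
  if (!is.na(col_idx[i])) df[i, col_idx[i]] <- 1
}
```

Because nothing depends on the number of rows or on where the category columns start, the same code should keep working if leading columns are added or removed, provided no category value coincides with the name of a non-category column.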

I think you can get by without a loop here.

First, make sure your column V4 is a factor:

df$V4 <- as.factor(df$V4)
df$V4

[1] A B B C
Levels: A B C

Then, you can use model.matrix to create a model/design matrix based on this factor:

newdf <- cbind(df, model.matrix(~ V4 + 0, data = df))
newdf

  V1 V2 V3 V4 V4A V4B V4C
1  X  X  X  A   1   0   0
2  X  X  X  B   0   1   0
3  X  X  X  B   0   1   0
4  X  X  X  C   0   0   1

In model.matrix, you provide the data frame df as the data source, along with a formula. Including + 0 in the formula removes the intercept column of all 1s, so every factor level keeps its own column (otherwise the first level would be absorbed into the intercept).
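
To see the effect of + 0 in isolation, a small comparison of the resulting column names may help. The one-column data frame here is a hypothetical example, not the asker's data:

```r
# Hypothetical one-column example with the same V4 values as above
df <- data.frame(V4 = factor(c("A", "B", "B", "C")))

# Default formula: an intercept column is added and level "A" is dropped
with_intercept <- colnames(model.matrix(~ V4, data = df))
with_intercept     # "(Intercept)" "V4B" "V4C"

# With + 0: no intercept, one indicator column per level
without_intercept <- colnames(model.matrix(~ V4 + 0, data = df))
without_intercept  # "V4A" "V4B" "V4C"
```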

If you want to strip the "V4" prefix so that the new columns are named "A", "B", and "C" in this example, you could then do:

idx <- match(paste0("V4", levels(df$V4)), names(newdf))
names(newdf)[idx] <- levels(df$V4)
newdf

  V1 V2 V3 V4 A B C
1  X  X  X  A 1 0 0
2  X  X  X  B 0 1 0
3  X  X  X  B 0 1 0
4  X  X  X  C 0 0 1

This builds an index of the columns whose names are the levels of V4 prefixed with the string "V4", then replaces those names with just the levels themselves.

My colleague came up with an alternative using loops:

for (i in 1:dim(label_com)[1]) {
  for (j in 1:dim(label_com)[2]) {
    if (label_com$cat[i] == colnames(label_com)[j]) {
      label_com[i, j] <- "1"
    }
  }
}
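
For comparison, the same row-by-column test can be written without explicit loops. This is only a sketch: the label_com data frame and the categories vector are hypothetical stand-ins, and it builds numeric 0/1 columns from scratch rather than filling "1" into existing ones:

```r
# Hypothetical stand-in for label_com with only the category column
label_com <- data.frame(cat = c("A", "B", "B", "C"),
                        stringsAsFactors = FALSE)
categories <- c("A", "B", "C", "D", "E", "F")   # assumed set of categories

# outer() compares every row's category with every category name at once;
# multiplying by 1L turns TRUE/FALSE into 1/0
indicator <- outer(label_com$cat, categories, "==") * 1L
colnames(indicator) <- categories

# Attach the indicator columns to the original data frame
label_com <- cbind(label_com, indicator)
```

Note that a missing value in cat would give NA in that row's indicator columns here, whereas the nested loop would stop with the "missing value where TRUE/FALSE needed" error.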
