I have a data frame with characteristics of cases and one column that indicates their category (V4). Additionally, I added all the possible categories as columns.
V1 V2 V3 V4 A B C D E F
X X X A
X X X B
X X X B
X X X C
Now I want a "1" in each row, in the column whose name matches the row's V4 value, e.g.:
V1 V2 V3 V4 A B C D E F
X  X  X  A  1
X  X  X  B    1
X  X  X  B    1
X  X  X  C      1
My attempt was a loop with an if statement:
> for (k in 1:dim(label_com) [1]) {
+ if (label_com$cat [k] == colnames(label_com) [k]){
+ colnames(label_com) [k] <- gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE, fixed = TRUE, useBytes = FALSE)
+ }
+ }
Error in if (label_com$cat[k] == colnames(label_com)[k]) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In gsub(pattern = NA[k], replacement = label_com[j, k], x = 1, perl = FALSE, :
argument 'replacement' has length > 1 and only the first element will be used
I have never worked with loops and now wonder how to solve this problem. Two things are needed:
an if condition: when V4 == column name
a loop: to do this for all V4 values against all column names
I appreciate your help.
PS: my apologies if the question is badly asked or misformatted; it's my first question on the forum.
Here is a framework for inserting the 1's:
m <- matrix(0, nrow = 4, ncol = 10)
df <- as.data.frame(m) # create dummy dataframe with example dimensions
LETTERS # built-in constant in base R (not a function)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
V4 <- c('A', 'B', 'B', 'C') # df$V4 would replace this
offset <- sapply(V4, function(x) which(x == LETTERS)) # determine the offset indexes for V4
offset <- offset + 4 # add the first 4 columns of the df to the offset
offset
A B B C
5 6 6 7
for (i in 1:4){ # add the 1's to the dataframe
df[i, offset[i]] <- 1
}
df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 0 0 0 0 1 0 0 0 0 0
2 0 0 0 0 0 1 0 0 0 0
3 0 0 0 0 0 1 0 0 0 0
4 0 0 0 0 0 0 1 0 0 0
To generalize, you could replace the hard-coded index values (the 4 rows and the offset of 4 columns) with values computed from the data frame itself.
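One way that generalization might look is sketched below. The example data frame, the `cats` vector, and the `col_idx` name are assumptions made for illustration; the idea is simply to look up each row's target column by name with `match()` instead of counting letters and adding a fixed offset.

```r
# Hypothetical example data mirroring the layout in the question
df <- data.frame(V1 = "X", V2 = "X", V3 = "X",
                 V4 = c("A", "B", "B", "C"),
                 stringsAsFactors = FALSE)

cats <- LETTERS[1:6]   # assumed full set of categories, A to F
df[cats] <- 0          # add one indicator column per category, filled with 0

# For each row, the position of the column whose name equals its V4 value
col_idx <- match(df$V4, names(df))

for (i in seq_len(nrow(df))) {   # works for any number of rows
  if (!is.na(col_idx[i])) {      # skip categories that have no column
    df[i, col_idx[i]] <- 1
  }
}
df
```

Because the column position comes from `names(df)`, this keeps working if the number of leading columns changes or the category names are not single letters.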
I think you can get by without a loop here.
First, make sure your column V4 is a factor:
df$V4 <- as.factor(df$V4)
df$V4
[1] A B B C
Levels: A B C
Then you can use model.matrix to create a model/design matrix based on this factor:
newdf <- cbind(df, model.matrix(~ V4 + 0, data = df))
newdf
V1 V2 V3 V4 V4A V4B V4C
1 X X X A 1 0 0
2 X X X B 0 1 0
3 X X X B 0 1 0
4 X X X C 0 0 1
In model.matrix, you provide the data frame df as the data source, together with a formula. Including + 0 in the formula means the model matrix has no intercept column of all 1s. This keeps every factor level as its own column (rather than absorbing the first level into the intercept).
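To see the effect of + 0, here is a small comparison on a standalone factor. The variable names `with_intercept` and `no_intercept` are made up for this sketch.

```r
V4 <- factor(c("A", "B", "B", "C"))

with_intercept <- model.matrix(~ V4)       # default: intercept plus contrasts
no_intercept   <- model.matrix(~ V4 + 0)   # no intercept: one column per level

colnames(with_intercept)   # "(Intercept)" "V4B" "V4C"
colnames(no_intercept)     # "V4A" "V4B" "V4C"
```

With the intercept, level A has no column of its own; without it, every row has exactly one 1 across the three indicator columns.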
If you want to remove the V4 prefix from the new column names, so that they read "A", "B", and "C" in this example, you could then do:
idx <- match(paste0("V4", levels(df$V4)), names(newdf))
names(newdf)[idx] <- levels(df$V4)
newdf
V1 V2 V3 V4 A B C
1 X X X A 1 0 0
2 X X X B 0 1 0
3 X X X B 0 1 0
4 X X X C 0 0 1
This builds an index of the columns whose names match the levels of V4 preceded by the string "V4", and replaces those names with just the levels themselves.
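The question also has columns D, E, and F, which never occur in V4. A possible variation, assuming the full category set is known in advance (here the hypothetical `all_cats`), is to declare all levels on the factor so that model.matrix produces a column for each of them:

```r
# Hypothetical example data mirroring the layout in the question
df <- data.frame(V1 = "X", V2 = "X", V3 = "X",
                 V4 = c("A", "B", "B", "C"))

all_cats <- LETTERS[1:6]                     # assumed full category set, A to F
df$V4 <- factor(df$V4, levels = all_cats)    # unused levels are kept

ind <- model.matrix(~ V4 + 0, data = df)     # one column per level, used or not
colnames(ind) <- all_cats                    # columns are in level order, so this drops the "V4" prefix

newdf <- cbind(df, ind)
newdf
```

The columns for D, E, and F are then all 0, since no row belongs to those categories.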
My colleague came up with an alternative that uses loops:
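for (i in seq_len(nrow(label_com))) {      # every row
  for (j in seq_len(ncol(label_com))) {    # every column
    # mark the cell when the row's category equals the column name
    if (label_com$cat[i] == colnames(label_com)[j]) {
      label_com[i, j] <- "1"
    }
  }
}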
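For comparison, the same row-by-column test can probably be written without explicit loops by using outer() to compare every category with every column name at once. The small label_com below is a made-up stand-in for the real data, and `hit` is a name introduced for this sketch.

```r
# Hypothetical stand-in for the real label_com
label_com <- data.frame(cat = c("A", "B", "B", "C"),
                        A = NA, B = NA, C = NA, D = NA,
                        stringsAsFactors = FALSE)

# Logical matrix with one row per case and one column per data frame column:
# TRUE where the row's category equals the column name
hit <- outer(label_com$cat, colnames(label_com), "==")

label_com[hit] <- "1"   # fill every matching cell in one assignment
label_com
```

Like the loop, this writes the character "1"; assigning the number 1 instead would keep the indicator columns numeric.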