
Counting the number of factor variables in a data frame

After generating data, I combined 5 variables into a data frame. Two of those variables are factors.

Task: I want to count the number of variables in the data frame that are factors.

I ran the code once with df as a matrix and once with df as a data frame. Both error messages are listed below.

I need help using the rep function, in particular where it goes in the R command. Is using the count function the correct approach here, and if not, what should I do?

Can you help with this, please? Thank you. MM

The XXX markers flag the problem spots in the output.

> df
           var1 var2       var3 var4       var5
[1,] -1.2070657    1 -0.6319780    3 -0.9952502
[2,]  0.2774292    2  0.3485368    1  1.9176811
[3,]  1.0844412    3  0.2075986    2  0.8032506
> class(df)
[1] "matrix"

> library(plyr)
> count(df[1:5,],as.factor)
Error in df[1:5, ] : subscript out of bounds
> df
           var1 var2       var3 var4       var5
[1,] -1.2070657    1 -0.6319780    3 -0.9952502
[2,]  0.2774292    2  0.3485368    1  1.9176811
[3,]  1.0844412    3  0.2075986    2  0.8032506
> # df = matrix:     Error in df[1:5, ] : subscript out of bounds
> # df = data frame: no applicable method for 'as.quoted' applied to an object of class "function"
                                            XXXXXXXXXXXXXXXXXXX

> #2]
> 
> #working example
> b=c(1,2,3,4,5,3,6)
> #Let’s count the 3s in the vector b.
> count3 <- length(which(b == 3))
> count3
[1] 2

> 
> #apply the technique
> vec=c("var1","var2","var3","var4","var5")
> countF <- length(which(var1==as.factor))
Error in var1 == as.factor : 
  comparison (1) is possible only for atomic and list types  XXXXXXXX

> #apply the technique again
> #count the number of variables that are factors in vec
> #var2 and var4 are factors
> vec=c("var1","var2","var3","var4","var5")
> countF <- length(which(vec==as.factor))
Error in vec == as.factor : 
  comparison (1) is possible only for atomic and list types
                                            XXXXXXXXXXXXXXXXXXX

I had changed columns 2 and 4 to factors before using cbind, but in that process columns 2 and 4 reverted to numeric. I used as.factor to try to get the code to run. As I read over the comments, I wondered why lapply would not be appropriate, since we're dealing with an array of variable names in a list. Do all of the apply functions return TRUEs or FALSEs? I'm still learning when to use each of them.
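The reversion described above is expected behaviour of cbind() on separate vectors. A minimal sketch, using made-up values for var1, var2 and var4 (illustrative assumptions, not the asker's data), shows that cbind() builds a matrix, which can hold only one atomic type, so factors are stored as their integer codes, while data.frame() keeps each column's class:

```r
set.seed(1)
# Hypothetical columns: one numeric, two factors
var1 <- rnorm(3)
var2 <- factor(c(1, 2, 3))
var4 <- factor(c(3, 1, 2))

# cbind() on plain vectors returns a matrix; the factor class is lost
# and only the underlying integer codes (stored as numbers) remain
m <- cbind(var1, var2, var4)
is.matrix(m)             # TRUE
is.factor(m[, "var2"])   # FALSE

# data.frame() keeps the factor class of var2 and var4
d <- data.frame(var1, var2, var4)
sapply(d, is.factor)     # var1 FALSE, var2 TRUE, var4 TRUE
```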

MM

If you want to count the number of factor variables, you can use sapply combined with is.factor:

sum(sapply(df, is.factor))

where df is your target data frame.
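On the lapply question: the apply functions return whatever the applied function returns, so they give TRUE/FALSE here only because is.factor() does. The difference is the container: lapply() always returns a list, sapply() tries to simplify that list to a vector, and vapply() simplifies with a declared result type. A sketch with a hypothetical df whose values are made up:

```r
set.seed(1)
# Hypothetical data frame with two factor columns
df <- data.frame(var1 = rnorm(3),
                 var2 = factor(1:3),
                 var3 = rnorm(3),
                 var4 = factor(c(3, 1, 2)),
                 var5 = rnorm(3))

# lapply(): a list holding one TRUE/FALSE per column
is_fac_list <- lapply(df, is.factor)

# sapply(): the same results simplified to a named logical vector
is_fac_vec <- sapply(df, is.factor)

# vapply(): like sapply(), but insists each result is a single logical
is_fac_safe <- vapply(df, is.factor, logical(1))

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of factors
sum(is_fac_vec)            # 2
sum(unlist(is_fac_list))   # 2
```

So lapply() works too; its list just needs unlist() before sum().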

A few problems here:

Your "subscript out of bounds" error occurs because df[1:5, ] selects rows 1 to 5, whereas columns 1 to 5 would be df[, 1:5]. Your matrix has only 3 rows, not 5.
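A short sketch of the row/column difference, using a made-up 3 x 5 matrix shaped like the one in the question. It also shows that a data frame does not raise this error for missing rows but pads them with NA, which is probably why the data-frame run got past the subsetting and failed later inside count():

```r
set.seed(1)
# Hypothetical 3 x 5 matrix with the question's column names
m <- matrix(rnorm(15), nrow = 3,
            dimnames = list(NULL, paste0("var", 1:5)))

# Rows 4 and 5 do not exist, so row indexing fails
err <- tryCatch(m[1:5, ], error = function(e) conditionMessage(e))
err                 # "subscript out of bounds"

# Columns 1 to 5 do exist
dim(m[, 1:5])       # 3 5

# A data frame returns NA rows instead of raising an error
d <- as.data.frame(m)
nrow(d[1:5, ])      # 5
```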

The second error, no applicable method for 'as.quoted' applied to an object of class "function", refers to as.factor, which is a function. count expects the names of variables as its second argument, not a function, so passing as.factor fails. You can check exactly what count wants by running ?count in the console.

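A third problem that I see is that R will not automatically treat integers as factors; for numeric columns you have to ask for this explicitly with factor() or as.factor(). Columns of words were converted to factors automatically by data.frame() and read.csv() in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE; since R 4.0.0 they stay character unless you pass stringsAsFactors = TRUE.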
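A minimal sketch of the stringsAsFactors switch, with a made-up column of colour names. Setting the argument explicitly gives the same result on old and new R versions:

```r
words <- c("Green", "Red", "Blue")

# Explicitly request factors (the pre-4.0.0 default behaviour)
d_fac <- data.frame(col = words, stringsAsFactors = TRUE)

# Explicitly keep character (the default since R 4.0.0)
d_chr <- data.frame(col = words, stringsAsFactors = FALSE)

is.factor(d_fac$col)     # TRUE
is.character(d_chr$col)  # TRUE

# Integer columns are never converted automatically
d_int <- data.frame(col = 1:3, stringsAsFactors = TRUE)
is.factor(d_int$col)     # FALSE
```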

Here is a reproducible example:

> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)

'data.frame':   3 obs. of  5 variables:
 $ var1: num  0.716 1.43 -0.726
 $ var2: int  1 2 3
 $ var3: num  0.238 -0.658 0.492
 $ var4: num  3 1 2
 $ var5: num  1.71 1.54 1.05

Here I used the str() (structure) function to check what type of data I have. Note that var2 is stored as an integer because I generated it with c(1:3), whereas c(3,1,2) in var4 is stored as numeric.
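The integer/numeric split comes from how the vectors are written, not from data.frame(). A quick check in base R:

```r
class(c(1:3))          # "integer": the colon operator yields integers
class(c(3, 1, 2))      # "numeric": plain number literals are doubles
typeof(c(3, 1, 2))     # "double"
class(c(3L, 1L, 2L))   # "integer": the L suffix makes integer literals

# Neither kind is a factor until you convert it
is.factor(c(1:3))      # FALSE
```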

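Here, I will tell R I want two of the columns to be factors, and I will make another column of words, which becomes a factor automatically in R versions before 4.0.0 (in R 4.0.0 and later, pass stringsAsFactors = TRUE to get the same result).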

> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+                ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"), stringsAsFactors = TRUE)
> str(df)
'data.frame':   3 obs. of  6 variables:
 $ var1: num  -1.18 1.26 -0.53
 $ var2: Factor w/ 3 levels "1","2","3": 1 2 3
 $ var3: num  1.38 -0.401 -0.924
 $ var4: Factor w/ 3 levels "1","2","3": 3 1 2
 $ var5: num  1.688 0.547 0.727
 $ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1
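If the data already sits in a numeric matrix (for example the one cbind() produced in the question), one way is to convert it with as.data.frame() and then turn the chosen columns into factors afterwards. A sketch, assuming the factor columns are var2 and var4 as in the question; the values are made up:

```r
set.seed(42)
# Hypothetical numeric matrix like the question's cbind() result
m <- cbind(var1 = rnorm(3), var2 = c(1, 2, 3), var3 = rnorm(3),
           var4 = c(3, 1, 2), var5 = rnorm(3))

# Convert to a data frame first; a matrix cannot hold factors
df <- as.data.frame(m)

# Replace only the chosen columns with factor versions of themselves
to_convert <- c("var2", "var4")
df[to_convert] <- lapply(df[to_convert], factor)

sapply(df, is.factor)   # TRUE only for var2 and var4
```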

You can then ask which columns are factors:

> sapply(df, is.factor)
 var1  var2  var3  var4  var5  var6 
FALSE  TRUE FALSE  TRUE FALSE  TRUE 

And if you want the number of factor columns, something like this gets you there:

> length(which(sapply(df, is.factor)==TRUE))
[1] 3
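The == TRUE comparison is not needed, because which() already picks out the TRUE positions, and sum() counts TRUEs directly. A sketch on a small hypothetical data frame:

```r
set.seed(1)
# Hypothetical data frame with two factor columns
df <- data.frame(var1 = rnorm(3),
                 var2 = factor(1:3),
                 var6 = factor(c("Green", "Red", "Blue")))

is_fac <- sapply(df, is.factor)

length(which(is_fac == TRUE))  # 2
length(which(is_fac))          # 2: which() keeps only the TRUEs
sum(is_fac)                    # 2: TRUE counts as 1, FALSE as 0
names(which(is_fac))           # "var2" "var6"
```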

You have something similar: length(which(vec==as.factor)). One problem is that you are asking which elements of vec are equal to the function as.factor, which doesn't make sense, so R gives you the error Error in vec == as.factor : comparison (1) is possible only for atomic and list types. Another is that vec holds only the column names as character strings, not the columns themselves.
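If you want to keep the vector of names, one option is to use it to index the data frame and then apply is.factor() to the selected columns. A sketch, with a hypothetical df whose values are made up:

```r
set.seed(1)
# Hypothetical data frame: var2 and var4 are factors
df <- data.frame(var1 = rnorm(3), var2 = factor(1:3), var3 = rnorm(3),
                 var4 = factor(c(3, 1, 2)), var5 = rnorm(3))

vec <- c("var1", "var2", "var3", "var4", "var5")

# The names themselves are just character strings
is.factor(vec)                       # FALSE

# df[vec] selects those columns; is.factor() then checks each one
countF <- sum(sapply(df[vec], is.factor))
countF                               # 2
```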

as.factor converts something to a factor (as I have shown above), whereas is.factor asks whether something is a factor and returns a logical (TRUE or FALSE), also shown above.
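A minimal illustration of the difference. Note that as.factor() returns a new object; it does not change the vector in place, so the result has to be assigned:

```r
x <- c(3, 1, 2)

is.factor(x)        # FALSE: a plain numeric vector

f <- as.factor(x)   # returns a new factor; x itself is unchanged
is.factor(f)        # TRUE
is.factor(x)        # still FALSE

levels(f)           # "1" "2" "3"
```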
