
How to change factor arguments? (R)

Here are three vectors.

vec1 <- 1:6
vec2 <- c('radio', 'newspaper', 'web-page', 'chat', 'tv', 'web-page')
vec3 <- c(0, 0, 1, 1, 0, 1)

The task is to form a data frame with the following structure using these vectors.

'data.frame': 6 obs. of 3 variables:  
$ id : int 1 2 3 4 5 6
$ response: Factor w/ 2 levels "No","Yes": 1 1 2 2 1 2
$ medium : chr "radio" "newspaper" "web-page" "chat" ... 

Here is my solution.

dfr <- data.frame(id = vec1, response = vec3, medium = vec2, stringsAsFactors = FALSE) 
dfr$response <- factor(x = , levels = , labels = )

My question is: what values should the arguments (x, levels, labels) have, and why? I am asking about this line:

dfr$response <- factor(x = , levels = , labels = )

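We can pass labels directly when building the factor from vec3, because by default the levels are taken from the sorted unique values of vec3 (here 0 and 1), so 'No' is matched to 0 and 'Yes' to 1.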

df <- data.frame(id = vec1, response = factor(vec3, labels = c('No', 'Yes')), 
                  medium = vec2, stringsAsFactors = FALSE)

str(df)
#'data.frame':  6 obs. of  3 variables:
#$ id      : int  1 2 3 4 5 6
#$ response: Factor w/ 2 levels "No","Yes": 1 1 2 2 1 2
#$ medium  : chr  "radio" "newspaper" "web-page" "chat" ...

You can read ?factor for more details.

In the factor() call:

x is the vector of data that you want to turn into a factor, in this case the responses: x = dfr$response

levels is a vector of the values that x might take. The default is the distinct values of x sorted in ascending order (numeric or alphabetical), so here the default would be c(0, 1). You don't need to supply levels, since they are detected automatically, but when you are adding labels it is good practice to state them explicitly so that each label is matched to the intended value, especially when there are many levels and the order is easy to mix up.
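To make the point about ordering concrete, here is a small sketch using a 0/1 vector like vec3 from the question. The object names (x, f_default, f_explicit, f_missing) are only illustrative. It shows the default levels, how explicit levels pair with labels by position, and what happens to a value that is not listed in levels.

```r
# Illustrative 0/1 responses, same values as vec3 in the question
x <- c(0, 0, 1, 1, 0, 1)

# Default levels: the sorted unique values of x, stored as strings
f_default <- factor(x)
levels(f_default)            # "0" "1"

# Explicit levels: each label is paired with the level in the same position,
# so reversing both together still labels 0 as "No" and 1 as "Yes"
f_explicit <- factor(x, levels = c(1, 0), labels = c("Yes", "No"))
as.character(f_explicit)     # "No" "No" "Yes" "Yes" "No" "Yes"

# A value that is not listed in levels becomes NA
f_missing <- factor(c(0, 1, 2), levels = c(0, 1))
is.na(f_missing)             # FALSE FALSE TRUE
```

Writing the levels out means a mismatch between values and labels is visible in the code rather than depending on the sort order of the data.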

labels is either a vector with one label per level, in the same order as levels, or a single string, which is used as a prefix and numbered. Repeating a label maps several values to the same level. For your task you would use c("No", "Yes"). The default for labels is the levels themselves, i.e. no relabelling.
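A short sketch of the two less obvious uses of labels, assuming a reasonably recent R (duplicated labels need R 3.4.0 or later, as far as I know). The grouping into "offline" and "online" and the names medium, online and prefixed are made up for illustration only.

```r
# Same values as vec2 in the question
medium <- c("radio", "newspaper", "web-page", "chat", "tv", "web-page")

# Repeated labels collapse several values into one factor level
online <- factor(medium,
                 levels = c("radio", "newspaper", "tv", "web-page", "chat"),
                 labels = c("offline", "offline", "offline", "online", "online"))
levels(online)         # "offline" "online"
as.character(online)   # "offline" "offline" "online" "online" "offline" "online"

# A single string is used as a prefix and numbered, one per level
prefixed <- factor(c(0, 0, 1), labels = "grp")
levels(prefixed)       # "grp1" "grp2"
```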

So your final code will be

dfr$response <- factor(x=dfr$response, levels=c(0,1), labels=c("No", "Yes"))
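If you want to confirm that this gives the structure asked for, a self-contained check along these lines should do it. It only reuses the vectors and the data.frame() call from the question.

```r
# Vectors as given in the question
vec1 <- 1:6
vec2 <- c('radio', 'newspaper', 'web-page', 'chat', 'tv', 'web-page')
vec3 <- c(0, 0, 1, 1, 0, 1)

dfr <- data.frame(id = vec1, response = vec3, medium = vec2,
                  stringsAsFactors = FALSE)

# Convert the 0/1 column to a factor with explicit levels and labels
dfr$response <- factor(x = dfr$response, levels = c(0, 1),
                       labels = c("No", "Yes"))

str(dfr)               # compare against the target structure
table(dfr$response)    # counts per label: three "No", three "Yes"
```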

As a minor aside, people generally name a data frame df rather than dfr. It makes no difference to how the code runs; df is just the more common convention.
