Convert Number to Factor using Labels in R

I have a column in my dataset that holds many different numeric values. Three of those values each have a specific label, while all the others share one general label. Recoding the dataset by hand is not an option: it is very large, with 167K observations.

These are all the unique values in the column:

> unique(NYC_2019_Arrests$JURISDICTION_CODE)
Levels: 0 1 2 3 4 6 7 9 11 12 13 14 15 16 69 71 72 73 74 76 79 85 87 88 97

The levels of JURISDICTION_CODE are defined as follows:

JURISDICTION_CODE - Jurisdiction responsible for arrest. Jurisdiction codes 0(Patrol), 1(Transit) and 2(Housing) represent NYPD whilst codes 3 and more represent non NYPD jurisdictions.

This is the code I tried, but it just returns an error:

> NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction"))
Error in factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0, 1, 2,  : 
  invalid 'labels'; length 4 should be 1 or 101

I also tried the same code with the 3:100 taken out and the fourth label left in, but that did not work either.

I would greatly appreciate it if anybody here knows how to give all values of 3 and above the generic label without typing out every number individually.

Thanks!

The error message points to the problem: the labels vector has length 4, but the levels vector has length 101. Your original code is almost there. Just pad the labels out to the correct length:

reps<-rep("Non-NYPD Jurisdiction",98)
NYC_2019_Arrests$JURISDICTION_CODE <- factor(NYC_2019_Arrests$JURISDICTION_CODE, levels = c(0,1,2, 3:100), labels = c("Patrol", "Transit", "Housing", reps))
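If you would rather not hard-code the 3:100 range or the count of 98, the labels can be derived from the codes that actually occur in the column. The following is only a sketch on made-up data: `codes`, `code_values`, `present`, `code_labels` and `jurisdiction` are illustrative names, not part of the original dataset. It relies on factor() merging duplicated labels into a single level, which to my knowledge requires R 3.4.0 or later.

```r
# Toy stand-in for NYC_2019_Arrests$JURISDICTION_CODE (made-up values).
codes <- factor(c(0, 1, 2, 3, 72, 97, 0))

# Recover the numeric codes. as.integer(codes) on its own would return
# the internal level indices, not the codes themselves.
code_values <- as.integer(as.character(codes))

# Only the codes present in the data become levels.
present <- sort(unique(code_values))

# One label per level: specific labels for 0, 1, 2 and a generic one otherwise.
code_labels <- ifelse(present == 0, "Patrol",
               ifelse(present == 1, "Transit",
               ifelse(present == 2, "Housing", "Non-NYPD Jurisdiction")))

# Duplicated labels collapse into one level (assumes R >= 3.4.0).
jurisdiction <- factor(code_values, levels = present, labels = code_labels)
table(jurisdiction)
```

Because the levels come from the data, a code outside 0 to 100 would still be labelled instead of silently turning into NA.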

Edit with explanation:

Run this code for additional explanation.

# The key is that labels must have the same length as levels.

# Length of levels
levels <- c(0,1,2, 3:100)
print(length(levels))
# Length of the original labels
labels <- c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
print(length(labels))
# This is a problem because levels 5 through 101 (codes 4 to 100) have no label: labels[5] is NA.
# So "Non-NYPD Jurisdiction" has to be repeated once for each code from 3 to 100.
# length(3:100) is 98, which is where the 98 comes from.
reps <- rep("Non-NYPD Jurisdiction", 98)
labels <- c("Patrol", "Transit", "Housing", reps)
print(length(labels))

There are several ways to solve this. The simplest and best way I can think of is to use case_when from dplyr. Here is an example:

library(dplyr)

case_when(mtcars$carb == 1 ~ "One",
          mtcars$carb == 2 ~ "Two",
          mtcars$carb >= 3 ~ "Three or More")
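Applied to the jurisdiction codes from the question, the same pattern might look like the sketch below. The data frame `arrests` and the columns `code` and `JURISDICTION` are made up for illustration. Since the question's output shows the column is already a factor, the sketch first converts it back to numbers with as.integer(as.character(...)); comparing a factor with >= directly would give NA.

```r
library(dplyr)

# Toy stand-in for the real data (made-up values).
arrests <- data.frame(JURISDICTION_CODE = factor(c(0, 1, 2, 3, 72, 97)))

arrests <- arrests %>%
  mutate(
    # Factor -> character -> integer recovers the actual codes.
    code = as.integer(as.character(JURISDICTION_CODE)),
    # case_when returns a character vector; wrap it in factor() to fix the level order.
    JURISDICTION = factor(
      case_when(
        code == 0 ~ "Patrol",
        code == 1 ~ "Transit",
        code == 2 ~ "Housing",
        code >= 3 ~ "Non-NYPD Jurisdiction"
      ),
      levels = c("Patrol", "Transit", "Housing", "Non-NYPD Jurisdiction")
    )
  )

table(arrests$JURISDICTION)
```

This version needs no list of levels at all, so it does not matter how many distinct non-NYPD codes exist.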
