Subsetting by postcode (levels of factors)

Question

I want to find the records in a very large survey dataset where the start postcode matches the end postcode, and put those records in a new data frame. I have created an example data frame for illustration.

ID = c(1,2,3,4,5) 
StartPC = c("AF2 4RE","AF3 5RE","AF1 3DR","AF2 4RE","AF2 4PE")
EndPC = c("AF2 4RE","NA","AF2 3DR","AX2 4RE","AF2 4PE")
data<-data.frame(ID,StartPC,EndPC)

data2 <- subset(data, StartPC==EndPC,na.rm=TRUE)

Using the above code, I want to create a data frame (data2) that includes only the IDs for which the start and end postcodes are the same. However, I get the error message:

Error in Ops.factor(StartPC, EndPC) : level sets of factors are different

The new data frame should include only ID numbers 1 and 5.

Answer 1

The error message tells you what is wrong:

 Error in Ops.factor(StartPC, EndPC) : level sets of factors are different

Your two columns are factors, not character vectors. A factor is a categorical variable, stored as integer codes plus a lookup table of 'levels'. The same integer code can stand for different values in two different factors, so R checks that both factors have the same level set before comparing them with ==. If the level sets differ, it stops with this error.
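To see this for yourself, here is a minimal sketch that builds the two columns explicitly as factors. The explicit factor() calls are my addition: data.frame() converted strings to factors by default before R 4.0.0, while newer versions keep them as character unless you ask otherwise. The name res is just for illustration.

```r
# Build the two columns explicitly as factors, mimicking what
# data.frame() did by default before R 4.0.0.
StartPC <- factor(c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE"))
EndPC   <- factor(c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE"))

levels(StartPC)      # the lookup table: unique values of StartPC
as.integer(StartPC)  # the integer codes that index into that table

# The two lookup tables are not the same
identical(levels(StartPC), levels(EndPC))

# Comparing the factors directly fails; catch the error and keep its message
res <- tryCatch(StartPC == EndPC, error = function(e) conditionMessage(e))
res
```

Note that "NA" in quotes is an ordinary string here, so it becomes a level of EndPC rather than a missing value.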

So convert to character:

> subset(data, as.character(StartPC)==as.character(EndPC))
  ID StartPC   EndPC
1  1 AF2 4RE AF2 4RE
5  5 AF2 4PE AF2 4PE

You can convert on the fly like that, create your data frame with character columns in the first place, or make sure both columns are created with the same levels.
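The other two options could look something like the sketch below. The names data_chr, data_fac, all_pc, same_chr and same_fac are made up for illustration, and building the shared level set with union() is just one way to do it.

```r
ID      <- c(1, 2, 3, 4, 5)
StartPC <- c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE")
EndPC   <- c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE")

# Keep the postcodes as character columns from the start
# (stringsAsFactors = FALSE is already the default from R 4.0.0 onwards)
data_chr <- data.frame(ID, StartPC, EndPC, stringsAsFactors = FALSE)
same_chr <- subset(data_chr, StartPC == EndPC)

# Or keep factors, but give both columns the same level set
all_pc   <- union(StartPC, EndPC)
data_fac <- data.frame(ID,
                       StartPC = factor(StartPC, levels = all_pc),
                       EndPC   = factor(EndPC,   levels = all_pc))
same_fac <- subset(data_fac, StartPC == EndPC)

same_chr$ID  # IDs 1 and 5
same_fac$ID  # IDs 1 and 5
```

I left out na.rm=TRUE because subset() has no such argument; it is silently ignored. Rows where the condition is NA are dropped by subset() anyway.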

solution1
8 ACCEPTED 2011-12-07 16:25:24