I am working on my first real project in R and ran into a problem. I am trying to compare two columns in two different data.frames. I tried running this code:
    matrix1 = matrix
    for (i in 1:2000){
      if(data.QW[i,1] == data.RS[i,1]){
        matrix1[i,1]== "True"
      }
      else{
        matrix1[i,1]== "False"
      }
    }
I got this error:
Error in Ops.factor(data.QW[i,1], data.RS[i,1]) :
level sets of factors are different
I think this may be because data.QW and data.RS have different numbers of rows. What I am trying to do is find where the two data.frames disagree so I can fix those entries against the source document.
I am also unsure whether a matrix will work for this, or whether I need to build a vector and rbind it into the matrix on every iteration.
Any pointers to good reading on this would also be appreciated.
As mentioned in the comments, a reproducible example showing the contents of the data frames would help.
From the question title, it sounds like you want to compare column 1 of data frame A against column 1 of data frame B and store the result in a logical vector. If that summary is accurate, please take a look here.
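A minimal sketch of that comparison, assuming two hypothetical data frames `QW` and `RS` (stand-ins for data.QW and data.RS) whose first columns are factors with different level sets and whose row counts differ. Converting both columns to character avoids the `Ops.factor` error, comparing only the rows both frames share sidesteps the length mismatch, and `which()` then lists the rows that disagree.

```r
# Hypothetical stand-ins for data.QW and data.RS: the first columns are factors
# with different level sets, and the row counts differ (5 vs 4).
QW <- data.frame(id = factor(c("a", "b", "c", "d", "e")))
RS <- data.frame(id = factor(c("a", "x", "c", "d")))

# Compare only the rows that both data frames have.
n <- min(nrow(QW), nrow(RS))

# Convert factors to character so == compares the labels, not the level sets.
same <- as.character(QW[seq_len(n), 1]) == as.character(RS[seq_len(n), 1])

same          # TRUE FALSE TRUE TRUE: one logical per shared row
which(!same)  # 2: the row where the two columns disagree
```

The result is a plain logical vector, so there is no need to preallocate a matrix or rbind inside a loop.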
Too long for a comment.
Some observations:
1. data.QW[,1] and data.RS[,1] are almost certainly factors.
2. If the two factors have different level sets, == will not work.
3. If you read the data with read.csv(...), any columns containing character data were converted to factors by default (this was the default before R 4.0.0). You can change that behavior by setting stringsAsFactors=FALSE in the call to read.csv(...). This is a very common problem.
4. Once the columns are comparable, you can compare them directly: data.QW[1:2000,1]==data.RS[1:2000,1]. This creates a logical vector of length 2000 containing all the comparisons. No loop needed. Of course, this assumes that both data.frames have at least 2000 rows.

Here's an example of item 2:
x <- as.factor(rep(LETTERS[1:5],3)) # has levels: A, B, C, D, E
y <- as.factor(rep(LETTERS[1:3],5)) # has levels: A, B, C
y==x
# Error in Ops.factor(y, x) : level sets of factors are different
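Two common ways around that error, sketched on the same toy vectors: compare the labels as character vectors, or rebuild both factors on a shared level set (here the union of the two) before using ==. Both approaches give the same logical vector.

```r
x <- as.factor(rep(LETTERS[1:5], 3))  # levels: A, B, C, D, E
y <- as.factor(rep(LETTERS[1:3], 5))  # levels: A, B, C

# Option 1: compare the underlying labels as character vectors.
same_chr <- as.character(y) == as.character(x)

# Option 2: give both factors the same (union) level set, then compare.
lev <- union(levels(x), levels(y))
same_fac <- factor(y, levels = lev) == factor(x, levels = lev)

identical(same_chr, same_fac)  # TRUE
sum(same_chr)                  # 3: only the first three positions match
```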
The function compare below compares data.frames or matrices a and b to find, for each row of a, a matching row in b. It returns the position of the first row of b that matches (after some internal sorting, which is there to speed things up). Rows of a that have no match in b get a return value of 0. It should handle numeric, character and factor column types, and mixtures thereof (mixtures for data.frames only). See the example below the function definition.
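compare <- function(a, b) {
  #################################################
  # sanity checks: same number of columns, comparable column types
  if (ncol(a) != ncol(b)) {
    stop("\n Matrices a and b have different number of columns!")
  }
  if (is.data.frame(a) && is.data.frame(b)) {
    same_types <- all(sapply(a, class) == sapply(b, class))
  } else {
    same_types <- mode(a) == mode(b)  # a matrix holds a single type
  }
  if (!same_types) {
    stop("\n Matrices a and b have incomparable column data types!")
  }
  #################################################
  # factors with different level sets cannot be compared: use their labels
  if (is.data.frame(a)) {
    i <- sapply(a, is.factor)
    a[i] <- lapply(a[i], as.character)
  }
  if (is.data.frame(b)) {
    i <- sapply(b, is.factor)
    b[i] <- lapply(b[i], as.character)
  }
  len1 <- nrow(a)
  len2 <- nrow(b)
  # sort both inputs lexicographically by all columns (order() is stable)
  ord1 <- do.call(order, unname(as.list(as.data.frame(a))))
  a <- a[ord1, , drop = FALSE]
  ord2 <- do.call(order, unname(as.list(as.data.frame(b))))
  b <- b[ord2, , drop = FALSE]
  #################################################
  # three-way lexicographic comparison of two single rows:
  # -1 if x < y, 1 if x > y, 0 if all columns are equal
  cmp_row <- function(x, y) {
    for (m in seq_along(x)) {
      if (x[[m]] < y[[m]]) return(-1L)
      if (x[[m]] > y[[m]]) return(1L)
    }
    0L
  }
  found <- rep(0L, len1)
  at <- 1L
  for (i in seq_len(len1)) {
    ai <- a[i, ]
    # skip rows of b that sort strictly before row i of a; because a is
    # sorted, they also sort before every later row of a
    while (at <= len2 && cmp_row(b[at, ], ai) < 0L) {
      at <- at + 1L
    }
    if (at > len2) break                             # rest of a exceeds all of b
    if (cmp_row(b[at, ], ai) == 0L) found[i] <- at   # exact match
  }
  #################################################
  # map back to a's original row order and to b's original row numbers
  found <- found[order(ord1)]
  hit <- found > 0L
  found[hit] <- ord2[found[hit]]  # indexing with 0 would drop elements
  return(found)
}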
# example data sets:
ncols <- 10
nrows <- 1E4
a <- matrix(sample(LETTERS, size = ncols * nrows, replace = TRUE), ncol = ncols, nrow = nrows)
b <- matrix(sample(LETTERS, size = ncols * nrows, replace = TRUE), ncol = ncols, nrow = nrows)
b <- rbind(a, b)           # example of b containing a
b <- b[sample(nrow(b)), ]  # shuffle the rows of b
found <- compare(a, b)
# conversion to factors (explicit, since R 4.0.0 as.data.frame() keeps strings as character)
a <- as.data.frame(a, stringsAsFactors = TRUE)
b <- as.data.frame(b, stringsAsFactors = TRUE)
found <- compare(a, b)
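As a cross-check (a different technique, not the answerer's method), base R's `match()` can do the same row lookup by collapsing each row into a single string key. The helper name `row_match` is hypothetical, and the sketch assumes the separator `"\r"` never occurs in the data and that rows are equal exactly when their printed values are equal. Like `compare()`, it returns the first matching row of `b` for each row of `a`, or 0 when there is none.

```r
# Hypothetical helper (not part of the original answer): row lookup via match().
row_match <- function(a, b, sep = "\r") {
  # Collapse each row into one string key; assumes `sep` never occurs in the data.
  # paste() uses the labels of factor columns, so differing level sets are harmless.
  key_a <- do.call(paste, c(unname(as.list(as.data.frame(a))), sep = sep))
  key_b <- do.call(paste, c(unname(as.list(as.data.frame(b))), sep = sep))
  # First matching row of b for each row of a; 0 where there is no match.
  match(key_a, key_b, nomatch = 0L)
}

# Small demo with rows (A,x), (B,y), (C,z) looked up in (C,z), (A,x), (Q,y).
a_demo <- matrix(c("A", "B", "C", "x", "y", "z"), ncol = 2)
b_demo <- matrix(c("C", "A", "Q", "z", "x", "y"), ncol = 2)
res <- row_match(a_demo, b_demo)
res  # 2 0 1
```

For the example data above, `all(row_match(a, b) == found)` is a quick sanity check that the two approaches agree.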