Assign classes based on frequencies of classes in rows of a data.frame in R

I ran 8 different classification models (classes "0", "1", "-1", meaning "neutral", "positive", "negative") and I'm trying to combine them. The combined results should be added to my data.frame as additional columns. In Excel, for example, that wouldn't be too hard, but I don't know how to do it in R. First, my data.frame:

MAXENTROPY <- c("1","1","1","1","0","-1","-1","1","-1","0")
SVM <- c("1","1","1","1","0","-1","-1","0","-1","0") 
BAGGING <- c("0","1","1","1","-1","-1","-1","1","-1","1")
LOGITBOOST <- c("0","1","1","1","0","-1","-1","1","-1","1")
NNETWORK <- c("-1","1","1","1","-1","-1","-1","1","-1","0")
FORESTS <- c("0","1","1","1","1","-1","-1","1","-1","1")
SLDA <- c("0","1","1","1","0","-1","0","1","-1","0")
TREE <- c("1","1","1","1","1","-1","-1","1","-1","0")

results.allm <- data.frame(MAXENTROPY,SVM,BAGGING,
                       LOGITBOOST,NNETWORK,FORESTS,
                       SLDA,TREE)

results.allm

#    MAXENTROPY SVM BAGGING LOGITBOOST NNETWORK FORESTS SLDA TREE
# 1           1   1       0          0       -1       0    0    1
# 2           1   1       1          1        1       1    1    1
# 3           1   1       1          1        1       1    1    1
# 4           1   1       1          1        1       1    1    1
# 5           0   0      -1          0       -1       1    0    1
# 6          -1  -1      -1         -1       -1      -1   -1   -1
# 7          -1  -1      -1         -1       -1      -1    0   -1
# 8           1   0       1          1        1       1    1    1
# 9          -1  -1      -1         -1       -1      -1   -1   -1
# 10          0   0       1          1        0       1    0    0

I want to add a few columns based on the frequency of classes in each row (across columns 1-8):

1st column: assign a class only if all columns show the same class; otherwise "".

2nd column: majority vote, i.e. assign the class with the highest frequency. If two classes share the highest frequency in a row, assign one of them with probability 0.5.

3rd column: like the 2nd column, but if a row contains only 0's and either 1's or -1's (like row #10), assign class 1 or -1.

I would really appreciate your help! Thank you!

Here's a way to get your first column using apply:

# Store the classifier names in a vector (before adding any new
# columns) to make sure you're only counting their votes
classifier.names <- names(results.allm)

# Apply over each row (MARGIN = 1)
results.allm$consensus <- apply(results.allm[classifier.names],
                                MARGIN = 1,
                                FUN = function(x) {

    # If all elements match the first element...
    ifelse(all(x %in% x[1]),
           yes = x[1], # ... return that element.
           no = "") # Depending on your purpose, NA might be better
    }
)
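As an aside, the same unanimity check can be done without apply by comparing every column against the first one. This is only a sketch on a small made-up data frame (`toy`, `votes`, `unanimous`, and `consensus` are hypothetical names, not part of the question's data):

```r
# Toy data (hypothetical): three classifiers, four rows
toy <- data.frame(A = c("1", "0", "-1", "1"),
                  B = c("1", "0", "0", "1"),
                  C = c("1", "1", "-1", "1"),
                  stringsAsFactors = FALSE)

# Work on a character matrix so comparisons are element-wise
votes <- as.matrix(toy)

# A row is unanimous when every column equals the first column;
# the first column is recycled down each column of the matrix
unanimous <- rowSums(votes == votes[, 1]) == ncol(votes)

# Keep the shared class for unanimous rows, "" otherwise
consensus <- ifelse(unanimous, votes[, 1], "")
unname(consensus)
```

Here rows 1 and 4 are unanimous, so they keep "1" and the other two rows get "". On a large data frame this may be faster than a row-wise apply, but I haven't benchmarked it.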

Here's an approach for your second column. I assume you mean plurality voting rather than majority voting (i.e., the winning class doesn't need more than 50% of the votes, just the most).

results.allm$plurality <- apply(results.allm[classifier.names],
                                MARGIN = 1,
                                FUN = function(x) {

    # Tally up the votes
    xtab <- table(unlist(x))

    # Get the classes with the most votes
    maxclass <- names(xtab)[xtab %in% max(xtab)]

    # Sample from maxclass with equal probability for each tied class
    sample(maxclass, size = 1)

})
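To see what the tie-breaking step does, here is a small sketch on a single made-up row (the vector `x` and the seed are assumptions for illustration only). Because sample() is random, results for tied rows change between runs unless you set a seed first:

```r
# Hypothetical row with a two-way tie between "0" and "1"
x <- c("1", "1", "0", "0", "-1")

# Tally the votes and keep every class that reaches the maximum count
xtab <- table(x)
maxclass <- names(xtab)[xtab == max(xtab)]
maxclass   # both tied classes, "0" and "1"

# Arbitrary seed, only to make the random draw repeatable
set.seed(42)
pick <- sample(maxclass, size = 1)
pick
```

If you need the plurality column to be reproducible, calling set.seed() once before the apply call should be enough.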

And here's a crude attempt for your third column. Basically, I'm checking whether the row contains only 0s and 1s; if it does, I return 1.

If not, I check whether it contains only 0s and -1s; if it does, I return -1.

Otherwise, the function will return the same results as the second approach above.

There's probably a more elegant way to do that, but at least this is relatively straightforward to read. Hope it works for you!

results.allm$third <- apply(results.allm[classifier.names],
                            MARGIN = 1,
                            FUN = function(x) {

    # Tally up the votes
    xtab <- table(unlist(x))
    classes <- names(xtab)

    # If the row contains exactly the classes (0, 1) or (0, -1), return
    # the non-zero class; otherwise keep all classes with the most votes.
    # if/else is used instead of ifelse(), which returns a value only as
    # long as its test and would keep just the first tied class.
    maxclass <- if (setequal(classes, c("0", "1"))) {
        "1"
    } else if (setequal(classes, c("0", "-1"))) {
        "-1"
    } else {
        classes[xtab == max(xtab)]
    }

    # Sample from maxclass with equal probability for each tied class
    sample(maxclass, size = 1)

})

None of the code above has been checked to see how it behaves in the presence of NAs, so if you have any classifiers that might produce NAs, beware!
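If missing votes are a possibility, one option is to drop them before tallying. The helper below is only a sketch under that assumption (`plurality_na` is a hypothetical name, and returning NA when no classifier voted is my choice, not something the question specifies):

```r
# Hypothetical helper: plurality vote that ignores missing votes
plurality_na <- function(x) {
    # Drop missing votes before counting
    x <- x[!is.na(x)]

    # If no classifier produced a vote, there is nothing to pick
    if (length(x) == 0) {
        return(NA_character_)
    }

    # Tally the remaining votes and break ties at random
    xtab <- table(x)
    top <- names(xtab)[xtab == max(xtab)]
    sample(top, size = 1)
}

plurality_na(c("1", NA, "1", "0"))            # NA is ignored
plurality_na(c(NA_character_, NA_character_)) # no votes at all
```

It could be passed as FUN in the apply calls above, e.g. apply(results.allm[classifier.names], MARGIN = 1, FUN = plurality_na).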
