
Finding the highest count of a repeating value in a data frame in R

Beginning R programmer here.

I have a data frame called 'narc' that has recorded answers to 40 different questions measuring narcissism.

It looks like this:

     Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
1723  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0
7231  2  2  2  1  1  2  1  1  2   2   2   2   2   2   1   2
5556  2  2  2  1  2  2  2  1  1   2   2   1   2   2   1   1
1511  2  2  2  2  2  2  2  2  2   2   2   2   2   2   2   2
2080  1  1  2  2  1  1  2  2  2   1   1   2   1   2   1   1
1074  2  2  1  1  2  2  2  1  1   1   1   1   2   2   1   2
     Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30
1723   0   0   0   0   0   0   0   0   0   0   0   0   0   0
7231   1   2   2   1   2   1   1   2   2   1   1   1   2   2
5556   1   1   1   1   1   2   1   2   2   1   2   1   2   1
1511   2   2   2   2   2   2   2   2   2   2   2   2   2   2
2080   1   1   1   1   2   1   2   1   1   1   2   1   1   1
1074   2   1   1   1   1   1   2   2   2   1   1   1   2   2
     Q31 Q32 Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40 elapse gender age
1723   0   0   0   0   0   0   0   0   0   0      8      1  23
7231   2   1   1   1   1   2   2   2   2   1     24      1  21
5556   2   1   1   2   1   1   2   2   2   1     33      2  18
1511   2   2   2   2   2   2   2   2   2   2     51      1  16
2080   2   2   1   1   2   1   1   2   2   2     59      1  20
1074   1   1   1   1   1   2   2   1   2   1     60      1  24
     score      level
1723     0        not
7231     8        not
5556    11        not
1511    17     mildly
2080    21 moderately
1074    14        not

I also have a data frame called 'narc.key' that holds the narcissistic answer to each question. It looks like this:

  Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17
1  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1   2
  Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32
1   2   2   2   1   2   2   1   1   2   1   2   1   1   1   2
  Q33 Q34 Q35 Q36 Q37 Q38 Q39 Q40
1   1   1   2   1   1   1   1   2

I want to find out which question had the highest number of narcissistic answers.

My approach would be to create a vector that records, for each column, the number of rows that match narc.key. However, I'm not sure how to do this. Here is my code so far:

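  # one counter per question (Q1..Q40): how many respondents gave the narcissistic answer
  highest.score <- numeric(40)
  for (i in 1:nrow(narc)) {    # each respondent (row)
    for (y in 1:40) {          # each question (column)
      if (narc[i, y] == narc.key[1, y]) {
        # this is where I'm stuck: how do I record the match for question y?
      }
    }
  }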

I'm having a hard time wrapping my head around what to do next. Please help?
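For comparison, the loop sketched in the question can be finished by incrementing a per-question counter whenever a respondent's answer matches the key, then asking which.max() for the largest count. This is a minimal sketch, not the asker's code: to stay short it uses only the six respondents and the first 16 questions printed above (plus the age column, to show that the loop only visits question columns), and n.q is a hypothetical name standing in for 40 in the real data. It assumes, as in the printout, that Q1 to Q40 are the first columns of narc.

```r
# Subset of the question's printed data: 6 respondents, Q1..Q16, plus age
# (the first field of each data row becomes the row names)
narc <- read.table(header = TRUE, text = "Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 age
1723 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23
7231 2 2 2 1 1 2 1 1 2 2 2 2 2 2 1 2 21
5556 2 2 2 1 2 2 2 1 1 2 2 1 2 2 1 1 18
1511 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 16
2080 1 1 2 2 1 1 2 2 2 1 1 2 1 2 1 1 20
1074 2 2 1 1 2 2 2 1 1 1 1 1 2 2 1 2 24")
narc.key <- read.table(header = TRUE, text = "Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
1 1 1 2 2 1 2 1 2 2 1 1 1 1 2 1")

n.q <- 16  # hypothetical name: number of question columns (40 in the real data)
counts <- integer(n.q)  # counts[y] = respondents whose answer to question y matches the key
names(counts) <- paste0("Q", seq_len(n.q))

for (i in seq_len(nrow(narc))) {    # each respondent (row)
  for (y in seq_len(n.q)) {         # each question column; age is never visited
    if (narc[i, y] == narc.key[1, y]) {
      counts[y] <- counts[y] + 1L
    }
  }
}

counts
names(which.max(counts))  # question with the most narcissistic answers
```

Note that which.max() returns only the first question if several share the highest count.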

This isn't fancy, but you can create an answer data set the same size as the question data set and just compare the two with == (unless your data sets are very large, this should not be too slow).

qs <- read.table(header = TRUE, text="Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0
  2  2  2  1  1  2  1  1  2   2   2   2   2   2   1   2
  2  2  2  1  2  2  2  1  1   2   2   1   2   2   1   1
  2  2  2  2  2  2  2  2  2   2   2   2   2   2   2   2
  1  1  2  2  1  1  2  2  2   1   1   2   1   2   1   1
  2  2  1  1  2  2  2  1  1   1   1   1   2   2   1   2")

ans <- read.table(header = TRUE, text="  Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
1  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1")


ans[1:nrow(qs), ] <- ans[1, ]

#   Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
# 1  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1
# 2  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1
# 3  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1
# 4  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1
# 5  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1
# 6  1  1  1  2  2  1  2  1  2   2   1   1   1   1   2   1

And then sum the columns:

colSums(qs == ans)

# Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 
# 1   1   1   2   3   1   4   3   3   3   2   2   1   0   1   2 
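The colSums() result gives the count per question, but the question asks which question has the most narcissistic answers, so one more which.max() step is needed. Below is a minimal sketch of that step on the same 16-question example data; instead of building a full answer data frame it uses sweep() to compare each column with its own key value. The vector name key is an assumption for illustration.

```r
# Example responses copied from the answer above (6 respondents, Q1..Q16)
qs <- read.table(header = TRUE, text = "Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 2 1 1 2 1 1 2 2 2 2 2 2 1 2
2 2 2 1 2 2 2 1 1 2 2 1 2 2 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
1 1 2 2 1 1 2 2 2 1 1 2 1 2 1 1
2 2 1 1 2 2 2 1 1 1 1 1 2 2 1 2")
# Narcissistic answer for each of the 16 questions, in column order (hypothetical name "key")
key <- c(1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 2, 1)

# sweep() compares every value in column j with key[j], giving a logical matrix
matches <- sweep(as.matrix(qs), 2, key, FUN = "==")
counts <- colSums(matches)                    # matches per question
best <- names(which.max(counts))              # first question with the highest count
tied <- names(counts)[counts == max(counts)]  # every question sharing that count
counts
best
```

Because which.max() keeps only the first maximum, tied lists every question that shares the highest count; on this data both best and tied are "Q7".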

EDIT: I misread the need. The answer that follows indicates which questions had a narc answer as their most common response, NOT which question had the highest number of narc responses.

Here's one way. I load magrittr only so I can use the %>% pipe, which makes the steps easier to follow. As example data I use a matrix with 40 columns and 10 rows, each value randomly chosen to be 0, 1 or 2. First, I use apply across the columns to tabulate the responses in each one. Then I apply which.max to each element of the resulting list (there are 40, one per column); it returns the index of the highest count in that table. Next, I take the names of the values returned by which.max, which gives the response with the highest count. Finally, I unlist the result and turn it into integers (otherwise the values are text). This can be compared against the narc key to see which answers match.

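library(magrittr)   # provides the %>% pipe
# example data: 10 respondents x 40 questions, each answer randomly 0, 1 or 2
mat = matrix(sample(c(0,1,2), 400, replace = TRUE), ncol = 40)
output = mat %>%
  apply(2, table, simplify = FALSE) %>%  # one frequency table per column; simplify = FALSE (R >= 4.1.0) keeps a list even when all tables have the same length
  lapply(which.max) %>%  # index of the most frequent response, named by that response
  lapply(names) %>%      # keep only the response value (as text)
  unlist() %>%
  as.integer()           # convert back to numbers for comparison with the key
narc.key == output       # TRUE where the most common response is the narcissistic one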

The one unintended side effect is that if there's a tie for the most common response, which.max returns only the first of the tied answers, not all of them.
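If ties matter, one option (a sketch, not part of the original answer) is to keep every response whose count equals the column maximum instead of calling which.max. The small matrix below is hypothetical toy data in which the second column has a tie between 1 and 2.

```r
# Hypothetical toy data: 4 respondents x 3 questions, filled column by column;
# column 2 has a tie (two 1s and two 2s)
mat <- matrix(c(1, 1, 2, 1,
                2, 1, 2, 1,
                0, 0, 0, 2), ncol = 3)

# For each column, keep every response value whose count equals the column maximum
modes <- lapply(seq_len(ncol(mat)), function(j) {
  tab <- table(mat[, j])
  as.integer(names(tab)[tab == max(tab)])
})
modes
```

Each element of modes is an integer vector, so a tied column yields more than one value; checking it against the key then needs a membership test such as %in% rather than ==.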
