Calculating the mode or 2nd/3rd/4th most common value

Surely there has to be a function out there in some package for this?

I've searched and found this function to calculate the mode:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

But I'd like a function that lets me easily calculate the 2nd/3rd/4th/nth most common value in a column of data.

Ultimately I will apply this function to a large number of groups created with dplyr::group_by().

Thank you for your help!

Maybe you could try

f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

This gives the unique values of the vector, sorted by decreasing frequency. The first will be the mode, the 2nd will be the 2nd most common, etc.
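To get the n-th most common value directly, one option is to index the result of f(). The sketch below assumes a small wrapper, nth_common, which is a hypothetical name and not part of the original answer:

```r
# f() from the answer above: unique values ordered by decreasing frequency
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Hypothetical helper (an assumption, not from the answer): n-th most common value.
# Indexing past the end yields NA when x has fewer than n distinct values.
nth_common <- function(x, n = 1) f(x)[n]

x <- c(3, 1, 3, 2, 3, 2, 5)
nth_common(x, 1)  # 3 (appears three times)
nth_common(x, 2)  # 2 (appears twice)
nth_common(x, 5)  # NA (only four distinct values)
```

Note that sort() drops NA values by default, so missing values are not counted here.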

Another method is based on table():

g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))

But this is not recommended, because the input vector x is first coerced to a factor, which is very slow for a large vector. On exit, we also have to extract the character names of the table and coerce them back to numeric, so g only works for numeric input.
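If the input is not numeric, a variant that avoids table() altogether may be easier. The following is only a sketch built from the unique/match/tabulate idea in the question's Mode function; rank_by_freq is a hypothetical name:

```r
# Sketch: rank distinct values by frequency without going through table(),
# so the result keeps the type of x (character stays character, etc.).
# rank_by_freq is a hypothetical name, not from the original answer.
rank_by_freq <- function(x) {
  ux <- unique(x)                                        # distinct values
  counts <- tabulate(match(x, ux), nbins = length(ux))   # count per distinct value
  ux[order(counts, decreasing = TRUE)]                   # most frequent first
}

rank_by_freq(c("b", "a", "b", "c", "a", "b"))  # "b" "a" "c"
rank_by_freq(c(2.5, 1, 2.5, 1, 2.5))           # 2.5 1.0
```

Because no names are extracted, there is nothing to convert back with as.numeric().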


Example

set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16

Let's compare with the contingency table from table():

tab <- sort(table(x), decreasing = TRUE)
# 11 12  7  9  8 13 10 14  5 15  6  2  3 16 
# 14 14 11 11 10 10  9  7  5  4  2  1  1  1

as.numeric(names(tab))
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16

So the results are the same.
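Since the question wants to apply this per group, here is a minimal base R sketch using tapply() on made-up data (the data frame df and its columns g and v are assumptions for illustration). With dplyr, the same expression f(v)[2] could presumably be placed inside summarise() after group_by(g).

```r
# f() as defined above: unique values ordered by decreasing frequency
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Hypothetical data: grouping column g, value column v
df <- data.frame(
  g = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  v = c(1, 1, 1, 2, 2, 3, 7, 9, 9, 9, 7, 8)
)

# Second most common value of v within each group of g
second <- tapply(df$v, df$g, function(v) f(v)[2])
second
# group a: 2, group b: 7
```

A group with fewer than two distinct values gets NA, because indexing past the end of f(v) returns NA.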

Here is an R function that I made (inspired by several other SO posts), which may work for your goal. I use a local dataset on religious affiliation to illustrate it.

It's simple; only base R functions are involved: length, match, sort, tabulate, table, unique, which, as.character.

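    Find_Nth_Mode <- function(d, N = 2) {
      # Return the N-th largest element of x (N = 1 is the maximum)
      maxN <- function(x, N) {
        len <- length(x)
        if (N > len) {
          warning('N greater than length(x).  Setting N=length(x)')
          N <- len
        }
        sort(x, partial = len - N + 1)[len - N + 1]
      }

      ux <- unique(as.character(d))   # distinct values, as character
      a1 <- tabulate(match(d, ux))    # count of each distinct value
      a2 <- maxN(a1, N)               # N-th largest count
      a3 <- which(a1 == a2)           # positions holding that count
      ux[a3]                          # value(s) with the N-th largest count
    }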

Sample Output

> table(religion_data$relig11)
                   0.None 1.Protestant_Conservative      2.Protestant_Liberal                3.Catholic 
                    34486                      6134                     19678                     36880 
               4.Orthodox             5.Islam_Sunni              6.Islam_Shia                   7.Hindu 
                    20702                     28170                       668                      4653 
               8.Buddhism                  9.Jewish                  10.Other 
                     9983                       381                      6851 
> Find_Nth_Mode(religion_data$relig11, 1)
[1] "3.Catholic"
> Find_Nth_Mode(religion_data$relig11, 2)
[1] "0.None"
> Find_Nth_Mode(religion_data$relig11, 3)
[1] "5.Islam_Sunni"
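One behaviour worth knowing about is how the function treats ties. As far as I can tell, N indexes positions in the sorted vector of counts, so two values sharing the top count are both returned for N = 1 and again for N = 2. The sketch below repeats the function in condensed form and uses a made-up vector d to show this:

```r
# Condensed copy of Find_Nth_Mode from above, for a self-contained example
Find_Nth_Mode <- function(d, N = 2) {
  maxN <- function(x, N) {
    len <- length(x)
    if (N > len) {
      warning('N greater than length(x).  Setting N=length(x)')
      N <- len
    }
    sort(x, partial = len - N + 1)[len - N + 1]
  }
  ux <- unique(as.character(d))
  a1 <- tabulate(match(d, ux))
  ux[which(a1 == maxN(a1, N))]
}

# Made-up data: "x" and "y" tie with two occurrences each, "z" occurs once
d <- c("x", "x", "y", "y", "z")
Find_Nth_Mode(d, 1)  # "x" "y"
Find_Nth_Mode(d, 2)  # "x" "y" (same tied pair again)
Find_Nth_Mode(d, 3)  # "z"
```

If a single value per rank is needed, the caller would have to pick one element of the result, for example the first.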

Reference: I am grateful to the posts from which I took the two functions and integrated them into one.
