
PERCENTILE IF using ARRAYFORMULA for a set of conditions

I need to calculate a percentile with an IF condition, so that it is computed per group for a set of condition values, but Google Sheets doesn't provide a PERCENTILEIF function. For a single condition value, a solution is possible:

=ARRAYFORMULA(PERCENTILE(if(range=value,values),percentile))

But in my case, value should be an array of possible values.

Here is the sample data with the expected result highlighted: [image: sample]

I tried several options to use an array of possible values, but in all cases I get the wrong result:

Using JOIN in G2:

=arrayformula(if(len(E2:E3),percentile(split(regexreplace(join(",",
   Arrayformula(A2:A12 & "_" & B2:B12)),E2:E3  & "_(\d+)|.",",$1"),","),D2),))

Using MATCH in H2:

=ARRAYFORMULA(if(len(E2:E3),
   PERCENTILE(IFNA(--(match(A2:A12,E2:E3,0) > 0) * B2:B12,),D2),))

Here is the spreadsheet file: https://docs.google.com/spreadsheets/d/1VDJIYvmOC46DI_9u4zSEfmxSan5R5VKK772C_kP5rxA/edit?usp=sharing

Just as an exercise, I tried working it out from first principles based on the quantiles formula. The Excel and Google Sheets PERCENTILE and PERCENTILE.INC functions use the (N − 1)p + 1 variation shown in the last table under Excel in the reference above.

So for the first group,

(N − 1)p + 1 = 3 * 0.8 + 1 = 3.4

This means you interpolate 0.4 of the way from the third point (10) to the fourth point (30), giving you

10 + 0.4 * (30 - 10) = 18.
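
The arithmetic above can be checked with a minimal JavaScript sketch of the same interpolation rule. The function name `percentileInc` and the first two data points (2 and 5) are made up for illustration; only the third and fourth points (10 and 30) and the percentile 0.8 come from the example above.

```javascript
// Linear-interpolation percentile following the (N - 1)p + 1 rule,
// written with a zero-based rank, i.e. rank = (N - 1) * p.
function percentileInc(values, p) {
  const sorted = [...values].sort((a, b) => a - b); // copy, then sort ascending
  const rank = (sorted.length - 1) * p;             // fractional position
  const lower = Math.floor(rank);
  const upper = Math.min(Math.ceil(rank), sorted.length - 1);
  // Interpolate between the two neighbouring points.
  return sorted[lower] + (rank - lower) * (sorted[upper] - sorted[lower]);
}

// Hypothetical first group: only 10 and 30 are taken from the worked example.
const firstGroup = [30, 2, 10, 5];
console.log(percentileInc(firstGroup, 0.8).toFixed(2)); // "18.00"
```

Rounding with `toFixed` hides a tiny floating-point error in `3 * 0.8`, which is why the result is compared to 18 only approximately.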

The array formula is

=ArrayFormula(vlookup(vlookup(E2:E3,{sort(A2:B,1,1,2,1),sequence(ROWS(A2:A))},3,false)+floor((countif(A2:A,E2:E3)-1)*D2),{sequence(ROWS(A2:A)),sort(A2:B,1,1,2,1)},3,false)
+(vlookup(vlookup(E2:E3,{sort(A2:B,1,1,2,1),sequence(ROWS(A2:A))},3,false)+ceiling((countif(A2:A,E2:E3)-1)*D2),{sequence(ROWS(A2:A)),sort(A2:B,1,1,2,1)},3,false)
-vlookup(vlookup(E2:E3,{sort(A2:B,1,1,2,1),sequence(ROWS(A2:A))},3,false)+floor((countif(A2:A,E2:E3)-1)*D2),{sequence(ROWS(A2:A)),sort(A2:B,1,1,2,1)},3,false))*mod((countif(A2:A,E2:E3)-1)*D2,1))


I believe you can also do it by manipulating the values of the second argument to the Percentile function - it would go like this:

=ArrayFormula(percentile(if(A2:A="",,B2:B+A2:A*1000),
D2*(countif(A2:A,E2:E3)-1)/(count(A2:A)-1)+(countif(A2:A,"<"&E2:E3))/(count(A2:A)-1))-E2:E3*1000)

Explanation

I think I can best show the logic by means of a graph:

[image: graph of the three groups, each offset by a constant]

So I've added a constant to separate the three groups (50 to the second group and 100 to the third, to make it easier to see on the graph). I've also sorted within each group to make it easier to visualise, but this isn't necessary in the formula because Percentile will do the sorting.

If you look at the third group, you can land exactly at the beginning of this group by choosing to go to the 60th percentile in the whole of the data. Then you can go to the 80th percentile of these last five points by adding the required percentile multiplied by the distance between the first and last point in this group, expressed as a fraction of the distance between the first and last point in the whole of the data.

There's nothing magic about choosing 1000 in the formula above; it is just a big enough number to separate the groups - max(B2:B) would be safest if they are all positive numbers.
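
The offset trick can be sketched in JavaScript to check it against a direct per-group calculation. This is only an illustration under stated assumptions: the names `percentileInc` and `groupedPercentile` and all data values are hypothetical, group ids are assumed to be numeric, and the offset (1000, as in the formula) is assumed to be larger than the spread of the values.

```javascript
// Linear-interpolation percentile using the (N - 1)p + 1 rule.
function percentileInc(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (sorted.length - 1) * p;
  const lower = Math.floor(rank);
  const upper = Math.min(Math.ceil(rank), sorted.length - 1);
  return sorted[lower] + (rank - lower) * (sorted[upper] - sorted[lower]);
}

// Percentile of one group, taken from a single percentile over ALL values
// after shifting each value by groupId * offset.
function groupedPercentile(groups, values, target, p, offset = 1000) {
  const shifted = values.map((v, i) => v + groups[i] * offset);
  const inGroup = groups.filter(g => g === target).length; // like COUNTIF(A2:A, E2)
  const below = groups.filter(g => g < target).length;     // like COUNTIF(A2:A, "<"&E2)
  // Start of the group plus p times the group's width, both as fractions
  // of the distance between the first and last point of the whole data.
  const adjustedP = (p * (inGroup - 1) + below) / (values.length - 1);
  return percentileInc(shifted, adjustedP) - target * offset;
}

// Hypothetical, unsorted data: 4 points in group 1, 2 in group 2, 5 in group 3.
const groups = [1, 3, 2, 1, 3, 3, 1, 2, 3, 1, 3];
const values = [10, 25, 7, 2, 10, 30, 30, 9, 15, 5, 20];

for (const g of [1, 2, 3]) {
  const direct = percentileInc(values.filter((_, i) => groups[i] === g), 0.8);
  const viaOffset = groupedPercentile(groups, values, g, 0.8);
  console.log(g, direct.toFixed(2), viaOffset.toFixed(2));
}
```

With this data the third group starts at position 6 of 10, that is, at the 60th percentile of the whole set, which matches the reasoning in the explanation.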

You can get the percentile for each value as follows:

=sort(arrayformula(iferror(
{A2:A,B2:B,
(VLOOKUP(row(A2:A),{sort({row(A2:A),A2:B},2,1,3,1),row(A2:A)},4,false)-MATCH(A2:A,QUERY({sort({A2:B},1,1,2,0)},"select Col1"),0))/countif(A2:A,A2:A)}
)),1,1,3,1)

Then apply (or not) an interpolation: it is up to you whether to do it with a linear formula, as Tom Sharpe did, or according to a statistical distribution (https://statisticsbyjim.com/basics/percentiles/).

Note that, in my view, the 80th percentile of idx 3 is obviously 20, since there are only 5 values! Excel, like Google Sheets, gets this wrong.

I am adding an Apps Script option in case other community members are interested in this solution. I consider @TomSharpe's answer the best approach, but in some cases a short formula using a custom function percentileIf may be more suitable than a large formula. It is included as a script in the sample file provided in the question, together with unit tests.

/**
 * Google Sheets doesn't offer a percentileIf function. Here is a JavaScript solution that works using ARRAYFORMULA.
 * 
 * @param range {Array} Array of values to test against the criterion. If the input is a spreadsheet range, it will be a 2D array
 * @param criterium {Array} The criterion to match against each element of range. It can be a single value.
 *  If the input is a spreadsheet range, it will be a 2D array
 * @param values {Array} The set of values used to calculate the percentile based on the criterion
 * @param percentileValue {Number} The percentile to be calculated. It should be a number in the range [0,1]; 0 and 1 are
 *  accepted as possible values
 * @return {Array} The percentile for each criterion matched against range; if criterium is a single value, it returns a single value
 * 
 */
function percentileIf(range, criterium, values, percentileValue) {

  /* Standardize the comparison process for Numbers, Dates (ignoring the time part) and Strings. If a String has a date
  representation, it tries to parse it to a number */
  function cmp(a,b) {
      let result = false, aa,bb;
      if((typeof a) === (typeof b)) {
        if (("string" === typeof a) && ("string" === typeof b)) {// Trying to identify a possible date in string format
          aa = Date.parse(a);
          bb = Date.parse(b);
          if (aa && bb){ // Trying to identify a date
            a = aa;
            b = bb;
          }
        }
        if((a instanceof Date) && (b instanceof Date)) {// Comparing only dates, not considering timestamp
          a.setHours(0, 0, 0, 0);
          b.setHours(0, 0, 0, 0);
          result = (a - b) == 0;
        } else {
          result = a === b;
        }
      }
    return result;
  }

  function arraySortNumbers(inputarray) {
    return inputarray.sort(function (a, b) {
      return a - b;
    });
  }
  
  // Idea taken from here: https://stackoverflow.com/questions/48719873/how-to-get-median-and-quartiles-percentiles-of-an-array-in-javascript-or-php
  function percentileCalc(data, q) {
    data = arraySortNumbers(data);
    var pos = ((data.length) - 1) * q;
    var base = Math.floor(pos);
    var rest = pos - base;
    if ((data[base + 1] !== undefined)) {
      return data[base] + rest * (data[base + 1] - data[base]);
    } else {
      return data[base];
    }
  }

  let result = null;
  let validValues = [];
  // Checking preconditions
  if (!Array.isArray(range)) throw new Error("range input argument should be an array");
  if (!Array.isArray(values)) throw new Error("values input argument should be an array");
  if(percentileValue < 0 || percentileValue > 1) throw new Error("The percentile value should be a number between 0-1");

  if (Array.isArray(criterium)) {// Recursive invocation in case of more than one criterium
    result = [];
    criterium = criterium.filter(function(e){ return e !== "" }); // removing empty elements; strict comparison keeps 0 as a valid criterion
    criterium.forEach(item => {
      result.push(percentileIf(range, item, values, percentileValue));
    });
  } else {
    let array = range, numbers = values;
    if(Array.isArray(range[0])) array = range.map(x => x[0]); // Converting to a column-array
    if(Array.isArray(values[0])) numbers = values.map(x => x[0]);
    if(array.length != numbers.length) throw new Error("range and values input arguments should have the same size");
    for (let i = 0; i < array.length; i++) {
      // Skip rows where either cell is empty. Rows are skipped in pairs so range and values stay aligned,
      // and the strict comparison keeps 0 as a valid value.
      if(array[i] === "" || numbers[i] === "") continue;
      if(cmp(criterium, array[i])) {
        validValues.push(numbers[i]);
      }
    }
    result = percentileCalc(validValues, percentileValue);
  }
  return result;
}

Here is how to use the function created in the spreadsheet: [image]
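
To make the calling convention concrete, here is a compact sketch of the same idea that can be run outside a spreadsheet. It is a hypothetical variant, not the percentileIf function above: the name `percentileByGroup`, the single-pass grouping with a `Map`, and the sample data are all assumptions for illustration. It expects ranges as 2D arrays, which is how a spreadsheet range reaches a custom function, and it matches criteria by strict equality only (no date handling).

```javascript
// Hypothetical compact variant: one pass to group values, then one percentile per criterion.
// range, criteria and values are 2D column arrays; range and values must have the same length.
function percentileByGroup(range, criteria, values, p) {
  // Collect the values of each group in one pass.
  const byGroup = new Map();
  range.forEach((row, i) => {
    const key = row[0];
    const value = values[i][0];
    if (key === "" || value === "") return; // skip blank cells, keep zeros
    if (!byGroup.has(key)) byGroup.set(key, []);
    byGroup.get(key).push(value);
  });

  // One output row per criterion; an empty string when the group has no values.
  return criteria.map(([criterion]) => {
    const data = (byGroup.get(criterion) || []).sort((a, b) => a - b);
    if (data.length === 0) return [""];
    const pos = (data.length - 1) * p;
    const base = Math.floor(pos);
    const next = Math.min(base + 1, data.length - 1);
    return [data[base] + (pos - base) * (data[next] - data[base])];
  });
}

// Made-up data shaped like columns A (group), B (value) and E (criteria), with one blank row.
const groupColumn = [[1], [1], [1], [1], [2], [2], [""]];
const valueColumn = [[2], [5], [10], [30], [7], [9], [""]];
const criteriaColumn = [[1], [2]];
const result = percentileByGroup(groupColumn, criteriaColumn, valueColumn, 0.8);
console.log(result.map(row => row[0].toFixed(2))); // [ '18.00', '8.60' ]
```

In a sheet laid out like the sample in the question, a call would presumably look like `=percentileByGroup(A2:A12, E2:E3, B2:B12, D2)`, assuming the function has been added to the spreadsheet's script project.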

