
Fit the cumulative percentage line to the sorted histogram output with d3 for a pareto chart histogram

This is what I have so far: https://gist.github.com/daluu/fc1cbcab68852ed3c5fa and http://bl.ocks.org/daluu/fc1cbcab68852ed3c5fa. I'm trying to replicate Excel functionality.

The line fits the default histogram just fine, as in the base/original http://bl.ocks.org/daluu/f58884c24ff893186416. I'm also able to sort the histogram by descending frequency, although in doing so I switched x scales (from linear to ordinal). At this point I can't seem to map the line to the sorted histogram correctly. Visually, it should look like the following examples:

  • the Excel screenshot in a comment in my gist referenced above
  • the pareto chart sorted histogram in this SO post
  • the pareto chart (similar to, but not exactly, a sorted histogram) made with d3 here

What's the best design approach to get the remaining part working? Should I have started with a single x scale, so that I don't need to switch from linear to ordinal? If so, I'm not sure how to apply the histogram layout correctly using an ordinal scale, or how to avoid using a linear x scale as the input to the histogram layout and still get the desired output.

Using the same ordinal scale with the code I have so far, the line looks OK, but it's not the curve I am expecting to see.

Any help appreciated.

The main issue with the line is that the cumulative distribution needs to be recalculated after the bars are sorted, or, if you're aiming for a static pareto chart, the cumulative distribution needs to be calculated in the target sort order. For this purpose I've created a small function to do this calculation:

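// data: histogram bins in their current display order; d.y is the bin count.
// dataset: the raw data array from the surrounding code (not defined in this snippet),
// so d.y / dataset.length is presumably the bin's share of all observations.
function calcCDF(data) {
  data.forEach(function (d, i) {
    if (i === 0) {
      d.cum = d.y / dataset.length;
    } else {
      // running total: this bin's share plus the previous bin's cumulative share
      d.cum = (d.y / dataset.length) + data[i - 1].cum;
    }
  });
  return data;
}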

In my case, I'm toggling the pareto sort on/off and recalculating the d.cum property each time. One could theoretically create two cumulative distribution properties to start with, i.e. d.cum for the regularly ordered distribution and, say, d.ParetoCum for the sorted cumulative, but I'm using d.cum on a tooltip and decided against that.
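As a rough, standalone sketch of the same idea (not the code from the fiddle): the helper below, with the made-up name paretoCumulative, sorts a copy of the bins by descending count and computes the running share in that order. It assumes bins shaped like the {x, dx, y} objects from the d3 v3 histogram layout, and it takes the total from the bin counts instead of relying on an outer dataset variable.

```javascript
// Hypothetical helper for illustration; the name and the returned shape are assumptions.
// bins: array of {x, dx, y} objects, where y is the count in the bin.
function paretoCumulative(bins) {
  // Total number of observations, derived from the bin counts themselves.
  var total = bins.reduce(function (sum, d) { return sum + d.y; }, 0);

  // Sort a shallow copy so the caller's array keeps its original bin order.
  var sorted = bins.slice().sort(function (a, b) { return b.y - a.y; });

  var running = 0;
  return sorted.map(function (d) {
    running += d.y;
    // cum is the cumulative share in the *sorted* order (0 when there is no data).
    return { x: d.x, dx: d.dx, y: d.y, cum: total === 0 ? 0 : running / total };
  });
}

var example = paretoCumulative([
  { x: 0, dx: 1, y: 2 },
  { x: 1, dx: 1, y: 5 },
  { x: 2, dx: 1, y: 3 }
]);
console.log(example.map(function (d) { return d.x + ": " + d.cum; }));
```

Because the input array is left untouched, the same bins can feed both the regular and the pareto view, which is one way to get the "two cumulative properties" variant without storing both on each bin.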

As for the axis, I'm using a single ordinal scale, which I think is cleaner, but it required some work to make the labels meaningful for number ranges, since tick marks and labels no longer delineate the bins as they would with a linear scale. My solution here was to just use the number range as the tick label, e.g. "1 - 1.99", and add a function to alternate the tick padding (I got that solution a while ago from "Alternating tick padding in d3.js").
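To make the labelling idea concrete, here is a minimal sketch with two made-up helpers, binLabel and tickOffset. It assumes each bin carries x (lower bound) and dx (width), and that the upper label is the bin's upper bound minus the smallest step at the chosen precision. It is not the code from the fiddle or from the linked question.

```javascript
// Hypothetical helpers for illustration; names and parameters are assumptions.

// Build a range label such as "1 - 1.99" for a bin {x: 1, dx: 1} at precision 2.
function binLabel(d, precision) {
  var step = Math.pow(10, -precision);               // e.g. 0.01 for precision 2
  var upper = (d.x + d.dx - step).toFixed(precision); // just below the next bin's lower bound
  return d.x + " - " + upper;
}

// Alternate the vertical offset of tick labels so neighbouring labels don't collide:
// even-indexed ticks stay at `base`, odd-indexed ticks are pushed down by `extra`.
function tickOffset(i, base, extra) {
  return i % 2 === 0 ? base : base + extra;
}

var bins = [{ x: 1, dx: 1 }, { x: 2, dx: 1 }, { x: 3, dx: 1 }];
bins.forEach(function (d, i) {
  console.log(binLabel(d, 2), "offset:", tickOffset(i, 9, 12));
});
```

The labels would serve as the domain of the ordinal scale, and the offset could be applied to the tick text elements after the axis is drawn, for example by setting their y attribute from the tick index.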

For the bar sorting, I'm using this d3 example as a reference, in case you need to understand it in the context of a simpler/smaller example.

See this fiddle that incorporates all of the above. If you want to use it, I would suggest adding a check to prevent the user from toggling off both the bars and the line (I left a note in the code; it should be trivial).
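One possible shape for that check, as a sketch only: the toggleSeries function and the {bars, line} state object below are hypothetical and not taken from the fiddle. The toggle is ignored whenever it would leave nothing visible.

```javascript
// Hypothetical guard for illustration; the state shape {bars, line} is an assumption.
// Returns the new visibility state, or the old one if the toggle would hide everything.
function toggleSeries(visible, name) {
  var next = { bars: visible.bars, line: visible.line };
  next[name] = !next[name];
  if (!next.bars && !next.line) {
    return visible; // refuse: at least one of bars/line must stay on screen
  }
  return next;
}

var state = { bars: true, line: true };
state = toggleSeries(state, "bars"); // bars hidden, line still shown
state = toggleSeries(state, "line"); // ignored, since it would hide both
console.log(state);
```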

Instead of sorting by y:

data.sort(function(a,b){ return b.y - a.y;});

you should be sorting by x:

data.sort(function(a,b){ return a.x - b.x;});

Working code here.
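To see what each comparator does, here is a small sketch on made-up bins (the sample data and the orderBy helper are assumptions, not from either answer's code). Sorting by y gives the descending-frequency order, while sorting by x gives the bins in ascending order of their lower bound. Whichever order is used, the cumulative line has to be computed over the bins in that same order.

```javascript
// Hypothetical sample bins: x is the bin's lower bound, y its count.
var sample = [
  { x: 0, y: 2 },
  { x: 1, y: 5 },
  { x: 2, y: 3 }
];

// Return the x values in the order produced by a comparator,
// working on a copy so `sample` itself is never reordered.
function orderBy(bins, comparator) {
  return bins.slice().sort(comparator).map(function (d) { return d.x; });
}

var byCount = orderBy(sample, function (a, b) { return b.y - a.y; }); // descending frequency
var byBin = orderBy(sample, function (a, b) { return a.x - b.x; });   // ascending bin position

console.log("sorted by y:", byCount);
console.log("sorted by x:", byBin);
```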
