
How to deal with missing data with crossfilter and DC.js?

Consider the following dataset:

var dataset = [
    {
        "user": "u1",
        "question1": "answer1",
        "question2": "answer2",
        ...
    },
    ...
];

Assume that this dataset is incomplete: some users might have answered one question but not another. As a result, some records have no "questionX" property at all.

Assume that we create a pie chart for each question, like this:

var questions = ["question1", "question2", ...];
var cf = crossfilter(dataset);

for (var i = 0; i < questions.length; i++) {

    var questionDim = cf.dimension(function(d) { return d[questions[i]]});
    var questionGrp = questionDim.group().reduceCount();

    plotPieChart("#dc-" + questions[i], questionDim, questionGrp); // helper function to plot standard DC pie chart based on a dimension and group.
}

It seems that the group does not handle the missing values correctly: it still assigns the records with a missing value to the first available category.

  1. Is it a bug?

  2. If not, one possible solution is to preprocess the dataset and fill in the missing questions with a dummy answer (for instance 'NA'). However, the 'NA' answer will then appear as a slice of the pie. How can this dummy slice be removed from the displayed results?

  3. Is there a better way to deal with this issue?

Thanks!

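No, it's not a bug. Crossfilter requires the values returned by a dimension's accessor to be naturally ordered, and undefined is not: it compares as neither less than, greater than, nor equal to any string. When some records return undefined, the sorting that groups rely on breaks, and records can end up in the wrong category.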
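To see why a missing value upsets the ordering, here is a small plain-JavaScript check (no Crossfilter involved; the variable names are only for illustration). Every comparison between undefined and an answer string comes out false, so there is no consistent place to put such a record in a sorted list:

```javascript
// A record that lacks "question2" yields undefined from the accessor.
var missing = { user: "u2", question1: "answer1" }["question2"];

// undefined is neither before, after, nor equal to a string key.
var checks = {
  lessThan: missing < "answer1",
  greaterThan: missing > "answer1",
  equal: missing == "answer1"
};

console.log(checks); // { lessThan: false, greaterThan: false, equal: false }
```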

You should define your dimensions so that they handle undefined values, like so:

var questionDim = cf.dimension(function(d) { return d[questions[i]] ? d[questions[i]] : "No answer"});
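If you prefer, the same idea can be wrapped in a small helper. This is only a sketch: `answerAccessor` and the sample records are made up for illustration and are not part of Crossfilter. Unlike the truthiness test above, it treats only undefined, null and the empty string as missing, so answers such as 0 or false would be kept:

```javascript
// Hypothetical helper: builds a value accessor for one question and
// substitutes a placeholder when the answer is missing.
function answerAccessor(question, placeholder) {
  return function (d) {
    var value = d[question];
    var isMissing = value === undefined || value === null || value === "";
    return isMissing ? placeholder : value;
  };
}

// Made-up sample records; u2 did not answer question2.
var sample = [
  { user: "u1", question1: "answer1", question2: "answer2" },
  { user: "u2", question1: "answer1" }
];

var keys = sample.map(answerAccessor("question2", "No answer"));
console.log(keys); // [ 'answer2', 'No answer' ]
```

Inside the loop from the question it would be used as `cf.dimension(answerAccessor(questions[i], "No answer"))`. Passing the question name as an argument also means the accessor no longer depends on the loop variable `i`, which matters if the accessor is called again after the loop has finished (for example when more records are added).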

Then, when you define your dc.js chart, you can filter out the data you don't want, in case the non-answers should not be displayed in the pie chart (though you probably should display them, so that viewers understand what proportion of people answered the question):

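dc.pieChart('#pie-chart')
  .dimension(questionDim)  // a dc.js chart needs its dimension as well as its group
  .group(questionGrp)
  .data(function(group) {
     // hide the placeholder slice; all other slices pass through unchanged
     return group.all()
                 .filter(function(d) { return d.key !== "No answer"; });
  });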
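Another way to hide the placeholder, sketched here under assumptions, is to wrap the group in an object that exposes only a filtered `all()`, an approach often called a "fake group" among dc.js users. `removeKey` and `stubGroup` are hypothetical names; `stubGroup` merely stands in for a real Crossfilter group so that the snippet runs on its own:

```javascript
// Hypothetical wrapper: exposes all() like a group, minus one key.
function removeKey(sourceGroup, hiddenKey) {
  return {
    all: function () {
      return sourceGroup.all().filter(function (d) {
        return d.key !== hiddenKey;
      });
    }
  };
}

// Stand-in for questionDim.group().reduceCount(); the counts are made up.
var stubGroup = {
  all: function () {
    return [
      { key: "No answer", value: 3 },
      { key: "answer1", value: 5 },
      { key: "answer2", value: 2 }
    ];
  }
};

var visible = removeKey(stubGroup, "No answer").all();
console.log(visible);
// [ { key: 'answer1', value: 5 }, { key: 'answer2', value: 2 } ]
```

With a real chart this would presumably be passed as `.group(removeKey(questionGrp, "No answer"))` in place of the `.data(...)` override. Either way, the records without an answer stay in the Crossfilter instance, so filters applied from other charts still take them into account.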
