How to deal with missing data with crossfilter and DC.js?
Consider the following dataset:
var dataset = [
{
"user": "u1",
"question1": "answer1",
"question2": "answer2",
...
},
...
];
Assume that this dataset is not complete: some users might have answered one question but not another. Thus the dataset has some blanks, where a "questionX" key does not appear for a given user.
Assume that we create a pie chart for each question, like this:
var questions = ["question1", "question2", ...];
var cf = crossfilter(dataset);
for (var i = 0; i < questions.length; i++) {
var questionDim = cf.dimension(function(d) { return d[questions[i]]});
var questionGrp = questionDim.group().reduceCount();
plotPieChart("#dc-" + questions[i], questionDim, questionGrp); // helper function to plot standard DC pie chart based on a dimension and group.
}
It seems that the group does not handle the missing values correctly: it still counts the records with a missing answer in the first available category.
Is this a bug?
If not, one possible solution is to preprocess the dataset and fill in the missing questions with a dummy answer (for instance 'NA'). However, by doing so, the 'NA' answer will appear as a slice of the pie. How can I then remove this dummy slice from the displayed results?
Is there a better way to deal with this issue?
Thanks!
No, it's not a bug. The values of a crossfilter dimension must be naturally ordered, and undefined is not, so weird stuff happens.
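To see why undefined has no natural order, here is a small sketch in plain JavaScript (no crossfilter involved). The variable names are just for illustration. Every relational comparison between undefined and a string comes out false, so a sort has no consistent place to put a missing answer:

```javascript
// A missing answer is simply undefined when read from the record.
var missing = undefined;

// Compare it against an ordinary answer in every direction.
var comparisons = {
  less: missing < "answer1",
  greater: missing > "answer1",
  lessOrEqual: missing <= "answer1",
  greaterOrEqual: missing >= "answer1"
};

// True only if every comparison above evaluated to false.
var allFalse = Object.keys(comparisons).every(function (name) {
  return comparisons[name] === false;
});
```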
You should define your dimensions so that they handle undefined values. You can do this like so:
var questionDim = cf.dimension(function(d) { return d[questions[i]] ? d[questions[i]] : "No answer"});
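If you build one dimension per question in a loop, it may be tidier to generate the accessor with a small factory. This is only a sketch: makeAnswerAccessor is a hypothetical helper name, and treating null and the empty string as missing is my assumption, not something the dataset above requires. The factory captures the question name itself instead of relying on the loop index.

```javascript
// Hypothetical helper: returns an accessor that never yields undefined.
// Assumption: undefined, null and "" all count as "not answered".
function makeAnswerAccessor(question, fallback) {
  return function (d) {
    var value = d[question];
    var isMissing = value === undefined || value === null || value === "";
    return isMissing ? fallback : value;
  };
}

// Tiny sample in the shape of the dataset from the question.
var sample = [
  { user: "u1", question1: "answer1", question2: "answer2" },
  { user: "u2", question1: "answer1" } // question2 is missing
];

var question2Accessor = makeAnswerAccessor("question2", "No answer");
var keys = sample.map(question2Accessor);
```

The resulting function can be passed to cf.dimension(...) in place of the inline one above.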
Then, when you define your dc.js chart, you can filter out the data you don't want if you don't want the non-answers to display in the pie chart (though you probably should display them, so that viewers understand what proportion of people answered the question):
dc.pieChart('#pie-chart')
  .dimension(questionDim)
  .group(questionGrp)
  // Feed the chart every group entry except the placeholder key.
  .data(function(group) {
    return group.all()
      .filter(function(d) { return d.key !== "No answer"; });
  });
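Another way to get the same effect is to wrap the group in an object that hides the unwanted key and hand that wrapper to the chart instead. The sketch below is an illustration rather than a drop-in: removeKey is a hypothetical name, the stub group stands in for a real crossfilter group, and the wrapper only provides all(), which assumes the chart does not call other group methods.

```javascript
// Hypothetical wrapper: exposes all() like a group, minus one key.
function removeKey(sourceGroup, unwantedKey) {
  return {
    all: function () {
      return sourceGroup.all().filter(function (d) {
        return d.key !== unwantedKey;
      });
    }
  };
}

// Stub standing in for questionDim.group().reduceCount().
var stubGroup = {
  all: function () {
    return [
      { key: "No answer", value: 3 },
      { key: "answer1", value: 5 },
      { key: "answer2", value: 2 }
    ];
  }
};

var visible = removeKey(stubGroup, "No answer").all();
var visibleKeys = visible.map(function (d) { return d.key; });
```

With a real group you would pass removeKey(questionGrp, "No answer") to .group(...).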