简体   繁体   中英

How can I find the frequency of each member in my data set

I am working to fetch and analyze a large data set. I want to know how many each value is appearing in the data set.

Let's give a small example to clarify things.

[
  {"Year": "1997", "Company": "Ford", "Model": "E350", "Length": "2.34"},
  {"Year": "2000", "Company": "Mercury", "Model": "Cougar", "Length": "2.38"}
  {"Year": "2001", "Company": "Ford", "Model": "Cougar", "Length": "2.38"}
]

I don't know exactly what the values that I am having, but I want to hash it to get the results this way.

[
  {"Value": "Ford", "Frequency": 2},
  {"Value": "Mercury", "Frequency": 1},
]

In case it's not dynamic and I know the the values, I will do it this way:

 var filteredCompany = data.filter(function(a) {
                    return /Ford/i.test(a.Company).lenght;
                });

But, I have a very large data set (900 Mbo), I need to make this process in a very dynamic way.

UPDATE

var dataset = {}
d3.csv(link, function(data) {
    dataset = data;
});

//Fetch data 

var frequency = {};
var datasetlength = dataset.length;

  for(var i = 0; i < datasetlength; i++){
    var current = dataset[i];
    if(!frequency.hasOwnProperty(current.company)) frequency[current.company] = 0;
    frequency[current.company]++;
  }

What you can do is loop through all the entries, and gather them into an object where the key is the name and the value is the count. The initial data will look like this:

{
  "Ford" : 2,
  "Mercury" : 1
}

You can do a reduce , passing through an object:

var frequency = hugeData.reduce(function(freq,current){
  var currentCompany = current.Company;
  if(!freq.hasOwnProperty(currentCompany)) freq[currentCompany] = 0;
  freq[currentCompany]++;
  return freq;
},{});

But reduce is ES5 and sometimes slow. You can do a plain loop:

var frequency = {};
var hugeDataLength = hugeData.length;

for(var i = 0; i < hugeDataLength; i++){
  var current = hugeData[i];
  var currentCompany = current.Company;
  if(!frequency.hasOwnProperty(currentCompany)) frequency[currentCompany] = 0;
  frequency[currentCompany]++;
}

Now that we have reduced the data into a much more manageable size, you can loop through the frequency data and turn it into an array, moving down the key and value into an object.

var chartData = Object.keys(frequency).map(function(company){
  var value = frequency[company];
  return {
    Value : company,
    Frequency : value
  }
});

A running demo can be seen here.


I did a similar feat in the past few months, and your browser's debugger is a very handy tool for this job, especially the CPU profiler. You can pin down which operations are actually causing the lag.

I'm not sure if this is the most efficient method of going through that much data (then again, Javascript was not made for big data so efficiency shouldn't be on your mind).

Basically I would approach this looping through all the data with an associative array keeping track of the frequency. If the current data.Company is not in the associative array, it'll add it on to the array as a key and then input the frequency of one. If it is found as a key in the array, it'll increment the frequency by 1.

Example

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM