Loading in a CSV file as a Map (D3 and JavaScript)

I've looked around JavaScript and D3's documentation, but couldn't find anything that helps me out...

Is it possible to load in a CSV file that looks like so:

header, header
string1, string
string2, string
...
stringN, string

And store it in a Map? Ideally using D3's CSV loader?

d3.csv("demoCSVOne.csv", function(errorOne, one) {
    d3.csv("demoCSVTwo.csv", function(errorTwo, two) {

    // do something

    });
});

CSV example

String, Integer
one, 2345
two, 34536
three, 24536

For Mark: I'm trying to compute the average value for each key across the multiple CSVs that have been selected, where a, b, c, etc. represent the value for a key:

[(a_csv1 + a_csv2 + a_csv3)/3]
[(b_csv1 + b_csv2 + b_csv3)/3]
[(c_csv1 + c_csv2 + c_csv3)/3]

These averages would then need to be stored in a new array, along with the key that each average represents. I'm aiming for it to look like this:

key, average
a, 123
b, 456
c, 789

Here's how I would do it. Note, I just used a JavaScript object as my map, instead of an ES6 Map object.

d3.csv('csv1.csv', function(e1, one) {
  if (e1) throw e1;

  d3.csv('csv2.csv', function(e2, two) {
    if (e2) throw e2;

    // our final map
    var aveMap = {};

    // concat the two csv arrays together
    one.concat(two).forEach((d) => {
      if (!aveMap[d.String]) aveMap[d.String] = {
        values: []
      };
      // build array of values by key (unary + converts the string to a number)
      aveMap[d.String].values.push(+d.Integer);
    });

    // loop and calculate mean
    Object.keys(aveMap).forEach((k) => {
      aveMap[k].mean = d3.mean(aveMap[k].values);
    });

  });
});

This produces the following final data structure:

{
  "one": {
    "values": [
      2345,
      2323
    ],
    "mean": 2334
  },
  "two": {
    "values": [
      34536,
      45456
    ],
    "mean": 39996
  },
  "three": {
    "values": [
      24536,
      56567
    ],
    "mean": 40551.5
  }
}

See it running here.
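If an actual ES6 Map is preferred, as the question asks, the same grouping could be sketched roughly as follows. This is only an illustration: rowsToMeanMap is a hypothetical helper, not part of D3, and the inline arrays stand in for the parsed rows that d3.csv would pass to its callback (assuming the columns are named String and Integer, as in the example CSV).

```javascript
// Hypothetical helper (not a D3 function): averages one column per key.
// `tables` is an array of parsed CSV tables; each table is an array of
// row objects such as { String: "one", Integer: "2345" }.
function rowsToMeanMap(tables, keyField, valueField) {
  const totals = new Map(); // key -> { sum, count }

  for (const rows of tables) {
    for (const row of rows) {
      const key = row[keyField];
      const entry = totals.get(key) || { sum: 0, count: 0 };
      entry.sum += Number(row[valueField]); // CSV values arrive as strings
      entry.count += 1;
      totals.set(key, entry);
    }
  }

  // Second, much smaller pass: one iteration per distinct key.
  const means = new Map();
  for (const [key, { sum, count }] of totals) {
    means.set(key, sum / count);
  }
  return means;
}

// Stand-ins for the arrays d3.csv would deliver for two files.
const csv1 = [
  { String: "one", Integer: "2345" },
  { String: "two", Integer: "34536" },
  { String: "three", Integer: "24536" }
];
const csv2 = [
  { String: "one", Integer: "2323" },
  { String: "two", Integer: "45456" },
  { String: "three", Integer: "56567" }
];

const means = rowsToMeanMap([csv1, csv2], "String", "Integer");
console.log(means.get("one")); // 2334
```

A Map keeps insertion order and avoids clashes with inherited object properties, which is the main reason to pick it over a plain object here.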

Edits for Comments

Holding the extra values property in memory isn't really making this code slower. If it's not performant, there are two likely reasons: you have a lot of CSV files, or the CSV files are huge. For performance, I'd switch to something like this:

var q = d3.queue();
['csv1.csv', 'csv2.csv'].forEach((c) => {
  q.defer(d3.csv, c);
});

q.awaitAll(function(error, csvs){
    if (error) throw error;

    // flatten the per-file row arrays into a single array
    var arr = d3.merge(csvs),
        aveMap = {};

    arr.forEach((d) => {
      if (!aveMap[d.String]) {
        aveMap[d.String] = {
          sum: 0,
          count: 0
        };
      }
      var obj = aveMap[d.String];
      obj.sum += +d.Integer;
      obj.count += 1;

      // once every file has contributed a value, compute the mean
      if ( obj.count === csvs.length ){
        obj.mean = obj.sum / obj.count;
      }
    });

    console.log(aveMap);
});

First, by using d3.queue, you download the CSV files concurrently instead of one after the other. Second, you can adjust the input to .defer so that only the files the user actually wants are downloaded. Third, you'll notice that I'm now calculating the average inside the first loop; if these are large datasets, you want to minimize the number of passes over them. Fourth, I'm now summing as I go instead of storing every value. Of course, this refactor assumes that each key appears exactly once in each CSV file.
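If that last assumption does not hold, one option is to defer the division to a short final pass over the keys, and to emit the key/average array the question asks for at the same time. The sketch below is an assumption-laden illustration, not part of the original answer: averagesByKey is a hypothetical helper, the sample rows are made up, and each average is taken over the rows that actually contain the key rather than over the number of files.

```javascript
// Hypothetical helper (not a D3 function): returns [{ key, average }, ...].
// `tables` has the same shape as the `csvs` argument of awaitAll:
// an array of tables, each an array of row objects.
function averagesByKey(tables, keyField, valueField) {
  // Object.create(null) avoids collisions with inherited names
  // such as "constructor" when keys come from user data.
  const totals = Object.create(null);

  tables.forEach((rows) => {
    rows.forEach((row) => {
      const key = row[keyField];
      if (!totals[key]) totals[key] = { sum: 0, count: 0 };
      totals[key].sum += Number(row[valueField]);
      totals[key].count += 1;
    });
  });

  // One iteration per distinct key, independent of the file sizes.
  return Object.keys(totals).map((key) => ({
    key: key,
    average: totals[key].sum / totals[key].count
  }));
}

// Made-up sample data: key "b" is missing from the second file.
const fileA = [
  { String: "a", Integer: "100" },
  { String: "b", Integer: "200" }
];
const fileB = [{ String: "a", Integer: "300" }];

const averages = averagesByKey([fileA, fileB], "String", "Integer");
console.log(averages);
```

Whether a missing key should count as zero or simply be left out of the average depends on the data; dividing by tables.length instead of count would give the former.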
