NodeJs comparing two arrays

For the record, I am a relatively new programmer.

I have the code working, but it seems clunky, and I suspect it would be slow if it had to go through many items.

Granted, this Node app does not need to be fast (the process could take 5 minutes and that would be fine), but I am curious whether there is a better way to do it.

The app compares two data sets. Its goals are as follows:

  1. compare the CSV file to the data returned by an online API
  2. make sure all of the names in the CSV file exist in the array returned by the API
  3. if a name is missing, log an error message to the screen (console.log()) instead of finishing

Right now this is the code:

const fs = require("fs");
const csv = require("csv-parser");
const fetch = require("node-fetch");

const results = [];

fs.createReadStream("./customers.csv")
  .pipe(csv())
  .on("data", (data) => {
    results.push(data);
  })
  .on("end", () => {
    console.log("Getting Customer Data from Waze...");
    fetch("https://gql.waveapps.com/graphql/public", {
      method: "post",
      headers: {
        //prettier-ignore
        'Authorization': "Bearer MyAuth",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        query: `
          query {
            business(id: "MyBusinessId") {
              customers {
                edges {
                  node {
                    id
                    name
                  }
                }
              }
            }
          }
        `,
      }),
    })
      .then((res) => res.json())
      .then(({ data }) => {
        console.log("Filtering Data...");
        // this maps through the csv file
        results.map((csv) => {
          let array = [];
          name = "";
          data.business.customers.edges.map((customer) => {
            // push the results of the expression (true or false) to an array
            array.push(
              customer.node.name.toLowerCase() === csv.name.toLowerCase()
            );
            // push nonexistent name (if there is one) variable so error handling is clear
            if (customer.node.name.toLowerCase() !== csv.name.toLowerCase()) {
              name = csv.name;
            }
          });
          // if all elements in array are false, that means there is no matching name in the data.business.customers.edges array and error will be true, if there is a true field in the name, return false
          const error = !array.some((el) => {
            if (el) {
              return true;
            }
          });

          if (error) {
            return console.log(
              `Name: ${name} not found in Waze customer list, please check your spelling`
            );
          }
          // send http request here
        });
        console.log("Finished Sending Invoices");
      });
  });

The customers.csv file:

"name","domain","expiration-date"
"bob","yahoo.com","7/2/2020"
"suzie","google.com","12/1/2020"

Right now the GraphQL API returns data that looks like this:

[
  {
    node: {
      id: 'QnVzaW5lc3M6MzE4NmRmNDQtZDg4Zi00MzgxLTk5ZGEtYTQzMWRmYzhmMDk5O0N1c3RvbWVyOjQ3NTg0Mzc2',
      name: 'NOInvoice'
    }
  },
  {
    node: {
      id: 'QnVzaW5lc3M6MzE4NmRmNDQtZDg4Zi00MzgxLTk5ZGEtYTQzMWRmYzhmMDk5O0N1c3RvbWVyOjQ3NTg0MzU3',
      name: 'Suzie'
    }
  },
  {
    node: {
      id: 'QnVzaW5lc3M6MzE4NmRmNDQtZDg4Zi00MzgxLTk5ZGEtYTQzMWRmYzhmMDk5O0N1c3RvbWVyOjQ3NTgwODkx',
      name: 'Bob'
    }
  }
]

Any help would be greatly appreciated.

Nested maps give O(n*m) time complexity, which performs poorly as the lists grow.

First create a hashmap of the names from the API, then scan the CSV array and check each name against the hashmap to see whether it exists.

Using a hashmap is a common way to improve the performance of nested loops. The result will be closer to O(n+m) time complexity, which is significantly more performant on large inputs.

  // create hash of valid names from API, keyed by lowercased name
  // (each edge wraps the customer in a `node` property)
  const validNames = data.business.customers.edges.reduce(
    (names, customer) => {
      names[customer.node.name.toLowerCase()] = customer;   /* or = true */
      return names;
    },
    {}
  );

  // see if any of the names in the csv are not valid
  const err = results.reduce(
    (err, csv) => (validNames[csv.name.toLowerCase()] ? err : err + 1),
    0
  );
  if (err > 0) {
    // have invalid names in CSV
  }

  // OR alternatively, find the invalid entries
  const invalid = results.reduce(
    (invalid, csv) => {
      if (!validNames[csv.name.toLowerCase()]) invalid.push(csv);
      return invalid;
    },
    []
  );

EDIT

  // OR shorter version of find the invalid entries
  const invalid = results.filter(csv => !validNames[csv.name.toLowerCase()]);
  if (invalid.length) {
    // have invalid names in CSV
  }
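
For reference, here is a minimal self-contained sketch of the same lookup idea using a Set instead of a plain object. The sample data mirrors the shapes in the question; the "alice" row and the helper names normalize and findMissingCustomers are made up for illustration and are not part of the original code.

```javascript
// Sample data shaped like the question's CSV rows and GraphQL edges.
// "alice" is a made-up row added to show the failure case.
const csvRows = [
  { name: "bob", domain: "yahoo.com" },
  { name: "suzie", domain: "google.com" },
  { name: "alice", domain: "example.com" },
];
const edges = [
  { node: { id: "1", name: "NOInvoice" } },
  { node: { id: "2", name: "Suzie" } },
  { node: { id: "3", name: "Bob" } },
];

// Hypothetical helper: normalize a name so the comparison ignores case
// and surrounding whitespace.
const normalize = (name) => name.trim().toLowerCase();

// Hypothetical helper: return the CSV rows whose name is absent from the API data.
function findMissingCustomers(rows, customerEdges) {
  // One pass over the API data to build the lookup set.
  const known = new Set(customerEdges.map((edge) => normalize(edge.node.name)));
  // One pass over the CSV; each Set lookup is constant time on average.
  return rows.filter((row) => !known.has(normalize(row.name)));
}

const missing = findMissingCustomers(csvRows, edges);
for (const row of missing) {
  console.log(`Name: ${row.name} not found in customer list, please check your spelling`);
}
```

A Set is used here rather than an object because a plain object lookup also sees inherited keys: a CSV name such as "constructor" would look valid even if no such customer exists.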

I think you're using a lot of extra variables, such as array, name and error, that you don't actually need. So this is not a performance optimization but an attempt to address the clunkiness of the code. Here are some changes you might consider.

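// build the lowercased name list once, outside the loop
const customerNames = data.business.customers.edges.map((edge) => edge.node.name.toLowerCase())
results.forEach((csv) => {
   // keep the case-insensitive comparison from the original code
   if(!customerNames.some((name) => name === csv.name.toLowerCase())) {
     console.log(`Name: ${csv.name} not found in Waze customer list, please check your spelling`)
   }
})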

Instead of:

results.map((csv) => {
  let array = []; <-- (1)
  name = ""; <-- (2)
  data.business.customers.edges.map((customer) => {
    // push the results of the expression (true or false) to an array
    array.push(
      customer.node.name.toLowerCase() === csv.name.toLowerCase()
    );
    // push nonexistent name (if there is one) variable so error handling is clear
    if (customer.node.name.toLowerCase() !== csv.name.toLowerCase()) {
      name = csv.name; <-- (3)
    }
  });
  // if all elements in array are false, that means there is no matching name in the data.business.customers.edges array and error will be true, if there is a true field in the name, return false
  const error = !array.some((el) => {
    if (el) {
      return true;
    }
  }); <-- (4)

  if (error) { <-- (5)
    return console.log(
      `Name: ${name} not found in Waze customer list, please check your spelling`
    );
  }
  // send http request here
});

(1) array keeps the boolean values that determine whether or not csv.name was found in the data (the GraphQL response). This array is then iterated in (4). You don't need two steps iterating over two different arrays when you can find the answer directly with the some function by comparing the names.

In (2) you define a variable name, and in (3) you keep assigning it the same value, csv.name, over and over (the value doesn't change because it doesn't depend on customer at all). So I would remove that variable completely.

(5) You only care about csv.name in the logs, so that is exactly what the shorter version uses.
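
To make points (1) and (4) concrete, here is a small runnable sketch showing that a single some call can replace both the boolean array and the second pass over it. The sample rows and the helper name isKnownCustomer are assumptions for illustration only.

```javascript
// Made-up sample data in the same shape as the question's CSV rows and API edges.
const csvRows = [{ name: "bob" }, { name: "alice" }];
const edges = [{ node: { name: "Suzie" } }, { node: { name: "Bob" } }];

// Hypothetical helper: true if csvName matches any API customer, ignoring case.
function isKnownCustomer(csvName, customerEdges) {
  const wanted = csvName.toLowerCase();
  // some() stops at the first match, so no intermediate boolean array is needed.
  return customerEdges.some((edge) => edge.node.name.toLowerCase() === wanted);
}

// One entry per CSV row, recording whether the name was found.
const report = csvRows.map((row) => ({
  name: row.name,
  found: isKnownCustomer(row.name, edges),
}));
console.log(report);
```

This is still a nested scan, so it addresses readability rather than speed; for large lists the lookup-table approach from the other answer is the better fit.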
