How to iterate through a very large list and create efficient JSON from it in Node.js

I am trying to iterate through a very large list and convert it to JSON so that I can push it to my development database and use it more efficiently, without having to fetch it over and over again. The problem is that all of the data is currently in a .txt file in the following format:

table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; U.S.A,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com
SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock,013872106,3334,Manufacturing,Primary Production Of Aluminum,,Steel Works Etc,Basic Materials,Aluminum,4 - Mid,5 - Large,,USD,Pennsylvania; U.S.A,2020-10-30,2016-11-01,2016-11-01,2021-02-18,2014-12-31,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001675149,http://www.alcoa.com
SF1,122827,AAAB,Admiralty Bancorp Inc,NASDAQ,Y,Domestic Common Stock,007231103,6022,Finance Insurance And Real Estate,State Commercial Banks,,Banking,Financial Services,Banks - Regional,2 - Micro,1 - Nano,AAABB,USD,Florida; U.S.A,2019-07-29,2017-09-09,1998-09-25,2003-01-28,1997-09-30,2002-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001066808,
SF1,120538,AAAGY,Altana Aktiengesellschaft,NYSE,Y,ADR Common Stock,02143N103,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,4 - Mid,,EUR,Jordan,2019-05-17,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001182802,
SF1,155760,AAAP,Advanced Accelerator Applications SA,NASDAQ,Y,ADR Common Stock,00790T100,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,2 - Micro,,EUR,France,2020-10-08,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2017-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001611787,

I only need a few things from each line, e.g. the ticker (AA), the name (Alcoa Corp), the category, and a few others. So far I parse the file so that each line is stored in an array as a single string containing all of that line's data, but I want to get it into JSON format like so:

{
 ticker: "AA",
 name: "Alcoa ...",
 category: "Manufacturing"
},
{
 Next Ticker
}

But I'm really confused about how to do this. One method I have tried is using the commas and manually filtering out the data (e.g. storing the text between the 2nd and 3rd commas), but this isn't going well. My current basic code is:

const fs = require("fs");
const readline = require("readline");

function convert(file) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(file);
    // Handle stream error (IE: file not found)
    stream.on("error", reject);

    const reader = readline.createInterface({
      input: stream,
    });

    const array = [];

    reader.on("line", (line) => {
      // JSON.stringify followed by JSON.parse returns the same string,
      // so each entry is still the raw, unsplit line.
      array.push(JSON.parse(JSON.stringify(line)));
    });

    reader.on("close", () => resolve(array));
  });
}

I then simply run that function with the file. Any help or guidance would be appreciated!

EDIT: I just realized you're reading a line at a time. The second code sample demonstrates that.

Start by splitting the string into lines on "\n". Then split each line on a comma. Save the first row as headers and the rest as the lines' fields.

For every row:

  1. create a new empty object and store it in the result array at the row's index
  2. iterate over each field and place the key:value pair in that object

result[idx][headers[headerIdx]] uses the row index as the array index and the header index to look up the key. Assigning the field value to that yields:

[{table: "SF1", permaticker: "196290", ... etc. (every value is a string, because split returns strings)

let str = `table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; USA,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com
SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock,013872106,3334,Manufacturing,Primary Production Of Aluminum,,Steel Works Etc,Basic Materials,Aluminum,4 - Mid,5 - Large,,USD,Pennsylvania; USA,2020-10-30,2016-11-01,2016-11-01,2021-02-18,2014-12-31,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001675149,http://www.alcoa.com
SF1,122827,AAAB,Admiralty Bancorp Inc,NASDAQ,Y,Domestic Common Stock,007231103,6022,Finance Insurance And Real Estate,State Commercial Banks,,Banking,Financial Services,Banks - Regional,2 - Micro,1 - Nano,AAABB,USD,Florida; USA,2019-07-29,2017-09-09,1998-09-25,2003-01-28,1997-09-30,2002-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001066808,
SF1,120538,AAAGY,Altana Aktiengesellschaft,NYSE,Y,ADR Common Stock,02143N103,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,4 - Mid,,EUR,Jordan,2019-05-17,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001182802,
SF1,155760,AAAP,Advanced Accelerator Applications SA,NASDAQ,Y,ADR Common Stock,00790T100,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,2 - Micro,,EUR,France,2020-10-08,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2017-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001611787`;

// Split into lines, then split each line on commas; the first row holds the headers.
let [headers, ...linesfields] = str.split("\n").map(l => l.split(','));

let result = [];
linesfields.forEach((row, idx) => {
  result[idx] = {};
  row.forEach((field, headerIdx) => {
    result[idx][headers[headerIdx]] = field;
  });
});

console.log(result);
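The question only asks for a few columns (ticker, name, category), so the same header lookup can be narrowed to just those keys. The following is a minimal sketch, not part of the original answer: `rowToObject` and `wanted` are hypothetical names, and it assumes that no field contains a comma inside quotes, which holds for the sample rows shown.

```javascript
// Hypothetical helper: build an object from one CSV line, keeping only the wanted columns.
// Assumption: fields never contain quoted commas, so a plain split(",") is safe.
function rowToObject(headers, line, wanted) {
  const fields = line.split(",");
  const obj = {};
  for (const key of wanted) {
    const idx = headers.indexOf(key);
    // Skip keys that do not appear in the header row; missing trailing fields become "".
    if (idx !== -1) obj[key] = fields[idx] ?? "";
  }
  return obj;
}

// Shortened header and row, taken from the first seven columns of the sample data.
const sampleHeaders = "table,permaticker,ticker,name,exchange,isdelisted,category".split(",");
const sampleLine = "SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock";

const picked = rowToObject(sampleHeaders, sampleLine, ["ticker", "name", "category"]);
console.log(JSON.stringify(picked));
// {"ticker":"AA","name":"Alcoa Corp","category":"Domestic Common Stock"}
```

Looking columns up by header name rather than by position means the code keeps working if the column order in the file changes.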

Single Line:

For the first line of the file, parse the headers as shown.

For every other line, parse it into an object and push that object into the array of line objects.

let headerStr = `table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite`;
let str = `SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; USA,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com`;

let headers = headerStr.split(','); // only have to do this on the first line of the file

let line = {};
str.split(',').forEach((field, headerIdx) => {
  line[headers[headerIdx]] = field;
});

console.log(line); // would push line object into array of line objects
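Putting the single-line approach together with the readline stream from the question could look roughly like the sketch below. It is one possible arrangement, not the answerer's code: `convertToObjects` is a hypothetical name, the first non-empty line is assumed to be the header row, and the plain comma split again assumes no quoted commas in the data.

```javascript
const fs = require("fs");
const readline = require("readline");

// Sketch: stream a comma-separated file line by line and resolve an array of objects.
// Assumptions: the first non-empty line is the header row; no field contains a quoted comma.
function convertToObjects(file) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(file);
    stream.on("error", reject); // e.g. file not found

    const reader = readline.createInterface({ input: stream, crlfDelay: Infinity });

    let headers = null;
    const rows = [];

    reader.on("line", (line) => {
      if (line.trim() === "") return; // ignore blank lines
      const fields = line.split(",");
      if (headers === null) {
        headers = fields; // parse the headers once, on the first line
        return;
      }
      const obj = {};
      headers.forEach((header, i) => {
        obj[header] = fields[i] ?? ""; // missing trailing fields become ""
      });
      rows.push(obj);
    });

    reader.on("close", () => resolve(rows));
  });
}
```

It would be called the same way as the original convert function, for example convertToObjects("data.txt").then((rows) => console.log(JSON.stringify(rows))), where "data.txt" is a placeholder file name. Only one line is split at a time, although the resulting array of objects is still held in memory.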

