
How to Iterate Through a Very Large List and Create Efficient JSON from It in Node.js

I am trying to iterate through a very large list and convert it to JSON so that I can push it to my development database and use it more efficiently, without having to call it over and over again. The problem is that all of the data currently lives in a .txt file in the following format:

table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; U.S.A,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com
SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock,013872106,3334,Manufacturing,Primary Production Of Aluminum,,Steel Works Etc,Basic Materials,Aluminum,4 - Mid,5 - Large,,USD,Pennsylvania; U.S.A,2020-10-30,2016-11-01,2016-11-01,2021-02-18,2014-12-31,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001675149,http://www.alcoa.com
SF1,122827,AAAB,Admiralty Bancorp Inc,NASDAQ,Y,Domestic Common Stock,007231103,6022,Finance Insurance And Real Estate,State Commercial Banks,,Banking,Financial Services,Banks - Regional,2 - Micro,1 - Nano,AAABB,USD,Florida; U.S.A,2019-07-29,2017-09-09,1998-09-25,2003-01-28,1997-09-30,2002-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001066808,
SF1,120538,AAAGY,Altana Aktiengesellschaft,NYSE,Y,ADR Common Stock,02143N103,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,4 - Mid,,EUR,Jordan,2019-05-17,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001182802,
SF1,155760,AAAP,Advanced Accelerator Applications SA,NASDAQ,Y,ADR Common Stock,00790T100,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,2 - Micro,,EUR,France,2020-10-08,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2017-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001611787,

I only need a few things from each line, i.e. the ticker (AA), the name (Alcoa Corp), the category, and a few others. Right now the file is parsed so that each line is stored in an array as a single string containing all the data on that line, but I actually want to get it into JSON format like so:

{
 ticker: "AA",
 name: "Alcoa ...",
 category: "Manufacturing"
},
{
 Next Ticker
}

But I'm really confused about how to do this. One method I have tried is using the commas and manually filtering the data, e.g. storing whatever sits between the 2nd and 3rd comma, but this isn't going well. My current basic code is:

const fs = require("fs");
const readline = require("readline");

function convert(file) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(file);
    // Handle stream error (IE: file not found)
    stream.on("error", reject);

    const reader = readline.createInterface({
      input: stream,
    });

    const array = [];

    reader.on("line", (line) => {
      // Each line ends up in the array as one raw string
      array.push(JSON.parse(JSON.stringify(line)));
    });

    reader.on("close", () => resolve(array));
  });
}

I then simply run that function with the file. Any help or guidance would be appreciated!

EDIT: I just realized you're reading one line at a time. The second code sample covers that case.

Start by splitting the string into lines on "\n". Then split each line on commas. Save the first row as the headers and the rest as the line fields.

For every row:

  1. Create a new empty object.
  2. Iterate over each field and place the key:value pair in that row's object in the result array.

result[idx][headers[headerIdx]] uses the row index as the array index and the header index to provide the key. Assigning the field value to that yields (every value is a string, because it comes straight from split):

[{ table: "SF1", permaticker: "196290", ... }, ...]

let str = `table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; USA,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com
SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock,013872106,3334,Manufacturing,Primary Production Of Aluminum,,Steel Works Etc,Basic Materials,Aluminum,4 - Mid,5 - Large,,USD,Pennsylvania; USA,2020-10-30,2016-11-01,2016-11-01,2021-02-18,2014-12-31,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001675149,http://www.alcoa.com
SF1,122827,AAAB,Admiralty Bancorp Inc,NASDAQ,Y,Domestic Common Stock,007231103,6022,Finance Insurance And Real Estate,State Commercial Banks,,Banking,Financial Services,Banks - Regional,2 - Micro,1 - Nano,AAABB,USD,Florida; USA,2019-07-29,2017-09-09,1998-09-25,2003-01-28,1997-09-30,2002-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001066808,
SF1,120538,AAAGY,Altana Aktiengesellschaft,NYSE,Y,ADR Common Stock,02143N103,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,4 - Mid,,EUR,Jordan,2019-05-17,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001182802,
SF1,155760,AAAP,Advanced Accelerator Applications SA,NASDAQ,Y,ADR Common Stock,00790T100,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,2 - Micro,,EUR,France,2020-10-08,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2017-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001611787`;

// First row becomes the headers, every other row becomes an array of fields
let [headers, ...linesfields] = str.split("\n").map(l => l.split(','));

let result = [];
linesfields.forEach((row, idx) => {
  result[idx] = {};
  row.forEach((field, headerIdx) => {
    result[idx][headers[headerIdx]] = field;
  });
});

console.log(result);
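The question only asks for a few of the columns (ticker, name, category). As a rough sketch of how the full objects could be trimmed down afterwards: `pickFields` and the `wanted` list are hypothetical names introduced here, and the two sample rows are shortened stand-ins for the `result` objects above.

```javascript
// Hypothetical helper: keep only the listed keys of each parsed row.
function pickFields(rows, wanted) {
  return rows.map((row) => {
    const picked = {};
    for (const key of wanted) {
      picked[key] = row[key];
    }
    return picked;
  });
}

// Two shortened rows in the same shape as the parsed objects.
const rows = [
  { table: "SF1", ticker: "A", name: "Agilent Technologies Inc", category: "Domestic Common Stock" },
  { table: "SF1", ticker: "AA", name: "Alcoa Corp", category: "Domestic Common Stock" },
];

const slim = pickFields(rows, ["ticker", "name", "category"]);
console.log(JSON.stringify(slim, null, 2));
```

Trimming before calling JSON.stringify keeps the stored documents small, which is probably what matters most when the list is very large.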

Single line:

On the first line of the file, parse the headers as shown.

For each remaining row, parse the line and push the line object into the array of line objects.

let headerStr = `table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite`;
let str = `SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; USA,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com`;

// only have to do this on the first line of the file
let headers = headerStr.split(',');

let line = {};
str.split(',').forEach((field, headerIdx) => {
  line[headers[headerIdx]] = field;
});

console.log(line);
// would push line object into array of line objects
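Putting the single-line idea together with the readline stream from the question might look roughly like the sketch below. It assumes the first line of the file is the header row and that no field contains a comma or quoted value, as in the sample data. `parseLine`, the optional `wanted` parameter, and the file names in the usage comment are hypothetical names introduced for illustration.

```javascript
const fs = require("fs");
const readline = require("readline");

// Turn one comma-separated line into an object keyed by header name.
// If `wanted` is given, only those headers are kept.
function parseLine(headers, line, wanted) {
  const fields = line.split(",");
  const row = {};
  headers.forEach((header, idx) => {
    if (!wanted || wanted.includes(header)) {
      row[header] = fields[idx];
    }
  });
  return row;
}

// Stream the file line by line; the first non-blank line supplies the headers.
function convert(file, wanted) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(file);
    // Handle stream error (e.g. file not found)
    stream.on("error", reject);

    const reader = readline.createInterface({ input: stream, crlfDelay: Infinity });
    let headers = null;
    const rows = [];

    reader.on("line", (line) => {
      if (line.trim() === "") return; // skip blank lines
      if (headers === null) {
        headers = line.split(",");
      } else {
        rows.push(parseLine(headers, line, wanted));
      }
    });

    reader.on("close", () => resolve(rows));
  });
}

// Usage (file names are placeholders):
// convert("tickers.txt", ["ticker", "name", "category"])
//   .then((rows) => fs.promises.writeFile("tickers.json", JSON.stringify(rows)));
```

Because the lines are parsed as they arrive, the raw text of the whole file is never held in memory at once. Only the trimmed row objects are kept.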


 