How to Iterate Through a Very Large List and Create Efficient JSON With It (Node.js)
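I'm trying to iterate through a very large list and convert it to JSON so that I can push it to my development database and use it more efficiently, without having to call it over and over. The problem is that I currently have all the data in a .txt file, in the following format: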
table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; U.S.A,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com
SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock,013872106,3334,Manufacturing,Primary Production Of Aluminum,,Steel Works Etc,Basic Materials,Aluminum,4 - Mid,5 - Large,,USD,Pennsylvania; U.S.A,2020-10-30,2016-11-01,2016-11-01,2021-02-18,2014-12-31,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001675149,http://www.alcoa.com
SF1,122827,AAAB,Admiralty Bancorp Inc,NASDAQ,Y,Domestic Common Stock,007231103,6022,Finance Insurance And Real Estate,State Commercial Banks,,Banking,Financial Services,Banks - Regional,2 - Micro,1 - Nano,AAABB,USD,Florida; U.S.A,2019-07-29,2017-09-09,1998-09-25,2003-01-28,1997-09-30,2002-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001066808,
SF1,120538,AAAGY,Altana Aktiengesellschaft,NYSE,Y,ADR Common Stock,02143N103,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,4 - Mid,,EUR,Jordan,2019-05-17,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001182802,
SF1,155760,AAAP,Advanced Accelerator Applications SA,NASDAQ,Y,ADR Common Stock,00790T100,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,2 - Micro,,EUR,France,2020-10-08,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2017-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001611787,
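From each line I currently only need a few things: the ticker (e.g. AA), the name (which would be Alcoa Corp), the category, and a few other fields. Right now the file is parsed and each line is stored in an array as a single string holding all the data on that line, but I would like to turn it into JSON like this: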
{
ticker: "AA",
name: "Alcoa ...",
category: "Manufacturing"
},
{
Next Ticker
}
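The target shape above is a JavaScript object literal rather than strict JSON: JSON requires double-quoted keys, and several records have to be wrapped in an array. A minimal sketch, using values copied from the sample rows, of letting `JSON.stringify` produce valid JSON from an array of plain objects. Note that in the sample data "Manufacturing" sits in the `sicsector` column, while the `category` column holds values such as "Domestic Common Stock"; the sketch uses the `category` column.

```javascript
// Plain objects holding only the fields of interest, with values copied from
// the sample rows (category taken from the `category` column).
const records = [
  { ticker: "A", name: "Agilent Technologies Inc", category: "Domestic Common Stock" },
  { ticker: "AA", name: "Alcoa Corp", category: "Domestic Common Stock" },
];

// JSON.stringify quotes every key and wraps the records in a JSON array;
// the third argument (2) only controls indentation.
const json = JSON.stringify(records, null, 2);
console.log(json);

// Parsing the text again gives back equivalent objects.
const parsed = JSON.parse(json);
console.log(parsed[1].name); // Alcoa Corp
```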
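But I'm really confused about how to do this. One approach I tried was to use commands and manually filter out all the data, so that it would store the data between the 2nd and 3rd commas, but that did not go well. My current basic code is: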
const fs = require("fs");
const readline = require("readline");

function convert(file) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(file);
    // Handle stream error (i.e. file not found)
    stream.on("error", reject);
    const reader = readline.createInterface({
      input: stream,
    });
    const array = [];
    reader.on("line", (line) => {
      // JSON.parse(JSON.stringify(line)) returns the same string,
      // so each entry is still the raw comma-separated line.
      array.push(JSON.parse(JSON.stringify(line)));
    });
    reader.on("close", () => resolve(array));
  });
}
Then I just run it with that file. Any help or guidance would be greatly appreciated!
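Two observations on the question's code, shown on the first seven columns of the Alcoa row. `JSON.parse(JSON.stringify(line))` serializes the string and parses it straight back, so each array entry is still the original comma-separated string. And the "data between the 2nd and 3rd commas" idea maps directly onto `split(",")`: element 2 is the text between the 2nd and 3rd commas, which is the `ticker` column. The `pick` helper below is hypothetical; it looks column positions up in the header row instead of hard-coding them.

```javascript
const header = "table,permaticker,ticker,name,exchange,isdelisted,category";
const line = "SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock";

// The round trip through JSON leaves the value unchanged: still a string.
const roundTripped = JSON.parse(JSON.stringify(line));
console.log(roundTripped === line); // true

// split(",") turns the line into an array of fields; element 2 is the text
// between the 2nd and 3rd commas.
const fields = line.split(",");
console.log(fields[2]); // AA

// Hypothetical helper: find each wanted column's position in the header row.
const headers = header.split(",");
function pick(row, wanted) {
  const record = {};
  for (const key of wanted) {
    record[key] = row[headers.indexOf(key)];
  }
  return record;
}
console.log(pick(fields, ["ticker", "name", "category"]));
```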
Edit: I just realized you're reading one line at a time... the second code example demonstrates that.
First split the string into lines on "\n", then split each line on commas. Save the first line as the headers and the rest as the line fields.
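For each row, use the row index as the array index and the header index to supply the key:
result[idx][headers[headerIdx]]
Assigning each field value to it yields:
[{table: "SF1", permaticker: "196290", ...}, ...] and so on (every value stays a string, because it comes from split).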
let str = `table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; USA,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com
SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock,013872106,3334,Manufacturing,Primary Production Of Aluminum,,Steel Works Etc,Basic Materials,Aluminum,4 - Mid,5 - Large,,USD,Pennsylvania; USA,2020-10-30,2016-11-01,2016-11-01,2021-02-18,2014-12-31,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001675149,http://www.alcoa.com
SF1,122827,AAAB,Admiralty Bancorp Inc,NASDAQ,Y,Domestic Common Stock,007231103,6022,Finance Insurance And Real Estate,State Commercial Banks,,Banking,Financial Services,Banks - Regional,2 - Micro,1 - Nano,AAABB,USD,Florida; USA,2019-07-29,2017-09-09,1998-09-25,2003-01-28,1997-09-30,2002-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001066808,
SF1,120538,AAAGY,Altana Aktiengesellschaft,NYSE,Y,ADR Common Stock,02143N103,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,4 - Mid,,EUR,Jordan,2019-05-17,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001182802,
SF1,155760,AAAP,Advanced Accelerator Applications SA,NASDAQ,Y,ADR Common Stock,00790T100,2834,Manufacturing,Pharmaceutical Preparations,,Pharmaceutical Products,Healthcare,Biotechnology,4 - Mid,2 - Micro,,EUR,France,2020-10-08,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2017-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001611787`;
// First row becomes the headers; the remaining rows are the fields of each line
let [headers, ...linesfields] = str.split("\n").map(l => l.split(','));
let result = [];
linesfields.forEach((row, idx) => {
  result[idx] = {};
  row.forEach((field, headerIdx) => {
    // The header at the same position supplies the key
    result[idx][headers[headerIdx]] = field;
  });
});
console.log(result);
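The same header-to-field pairing can be written more compactly with `Object.fromEntries` (available since Node.js 12), which builds an object from `[key, value]` pairs. A sketch, with `csvToObjects` as a hypothetical name; like the code above it splits on plain commas, so quoted fields containing commas are not handled. As an added assumption about the input, it also strips a trailing `\r` (Windows line endings), skips blank lines, and fills missing trailing fields with empty strings.

```javascript
// Hypothetical helper: converts header-plus-rows text into an array of
// objects keyed by the header names. Plain split(",") is used, so quoted
// fields containing commas are NOT handled.
function csvToObjects(text) {
  const [headers, ...rows] = text
    .split("\n")
    .map((l) => l.replace(/\r$/, "")) // tolerate CRLF line endings
    .filter((l) => l.length > 0) // ignore blank lines, e.g. a trailing newline
    .map((l) => l.split(","));
  // Pair each header with the field at the same position in the row;
  // a row with fewer fields gets "" for the missing columns.
  return rows.map((row) =>
    Object.fromEntries(
      headers.map((header, i) => [header, row[i] !== undefined ? row[i] : ""])
    )
  );
}

const sample = [
  "table,permaticker,ticker,name",
  "SF1,196290,A,Agilent Technologies Inc",
  "SF1,124392,AA,Alcoa Corp",
].join("\n");

const objects = csvToObjects(sample);
console.log(objects[1].name); // Alcoa Corp
console.log(objects[1].permaticker); // 124392 (still a string)
```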
One line at a time:
Parse the headers from the first line of the file, as shown. For every other line, parse the line and push the line object into an array of line objects.
let headerStr = `table,permaticker,ticker,name,exchange,isdelisted,category,cusips,siccode,sicsector,sicindustry,famasector,famaindustry,sector,industry,scalemarketcap,scalerevenue,relatedtickers,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite`;
let str = `SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock,00846U101,3826,Manufacturing,Laboratory Analytical Instruments,,Measuring and Control Equipment,Healthcare,Diagnostics & Research,5 - Large,5 - Large,,USD,California; USA,2020-12-18,2014-09-26,1999-11-18,2021-02-18,1997-06-30,2020-09-30,https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001090872,http://www.agilent.com`;
let headers = headerStr.split(','); // only have to do this on the first line of the file
let line = {}; // declared, so it is not an implicit global
str.split(',').forEach((field, headerIdx) => {
  line[headers[headerIdx]] = field;
});
console.log(line); // would push line object into array of line objects
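To tie this to the line-by-line `readline` reading from the question, the header can be captured from the first line and every later line turned into an object. A minimal sketch under these assumptions: `convertToObjects` is a hypothetical name, it keeps only the columns listed in `wanted`, it relies on the async iteration that `readline` interfaces support in recent Node.js versions, and it splits on plain commas (no quoted-field handling). It still collects every record into one array in memory, as the question's `convert()` does.

```javascript
const readline = require("readline");
const { Readable } = require("stream");

// Hypothetical helper: reads `input` line by line, treats the first non-empty
// line as the header row and keeps only the columns named in `wanted`.
// Fields are split on plain commas, so quoted commas are not supported.
async function convertToObjects(input, wanted) {
  const reader = readline.createInterface({ input, crlfDelay: Infinity });
  let indices = null; // column positions of the wanted fields
  const records = [];
  for await (const line of reader) {
    if (line.trim() === "") continue; // skip blank lines
    const fields = line.split(",");
    if (indices === null) {
      // First line: map each wanted column name to its position.
      indices = wanted.map((key) => {
        const idx = fields.indexOf(key);
        if (idx === -1) throw new Error(`Column not found: ${key}`);
        return idx;
      });
      continue;
    }
    const record = {};
    wanted.forEach((key, i) => {
      record[key] = fields[indices[i]];
    });
    records.push(record);
  }
  return records;
}

// Demo with an in-memory stream standing in for the real file.
const sample =
  "table,permaticker,ticker,name,exchange,isdelisted,category\n" +
  "SF1,196290,A,Agilent Technologies Inc,NYSE,N,Domestic Common Stock\n" +
  "SF1,124392,AA,Alcoa Corp,NYSE,N,Domestic Common Stock\n";

const demo = convertToObjects(Readable.from([Buffer.from(sample)]), [
  "ticker",
  "name",
  "category",
]);
demo.then((records) => console.log(JSON.stringify(records)));
```

With the question's file, the call would be `convertToObjects(fs.createReadStream(file), ["ticker", "name", "category"])`, which returns a Promise of the records much like `convert(file)`. Resolving column positions once from the header avoids hard-coding "between the 2nd and 3rd commas".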
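Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.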