JavaScript - reading a whitespace delimited text file into array and use as lookup table

First off: I'm an absolute beginner in JavaScript; I started two weeks ago and have been studying many hours a day. I am running a Node.js server on GNU/Linux and have tried a lot of variations to achieve my goal. Unfortunately I am stuck and don't know how to continue.

I have a text file with whitespace and line feeds; it contains more than 2000 lines. I want to read this text file into my JavaScript program so I can use it later as a lookup table. I am not sure if I need to JSON-stringify it for later use; maybe it's simpler to leave it as an object/array which I can use in my lookup function later. I want to pull out of this text file only those lines containing the character "#", and use that character as the delimiter. All other lines can be ignored. Each line represents one data set, element, object, or whatever the correct term is. The final goal is: the user asks for "Apple" and should get "-9.99" and "BTW" (for example) as the answer. Here's an example of the raw text file:

 Sugar#    1051#      331#     BAD#     1.23#    -4.56#    -5.0#  WWF#
 N3T;
 Apple#     551#     3815#     F3W#     5.55#    -9.99#    -1.0#  BTW#
 BBC;
 Berry#      19#       22#      FF#     19.5#   -12.34#     5.0#  CYA#
 T1K;

It should represent 3 elements, each of them containing 8 pairs:

 name: 'Sugar'
 sec: 1051
 ter: 331
 wrd: 'BAD'
 a: 1.23
 b: -4.56
 c: -5.0
 spon: 'WWF'

 name: 'Apple'
 sec: 551
 ter: 3815
 wrd: 'F3W'
 a: 5.55
 b: -9.99
 c: -1.0
 spon: 'BTW'

 name: 'Berry'
 sec: 19
 ter: 22
 wrd: 'FF'
 a: 19.5
 b: -12.34
 c: 5.0
 spon: 'CYA'

At the beginning I tried using fs.readFileSync to read the whole text file as a string, but without success. Disappointed, I tried another approach with readline to read my text file line by line and do the filtering, because I had gained the impression from the net that this method is more memory-friendly and allows reading even very large files. Although I'm pretty sure 3000 lines are a joke figure :)

This was my code for the readline approach:

const fs = require('fs');
const readline = require('readline');

function readAndFilter (source, data) {
 var fields;
 var obj = new Object;
 var arr = new Array;

const readAndFilter = readline.createInterface({
 input: fs.createReadStream('test.in'),
 crlfDelay: Infinity
 });

 readAndFilter.on('line', (line) => {
     if ( line.match( /#/ ) ) {
      fields        = line.split( '#' ).slice();
      obj.name      = fields[0].trim();
      obj.sec       = fields[1].trim();
      obj.ter       = fields[2].trim();
      obj.wrd       = fields[3].trim();
      obj.a         = fields[4].trim();
      obj.b         = fields[5].trim();
      obj.c         = fields[6].trim();
      obj.spon      = fields[7].trim();

     console.log(obj);
     // let jsonView = JSON.stringify(obj);
     // arr.push(obj);
     }
   });

  readAndFilter.on('close', function() {
   return arr;
  });

}

readAndFilter();

This is what the code outputs (note that I customized my console log by adding a timestamp to each line of output):

 2019-06-16 14:40:10 { name: 'Sugar',
 sec: '1051',
 ter: '331',
 wrd: 'BAD',
 a: '1.23',
 b: '-4.56',
 c: '-5.0',
 spon: 'WWF' }
 2019-06-16 14:40:10 { name: 'Apple',
 sec: '551',
 ter: '3815',
 wrd: 'F3W',
 a: '5.55',
 b: '-9.99',
 c: '-1.0',
 spon: 'BTW' }
 2019-06-16 14:40:10 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' }

The data fields look fine and the file has been processed correctly so far, but the object "obj" will hold only the last data set (name: Berry) because it is overwritten on each line. I double-checked by cutting the line

console.log(obj);

from the readAndFilter.on('line', ...) block and inserting it into the 'close' block:

[...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();
              obj.name      = fields[0].trim();
              obj.sec       = fields[1].trim();
              obj.ter       = fields[2].trim();
              obj.wrd       = fields[3].trim();
              obj.a = fields[4].trim();
              obj.b = fields[5].trim();
              obj.c = fields[6].trim();
              obj.spon      = fields[7].trim();

            // let jsonView = JSON.stringify(obj);
            // arr.push(obj);
            }
      });

      readAndFilter.on('close', function() {
       console.log(obj);
      return arr;
      });
    [...]

The output produced is:

 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' }

That won't work as a lookup table; I need all the lines in an array so I can access them later in the lookup routine. So I tried to add each object to one array with the following code:

    [...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();
              obj.name      = fields[0].trim();
              obj.sec       = fields[1].trim();
              obj.ter       = fields[2].trim();
              obj.wrd       = fields[3].trim();
              obj.a = fields[4].trim();
              obj.b = fields[5].trim();
              obj.c = fields[6].trim();
              obj.spon      = fields[7].trim();

            // let jsonView = JSON.stringify(obj);
            arr.push(obj);
            }
      });

      readAndFilter.on('close', function() {
       console.log(arr);
      return arr;
      });
    [...]

Now I get an array with three objects, but again only the last data set (name: Berry) is shown:

 [ { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' },
 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' },
 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' } ]

I even tried concat and many other variations. What the hell am I doing wrong? Is my approach using the readline/line-by-line technique completely wrong; should I use fs.readFileSync instead? I tried that too; here's my approach with fs.readFileSync:

            function readAndFilter () {
                var fields;
                var obj = new Object;
                var arr = new Array;
                var data = fs.readFileSync('test.in', 'utf8').replace(/\r\n/g,'\n').split('\n').filter(/./.test, /\#/)
    /*
            if ( data.match( /#/ ) ) {
                fields      = data.split( '#' ).slice();
                obj.name    = fields[0].trim();
                obj.cqz     = fields[1].trim();
                obj.itu     = fields[2].trim();
                obj.cont    = fields[3].trim();
                obj.lng     = fields[4].trim();
                obj.lat     = fields[5].trim();
                obj.tz      = fields[6].trim();
                obj.pfx     = fields[7].trim();
            };
    */
    console.log(typeof data + "\n" + data);
    }

The variable data is typeof object as soon as I start to use .split('\n'), and thus I cannot use my if-clause on it. It fails because it would only work on a string. Maybe I am heading in completely the wrong direction and it's much simpler? The final goal is: I want to check a search string like "Apple" against this lookup table and retrieve the appropriate values (name, sec, ter, b, or any of them).
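A minimal sketch of why the if-clause stops working: split returns an array of lines, and typeof reports 'object' for arrays, so the '#' test has to run on each line rather than on data as a whole. The sample text is inlined here instead of being read from test.in, and parseLines is a hypothetical helper name that does not appear in the original code.

```javascript
// Sample input inlined for illustration; in the real program this string
// would come from fs.readFileSync('test.in', 'utf8').
const text = [
  ' Sugar#    1051#      331#     BAD#     1.23#    -4.56#    -5.0#  WWF#',
  ' N3T;',
  ' Apple#     551#     3815#     F3W#     5.55#    -9.99#    -1.0#  BTW#',
  ' BBC;',
].join('\n');

// Hypothetical helper: split into lines, keep only lines containing '#',
// and turn each kept line into one object.
function parseLines(input) {
  return input
    .split(/\r?\n/)                       // array of lines (LF or CRLF)
    .filter((line) => line.includes('#')) // the test runs per line
    .map((line) => {
      const f = line.split('#').map((s) => s.trim());
      return {
        name: f[0], sec: f[1], ter: f[2], wrd: f[3],
        a: f[4], b: f[5], c: f[6], spon: f[7],
      };
    });
}

const lines = text.split(/\r?\n/);
console.log(typeof lines, Array.isArray(lines)); // object true

const rows = parseLines(text);
console.log(rows.length, rows[1].name, rows[1].b); // 2 Apple -9.99
```

Array.isArray is the usual way to tell an array apart from other objects, since typeof cannot.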

I am really thankful for any helpful answer or hint. Please be patient with me; honestly, I really tried a lot! Thanks to all.

First off, welcome to SO, and compliments on your focused and elaborate question. Good job!

The reason your stream solution doesn't work as intended is that it's asynchronous, so you're trying to access the result before it's actually there. Check out our classic thread to learn more about this.
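To illustrate the asynchronous point: in the readline version, the return arr inside the 'close' handler returns from the handler, not from readAndFilter, so a caller never receives the array. One possible way around that, sketched below, is to wrap the stream in a Promise. The helper name parseLine and the fs.existsSync guard are assumptions added for illustration; they are not part of the original code.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: turn one '#'-delimited line into a fresh object.
function parseLine(line) {
  const f = line.split('#').map((s) => s.trim());
  return {
    name: f[0], sec: f[1], ter: f[2], wrd: f[3],
    a: f[4], b: f[5], c: f[6], spon: f[7],
  };
}

// Returns a Promise that resolves with the array once the stream has closed.
function readAndFilter(path) {
  return new Promise((resolve, reject) => {
    const arr = [];
    const input = fs.createReadStream(path);
    input.on('error', reject); // e.g. the file cannot be opened

    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    rl.on('line', (line) => {
      if (line.includes('#')) arr.push(parseLine(line));
    });
    rl.on('close', () => resolve(arr)); // all lines have been seen
  });
}

// Usage: the table only exists inside the callback (or after an await).
// The guard simply skips the demo when test.in is not present.
if (fs.existsSync('test.in')) {
  readAndFilter('test.in')
    .then((table) => console.log(table.length, 'rows loaded'))
    .catch((err) => console.error('could not read file:', err.message));
}
```

Code that needs the table has to live inside the then callback or after an await; anything placed right after the readAndFilter call runs before the file has been read.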

For the sake of simplicity, however, I'd suggest sticking with the readFileSync solution. Generally speaking, sync functions are not recommended in node.js for performance reasons, but given that the file is tiny (3000 lines), it shouldn't hurt much.

Once you've read the file, the parsing could be done like this:

 const fs = require('fs');

 let text = fs.readFileSync('test.in', 'utf8');
 let result = [];

 for (let line of text.trim().split('\n')) {
     // skip lines that do not contain the '#' delimiter
     if (!line.includes('#')) continue;
     // split on runs of '#' and/or whitespace
     let s = line.trim().split(/[#\s]+/g);
     result.push({
         name: s[0], sec: s[1], ter: s[2], wrd: s[3],
         a: s[4], b: s[5], c: s[6], spon: s[7],
     });
 }

 console.log(result)

Hello George, and many thanks so far. I only skimmed the link you posted but will dive into it later. Without wanting to jump ahead, I don't think my code failed because I am trying to access the result before it's there, as you said. In the readline variant I posted, you can see that I tried the push function to add the new objects to the array which I defined at the beginning.

I was curious after reading your code and tried it. I am not interested in ready-to-use code when I have no clue what it does; I really want to understand what's going on behind the scenes and how everything works. That's why I am still asking: my goal is to understand. So in my humble opinion you did much the same as what I had already tried; the only difference is that your array push command looks different from mine. I used

arr.push(obj);

which obviously failed. As explained before, I used the following code for the readline variant:

 [...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();
              obj.name      = fields[0].trim();
              obj.sec       = fields[1].trim();
              obj.ter       = fields[2].trim();
              obj.wrd       = fields[3].trim();
              obj.a = fields[4].trim();
              obj.b = fields[5].trim();
              obj.c = fields[6].trim();
              obj.spon      = fields[7].trim();

            arr.push(obj);
            }
      });

      readAndFilter.on('close', function() {
       console.log(arr);
      return arr;
      });
    [...]

So I just removed the mentioned line "arr.push(obj)" and replaced the push call to look equivalent to yours:

 [...]
      readAndFilter.on('line', (line) => {
            if ( line.match( /#/ ) ) {
              fields        = line.split( '#' ).slice();

            arr.push({
              name: fields[0].trim(),
              sec: fields[1].trim(),
              ter: fields[2].trim(),
              wrd: fields[3].trim(),
              a: fields[4].trim(),
              b: fields[5].trim(),
              c: fields[6].trim(),
              spon: fields[7].trim(),
            });
            }
      });

      readAndFilter.on('close', function() {
       console.log(arr);
      return arr;
      });
    [...]

This way it outputs the same result as your code. WORKS!!! As I am using readline and the file is thus processed line by line, it does not need a for-loop. Was it really this single line that made me sick and caused the trouble? On the other hand, I am asking myself how it's possible to "beautify" the code to make it simpler, so I don't need to write out each name, sec, ter, wrd, a, b, c, spon column. Imagine one has 150 properties per object; that would be a pain in the ass to write down. That's why I initially tried a simple arr.push(obj); sadly it didn't work as I expected.
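A likely explanation for the three identical "Berry" entries, sketched below under the assumption that obj is created once outside the 'line' handler (as in the posted code): push stores a reference to the object, not a copy, so the array ends up holding the same object three times and every entry shows whatever was written last. The second half shows one possible way to avoid typing every column: keep the column names in an array and build a new object per line. KEYS and lineToObject are hypothetical names chosen for this sketch.

```javascript
// Part 1: one shared object pushed three times.
const shared = {};
const broken = [];
for (const name of ['Sugar', 'Apple', 'Berry']) {
  shared.name = name;  // overwrites the same object each time
  broken.push(shared); // stores another reference to that one object
}
console.log(broken.map((o) => o.name)); // [ 'Berry', 'Berry', 'Berry' ]
console.log(broken[0] === broken[2]);   // true

// Part 2: column names listed once, a fresh object built per line.
const KEYS = ['name', 'sec', 'ter', 'wrd', 'a', 'b', 'c', 'spon'];

function lineToObject(line, keys = KEYS) {
  const fields = line.split('#').map((s) => s.trim());
  const obj = {}; // new object on every call, so nothing is shared
  keys.forEach((key, i) => {
    obj[key] = fields[i];
  });
  return obj;
}

const sampleLines = [
  ' Sugar#    1051#      331#     BAD#     1.23#    -4.56#    -5.0#  WWF#',
  ' Apple#     551#     3815#     F3W#     5.55#    -9.99#    -1.0#  BTW#',
  ' Berry#      19#       22#      FF#     19.5#   -12.34#     5.0#  CYA#',
];
const fixed = sampleLines.map((line) => lineToObject(line));
console.log(fixed.map((o) => o.name)); // [ 'Sugar', 'Apple', 'Berry' ]
```

With 150 columns only the KEYS array grows; the function stays the same. Inside the readline handler the equivalent call would be arr.push(lineToObject(line)).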

Any helpful explanation is appreciated. Thank you again! Now I need to find a way to read/search through the lookup table held in memory so I can display/output the key/value pairs I need.
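For the search step, one option (a sketch, not the only way) is to index the parsed rows by name in a Map once and then fetch whichever fields are wanted. The table below is inlined with the string values shown in the thread's output; byName and lookup are hypothetical names introduced for this example.

```javascript
// Rows as produced by the parsing step (all values are still strings).
const table = [
  { name: 'Sugar', sec: '1051', ter: '331', wrd: 'BAD', a: '1.23', b: '-4.56', c: '-5.0', spon: 'WWF' },
  { name: 'Apple', sec: '551', ter: '3815', wrd: 'F3W', a: '5.55', b: '-9.99', c: '-1.0', spon: 'BTW' },
  { name: 'Berry', sec: '19', ter: '22', wrd: 'FF', a: '19.5', b: '-12.34', c: '5.0', spon: 'CYA' },
];

// Build the index once: name -> row.
const byName = new Map(table.map((row) => [row.name, row]));

// Hypothetical helper: return the requested fields for a name,
// or undefined when the name is not in the table.
function lookup(name, ...keys) {
  const row = byName.get(name);
  if (!row) return undefined;
  return keys.map((key) => row[key]);
}

console.log(lookup('Apple', 'b', 'spon')); // [ '-9.99', 'BTW' ]
console.log(lookup('Mango', 'b'));         // undefined

// Numeric fields can be converted where a number is needed.
console.log(Number(byName.get('Apple').b) + 1); // -8.99
```

For a one-off search, table.find((row) => row.name === 'Apple') works as well; the Map mainly pays off when many lookups hit the same table.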
