
JavaScript - reading a whitespace-delimited text file into an array and using it as a lookup table

First off: I'm an absolute beginner in JavaScript and started learning two weeks ago, many hours a day. I am running a Node.js server on GNU/Linux and have tried a lot of variations to achieve my goal. Unfortunately I'm stuck and don't know how to continue.

I have a text file with whitespace and line feeds, containing roughly 2000 lines. I want to read this text file into my JavaScript program so I can use it later as a lookup table. I am not sure whether I need to JSON-stringify it for later use; maybe it's simpler to leave it as an object/array that my lookup function can use directly. I want to pull out of this text file only the lines containing the character "#", which also serves as the field delimiter; all other lines can be ignored. Each such line represents one data set (element, object, or whatever it's called correctly). The final goal: the user asks for "Apple" and should get "-9.99" and "BTW" (for example) as the answer (see the sketch after the field list below). Here's an example of the raw text file:

 Sugar#    1051#      331#     BAD#     1.23#    -4.56#    -5.0#  WWF#
 N3T;
 Apple#     551#     3815#     F3W#     5.55#    -9.99#    -1.0#  BTW#
 BBC;
 Berry#      19#       22#      FF#     19.5#   -12.34#     5.0#  CYA#
 T1K;

It should represent 3 elements, each of them containing 8 key/value pairs:

 name: 'Sugar'
 sec: 1051
 ter: 331
 wrd: 'BAD'
 a: 1.23
 b: -4.56
 c: -5.0
 spon: 'WWF'

 name: 'Apple'
 sec: 551
 ter: 3815
 wrd: 'F3W'
 a: 5.55
 b: -9.99
 c: -1.0
 spon: 'BTW'

 name: 'Berry'
 sec: 19
 ter: 22
 wrd: 'FF'
 a: 19.5
 b: -12.34
 c: 5.0
 spon: 'CYA'
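In other words, I think what I need in memory is an array of objects, roughly this shape (hand-written here just to illustrate; whether the numbers stay strings or become numbers is an open question for me):

 const lookupTable = [
   { name: 'Sugar', sec: 1051, ter: 331,  wrd: 'BAD', a: 1.23, b: -4.56,  c: -5.0, spon: 'WWF' },
   { name: 'Apple', sec: 551,  ter: 3815, wrd: 'F3W', a: 5.55, b: -9.99,  c: -1.0, spon: 'BTW' },
   { name: 'Berry', sec: 19,   ter: 22,   wrd: 'FF',  a: 19.5, b: -12.34, c: 5.0,  spon: 'CYA' },
 ];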

At the beginning I tried using fs.readFileSync to read the whole text file as a string, but without success. Disappointed, I tried another approach with readline to read my text file line by line and do the filtering, because I gained the impression on the net that this method is more memory-friendly and allows reading even very large files. Although I'm pretty sure 3000 lines are a joke figure :)

This was my code when approaching with readline:

const fs = require('fs');
const readline = require('readline');

function readAndFilter (source, data) {
  var fields;
  var obj = new Object;
  var arr = new Array;

  const readAndFilter = readline.createInterface({
    input: fs.createReadStream('test.in'),
    crlfDelay: Infinity
  });

  readAndFilter.on('line', (line) => {
    if ( line.match( /#/ ) ) {
      fields   = line.split( '#' ).slice();
      obj.name = fields[0].trim();
      obj.sec  = fields[1].trim();
      obj.ter  = fields[2].trim();
      obj.wrd  = fields[3].trim();
      obj.a    = fields[4].trim();
      obj.b    = fields[5].trim();
      obj.c    = fields[6].trim();
      obj.spon = fields[7].trim();

      console.log(obj);
      // let jsonView = JSON.stringify(obj);
      // arr.push(obj);
    }
  });

  readAndFilter.on('close', function() {
    return arr;
  });
}

readAndFilter();

This is what the code outputs (note that I customized my console log by adding a timestamp to each line of output):

 2019-06-16 14:40:10 { name: 'Sugar',
 sec: '1051',
 ter: '331',
 wrd: 'BAD',
 a: '1.23',
 b: '-4.56',
 c: '-5.0',
 spon: 'WWF' }
 2019-06-16 14:40:10 { name: 'Apple',
 sec: '551',
 ter: '3815',
 wrd: 'F3W',
 a: '5.55',
 b: '-9.99',
 c: '-1.0',
 spon: 'BTW' }
 2019-06-16 14:40:10 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' }

The data fields look fine and the file was processed correctly so far, but the object "obj" holds only the last data set (name: Berry), because it is overwritten on every line. I double-checked by cutting the line

console.log(obj);

from the readAndFilter.on('line', ...) block and inserting it into the 'close' block:

 [...]
 readAndFilter.on('line', (line) => {
   if ( line.match( /#/ ) ) {
     fields   = line.split( '#' ).slice();
     obj.name = fields[0].trim();
     obj.sec  = fields[1].trim();
     obj.ter  = fields[2].trim();
     obj.wrd  = fields[3].trim();
     obj.a    = fields[4].trim();
     obj.b    = fields[5].trim();
     obj.c    = fields[6].trim();
     obj.spon = fields[7].trim();

     // let jsonView = JSON.stringify(obj);
     // arr.push(obj);
   }
 });

 readAndFilter.on('close', function() {
   console.log(obj);
   return arr;
 });
 [...]

the output produced is:

 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' }

That won't work as a lookup table; I need all the lines in an array so I can access them later in the lookup routine. So I tried to push each object into one array with the following code:

 [...]
 readAndFilter.on('line', (line) => {
   if ( line.match( /#/ ) ) {
     fields   = line.split( '#' ).slice();
     obj.name = fields[0].trim();
     obj.sec  = fields[1].trim();
     obj.ter  = fields[2].trim();
     obj.wrd  = fields[3].trim();
     obj.a    = fields[4].trim();
     obj.b    = fields[5].trim();
     obj.c    = fields[6].trim();
     obj.spon = fields[7].trim();

     // let jsonView = JSON.stringify(obj);
     arr.push(obj);
   }
 });

 readAndFilter.on('close', function() {
   console.log(arr);
   return arr;
 });
 [...]

Now I get an array with three objects, but all of them show only the last data set (name: Berry) again:

 [ { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' },
 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' },
 { name: 'Berry',
 sec: '19',
 ter: '22',
 wrd: 'FF',
 a: '19.5',
 b: '-12.34',
 c: '5.0',
 spon: 'CYA' } ]

I even tried concat and many other variations. What the hell am I doing wrong? Is my approach using the readline/line-by-line technique completely wrong; should I use fs.readFileSync instead? I tried that too; here's my approach with fs.readFileSync:

 function readAndFilter () {
   var fields;
   var obj = new Object;
   var arr = new Array;
   var data = fs.readFileSync('test.in', 'utf8').replace(/\r\n/g,'\n').split('\n').filter(/./.test, /\#/)
 /*
   if ( data.match( /#/ ) ) {
     fields   = data.split( '#' ).slice();
     obj.name = fields[0].trim();
     obj.cqz  = fields[1].trim();
     obj.itu  = fields[2].trim();
     obj.cont = fields[3].trim();
     obj.lng  = fields[4].trim();
     obj.lat  = fields[5].trim();
     obj.tz   = fields[6].trim();
     obj.pfx  = fields[7].trim();
   };
 */
   console.log(typeof data + "\n" + data);
 }

The variable data is typeof object as soon as I use .split('\n'), and thus I cannot make use of my following if-clause; it fails because match only works on a string. Maybe I am heading in completely the wrong direction and it's much simpler? The final goal is: I want to check a search string like "Apple" against this lookup table and retrieve the appropriate values (name, sec, ter, b, or any of them).
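I suspect I would have to test each line individually instead, which is what I was aiming at with my broken filter call, something like this (just a sketch, untested):

 // split('\n') returns an Array of lines; test each element
 // instead of match()ing on the whole thing:
 var lines = fs.readFileSync('test.in', 'utf8')
   .replace(/\r\n/g, '\n')
   .split('\n')
   .filter(function (line) { return line.includes('#'); });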

I am really thankful for any helpful answer or hint. Please be patient with me, and honestly: I really tried a lot! Thanks to all.

First off, welcome to SO, and compliments on your focused and elaborate question. Good job!

The reason why your stream solution doesn't work as intended is that it's asynchronous, so you're trying to access the result before it's actually there. Check out our classic thread to learn more about this.
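To illustrate: returning arr from the 'close' handler returns it to nobody. The stream version would have to hand its result back asynchronously, for example via a Promise (just a sketch along the lines of your code):

 function readAndFilter(path) {
   return new Promise((resolve) => {
     const arr = [];
     const rl = readline.createInterface({
       input: fs.createReadStream(path),
       crlfDelay: Infinity,
     });
     rl.on('line', (line) => {
       if (line.includes('#')) {
         arr.push(line.split('#').map(f => f.trim())); // parse fields as needed
       }
     });
     rl.on('close', () => resolve(arr)); // the result only exists here
   });
 }

 // every consumer then has to wait for it:
 // readAndFilter('test.in').then(arr => console.log(arr));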

For the sake of simplicity, however, I'd suggest sticking with the readFileSync solution. Generally speaking, sync functions are not recommended in Node.js for performance reasons, but given that the file is tiny (3000 lines), it shouldn't hurt much.

Once you've read the file, the parsing could be done like this:

 let text = fs.readFileSync('test.in', 'utf8');
 let result = [];

 for (let line of text.trim().split('\n')) {
   if (!line.includes('#')) continue;
   let s = line.trim().split(/[#\s]+/g);
   result.push({
     name: s[0],
     sec:  s[1],
     ter:  s[2],
     wrd:  s[3],
     a:    s[4],
     b:    s[5],
     c:    s[6],
     spon: s[7],
   });
 }

 console.log(result);
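And since your final goal is a lookup by name, one option is to index the result array from above into a Map once and query it in constant time (a sketch):

 let byName = new Map(result.map(row => [row.name, row]));

 let apple = byName.get('Apple');
 if (apple) console.log(apple.b, apple.spon); // -9.99 BTW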

Hello George and many thanks so far. I only skimmed the link you posted but will dive into it later. Without meaning to jump ahead: I don't think my code failed because I was trying to access the result before it's there, as you said. In the readline variant I posted, you can see that I tried the push function to add the new objects into the array that I defined at the beginning.

I was curious after reading your code and tried it. I am not interested in ready-to-use code that I have no clue what it does; I really like to understand what's going on behind the scenes and how everything works. That's why I am still asking; my goal is to understand. So in my humble opinion you did quite the same stuff that I already tried before; the only difference is that your array push command looks different from mine. I used

arr.push(obj);

which obviously failed. As explained before, I used the following code for the readline variant:

 [...]
 readAndFilter.on('line', (line) => {
   if ( line.match( /#/ ) ) {
     fields   = line.split( '#' ).slice();
     obj.name = fields[0].trim();
     obj.sec  = fields[1].trim();
     obj.ter  = fields[2].trim();
     obj.wrd  = fields[3].trim();
     obj.a    = fields[4].trim();
     obj.b    = fields[5].trim();
     obj.c    = fields[6].trim();
     obj.spon = fields[7].trim();

     arr.push(obj);
   }
 });

 readAndFilter.on('close', function() {
   console.log(arr);
   return arr;
 });
 [...]

so I just removed the mentioned line "arr.push(obj)" and replaced the push call with one equivalent to yours:

 [...]
 readAndFilter.on('line', (line) => {
   if ( line.match( /#/ ) ) {
     fields = line.split( '#' ).slice();

     arr.push({
       name: fields[0].trim(),
       sec:  fields[1].trim(),
       ter:  fields[2].trim(),
       wrd:  fields[3].trim(),
       a:    fields[4].trim(),
       b:    fields[5].trim(),
       c:    fields[6].trim(),
       spon: fields[7].trim(),
     });
   }
 });

 readAndFilter.on('close', function() {
   console.log(arr);
   return arr;
 });
 [...]

This way it outputs the same result as your code. WORKS!!! As I am using readline, and thus each line is processed as it arrives, it does not need a for-loop. Was it really this single line that made me sick and caused the trouble? On the other side I am asking myself how to "beautify" the code and make it simpler, so I don't need to write out each name, sec, ter, wrd, a, b, c, spon column. Imagine an object with 150 properties; it would be a pain in the ass to write them all down. That's why I initially tried a simple arr.push(obj); sadly it didn't work as I expected.
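I wonder whether pushing a copy, like arr.push({ ...obj }), would already have fixed it, since every iteration apparently pushed the very same object. And for many properties, maybe the key names could live in one list, something like this (just a sketch of what I mean; KEYS is made up):

 // every arr.push(obj) pushed the SAME object, which was then
 // overwritten on the next line; pushing a shallow copy avoids that:
 arr.push({ ...obj }); // or Object.assign({}, obj)

 // for many fields: zip a list of key names with the split values
 const KEYS = ['name', 'sec', 'ter', 'wrd', 'a', 'b', 'c', 'spon'];

 readAndFilter.on('line', (line) => {
   if (!line.includes('#')) return;
   const fields = line.split('#').map(f => f.trim());
   arr.push(Object.fromEntries(KEYS.map((k, i) => [k, fields[i]])));
 });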

Any helpful explanation is appreciated. Thank you again! Now I need to find a way to read/search through the lookup table held in memory so I can output the appropriate key/value pairs.
