
How to extract data from a .list file with node.js

I have a .list file containing information on movies. The file is formatted as follows:

New  Distribution  Votes  Rank  Title
      0000000125  1176527   9.2  The Shawshank Redemption (1994)
      0000000125  817264   9.2  The Godfather (1972)
      0000000124  538216   9.0  The Godfather: Part II (1974)
      0000000124  1142277   8.9  The Dark Knight (2008)
      0000000124  906356   8.9  Pulp Fiction (1994)

The code I have so far is as follows:

// modules I'll be using
var fs = require('fs');
var csv = require('csv');

csv().from.path('files/info.txt', { delimiter: '  '})
.to.array(function(data){
    console.log(data);
});

But the values are separated by a mix of single spaces, double spaces, and tabs, so there is no single delimiter to use. How can I extract this information into an array?

You can collapse each run of multiple spaces into a single delimiter and then parse the result as a string, like this:

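var fs = require('fs');
var csv = require('csv');

fs.readFile('files/info.txt', 'utf8', function (err, csvdata) {
  if (err) {
    return console.log(err);
  }
  // Collapse runs of two or more spaces/tabs into a single tab.
  // Newlines and the single spaces inside titles are left alone,
  // so rows and titles stay intact. Fields separated by only one
  // space would not be split by this.
  var movies = csvdata.replace(/[ \t]{2,}/g, '\t');

  csv().from.string(movies, { delimiter: '\t' })
    .to.array(function (data) {
      console.log(data);
    });
});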
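If the csv module is not required, the same idea can be sketched with plain string methods. This is only an illustration, assuming every data row has the layout shown in the question (distribution, votes, rank, then the title); the name splitRow and the inline sample are hypothetical and not part of the original answer. Splitting on /\s+/ handles single spaces, double spaces, and tabs alike, and everything after the third token is rejoined as the title.

```javascript
// Hypothetical helper: split one row on any run of whitespace.
// Assumes the first three tokens are distribution, votes and rank,
// and that the remaining tokens together form the title.
function splitRow(line) {
  var tokens = line.trim().split(/\s+/);
  // Skip the header, blank lines, and anything without a numeric votes field.
  if (tokens.length < 4 || !/^\d+$/.test(tokens[1])) {
    return null;
  }
  return [tokens[0], tokens[1], tokens[2], tokens.slice(3).join(' ')];
}

// Inline sample copied from the question, so the sketch runs without a file.
var sample = [
  'New  Distribution  Votes  Rank  Title',
  '      0000000125  1176527   9.2  The Shawshank Redemption (1994)',
  '      0000000124\t906356 8.9  Pulp Fiction (1994)',
  ''
].join('\n');

var rows = sample.split('\n').map(splitRow).filter(Boolean);
console.log(rows);
```

Note that rejoining the title with single spaces normalizes any unusual whitespace inside it, which is usually acceptable for this kind of data.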

It looks easy to parse with a regex:

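var fs = require('fs');

function parse(row) {
  // 6 spaces, distribution, 2 spaces, votes, 3 spaces, rank
  var match = row.match(/\s{6}(\d*)\s{2}(\d*)\s{3}(\d*\.\d)/);
  if (!match) {
    return null; // blank line or a row with different spacing
  }
  return {
    distribution: match[1],
    votes: match[2],
    rank: match[3]
  };
}

// file is the path to the .list file; pass an encoding to get a string
var movies = fs.readFileSync(file, 'utf8')
  .split('\n')
  .slice(1) // since we don't care about the first row
  .map(parse)
  .filter(Boolean); // drop rows that did not match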

I will leave you to build the rest of the regex. I use two tools to do so: rubular.com and the node.js REPL.

This \s{6}(\d*)\s{2}(\d*) means: match 6 whitespace characters (here, spaces), then capture an arbitrary number of digits, then match 2 spaces, then capture another arbitrary number of digits, etc.
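One possible way to finish the pattern is sketched below. It is an assumption on my part, not the answerer's final regex: it uses \s+ rather than fixed counts so that rows with slightly different spacing still match, and it adds a fourth group for the title. The names ROW and parseRow and the inline sample are hypothetical.

```javascript
// Hypothetical completed pattern: leading whitespace, a 10-character
// distribution field, votes, rank, then the title up to the end of the line.
var ROW = /^\s+(\S{10})\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s*$/;

function parseRow(row) {
  var match = ROW.exec(row);
  if (!match) {
    return null; // header, blank line, or unexpected layout
  }
  return {
    distribution: match[1],
    votes: parseInt(match[2], 10),
    rank: parseFloat(match[3]),
    title: match[4]
  };
}

// Inline sample copied from the question, so the sketch runs without a file.
var sample = [
  'New  Distribution  Votes  Rank  Title',
  '      0000000125  1176527   9.2  The Shawshank Redemption (1994)',
  '      0000000124  538216   9.0  The Godfather: Part II (1974)',
  ''
].join('\n');

var movies = sample.split('\n').map(parseRow).filter(Boolean);
console.log(movies);
```

The header is skipped because it does not start with whitespace, so no explicit slice(1) is needed in this variant.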
