
How do I print out a new file for every 50000 lines in the original file

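This is my current code, and it's pretty simple. It reads the file one line at a time and writes each line out to a new file named after the original with _part and a number appended. The number increases every 50000 lines. Once reading is finished, it runs each file name through a function that processes the file. But for some reason it only grabs the end of each line and prints that out for all 10000 lines (the number of lines in the original file). At first it worked. Then I changed a few things and it started doing this, and even after I undid those changes it kept doing it.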

const fs = require('fs');
const csv = require('csv-parser');
//File containing unprocessed addresses
let fileName = ("Refinitiv_Address_GBR_10000.csv");
//Country we are looking at address of
let country = "UK";

let fileRead;
let fileWrite;
let fileNum = 1;

DivideFile();

async function DivideFile() {
    let lineNum = 0;

    fileWrite = fs.createWriteStream(`./Originals/${fileName.split('.')[0]}_part${fileNum}.${fileName.split('.')[1]}`);

    fileRead = fs.createReadStream(`./Originals/${fileName}`)
        .pipe(csv())
        //Indicate start of reading
        .on('resume', () => {
            console.log("Processing file");
        })
        .on('data', (data) => {
            lineNum++;
            console.log(Object.values(data).toString());
            fs.appendFile(`./Originals/${fileName.split('.')[0]}_part${fileNum}.${fileName.split('.')[1]}`, Object.values(data).toString() + '\n', () => {
                //Nothing to go here at the moment
            });

            if (lineNum == 50000) {
                fileNum++;
                lineNum = 0;
            }
        })
        .on('end', () => {
            for (var file in fileNum) {
                RunFunc(`${fileName.split('.')[0]}_part${file}.${fileName.split('.')[1]}`);
            }
        });
}
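
The naming and counting scheme described in the question (the original base name plus _part and a number that advances every 50000 lines) can be pulled out into small pure functions and checked without parsing any CSV. This is only a sketch: partFileName, partNumber and allPartFileNames are hypothetical helper names, and path.parse is used in place of split('.') so that a base name containing extra dots still works.

```javascript
const path = require('path');

// Hypothetical helper: ("Refinitiv_Address_GBR_10000.csv", 2) -> "Refinitiv_Address_GBR_10000_part2.csv"
function partFileName(fileName, partNum) {
  const { name, ext } = path.parse(fileName); // ext includes the leading dot
  return `${name}_part${partNum}${ext}`;
}

// Hypothetical helper: 1-based part number for a 0-based row index
function partNumber(rowIndex, linesPerFile) {
  return Math.floor(rowIndex / linesPerFile) + 1;
}

// Hypothetical helper: every part name for a file with totalRows rows.
// A counting construct is needed here: `for (var file in fileNum)` never
// iterates, because a number has no enumerable properties.
function allPartFileNames(fileName, totalRows, linesPerFile) {
  const parts = Math.ceil(totalRows / linesPerFile);
  return Array.from({ length: parts }, (_, i) => partFileName(fileName, i + 1));
}

console.log(partNumber(49999, 50000)); // 1 (the 50000th row is still in part 1)
console.log(partNumber(50000, 50000)); // 2
console.log(allPartFileNames('Refinitiv_Address_GBR_10000.csv', 10000, 50000));
// [ 'Refinitiv_Address_GBR_10000_part1.csv' ]
```

With 10000 rows and 50000 lines per part, everything fits in a single part file, so only _part1 is ever produced for this input.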

Here is a sample of the original data. All of it comes from public sources; none of it is private information.

,,,,GBR,
"Todd Campus, West of Scotland Science Park,Maryhill Road",GLASGOW,UNITED KINGDOM-NA,G20 0UA,GBR,GBR
,,,,GBR,GBR
,,,,GBR,
"Horsfield Way,, Bredbury Industrial Park",STOCKPORT,CHESHIRE,SK6 2SU,GBR,GBR
"Brunel Way, The Nucleus",Dartford,KENT,DA1 5GA,GBR,
,,,,GBR,
,,,,GBR,
5 New Street Square,London,London,EC4A 3TW,GBR,
"Pentwyn Farm, Huntingdon",,,HR5 3PQ,GBR,GBR
124 Horseferry Road,LONDON,UNITED KINGDOM-NA,SW1P 2TX,GBR,GBR
,,,,GBR,
Unit 700 Fareham Reach Fareham Road,,,,GBR,GBR
"Eastwood House, Glebe Road",CHELMSFORD,ESSEX,CM1 1RS,GBR,GBR
Fineshade Abbey,CORBY,NORTHAMPTONSHIRE,NN17 3BA,GBR,GBR
,,,,,GBR
,,,,GBR,
3 Hempstead Close,,ESSEX,IG9 5JQ,GBR,GBR
,,,,GBR,
,,,,,GBR
,,,,GBR,
,,,,GBR,
25 Farringdon Street,LONDON,UNITED KINGDOM-NA,EC4A 4AB,GBR,GBR
100 Wigmore St,London,X0,,GBR,GBR
,,,,GBR,

Here are the first 25 lines printed to _part1:

GBR,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
,GBR
GBR,GBR
GBR,GBR
,GBR
GBR,GBR
GBR,GBR
GBR,
,GBR
,GBR
GBR,GBR
GBR,
,GBR
GBR,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR

I even trimmed the code down to just print out each line, and it still does this.
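
The pattern in the output above can be reproduced without any library. By default csv-parser uses the first line of the file as column names and turns each following row into an object keyed by those names. Here the first line is ,,,,GBR, so the names are four empty strings, GBR and one more empty string; repeated names overwrite each other, and every row object ends up with just two keys. The sketch below imitates that mapping with a hypothetical rowToObject function (a simplification, not csv-parser's real code) to show why only the 6th and 5th columns survive, in that order.

```javascript
// Hypothetical, simplified imitation of a header-based CSV parser:
// each cell is stored under the column name at the same position.
function rowToObject(headers, cells) {
  const row = {};
  cells.forEach((cell, i) => {
    row[headers[i]] = cell; // a repeated column name overwrites the earlier cell
  });
  return row;
}

// First line of the sample file, taken as the header row
const headers = ',,,,GBR,'.split(','); // [ '', '', '', '', 'GBR', '' ]

// Second line of the sample, already split into its six fields
const cells = [
  'Todd Campus, West of Scotland Science Park,Maryhill Road',
  'GLASGOW',
  'UNITED KINGDOM-NA',
  'G20 0UA',
  'GBR',
  'GBR',
];

const row = rowToObject(headers, cells);
console.log(Object.keys(row)); // [ '', 'GBR' ]
console.log(Object.values(row).toString()); // GBR,GBR

// A row with only the 6th column filled comes out as "GBR," (6th value first, then 5th)
console.log(Object.values(rowToObject(headers, ',,,,,GBR'.split(','))).toString()); // GBR,
```

csv-parser also documents a headers: false option, which keys each row by column index instead of by name. The posted rows are also not in input order; a plausible cause is that every fs.appendFile call is a separate asynchronous write, and Node does not guarantee that separate calls complete in the order they were made.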

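This isn't ideal, but it is basically your code, splitting the file into nice small chunks. You should use the csv-parse library instead of csv-parser (by default csv-parser treats the first line as column names, which is what collapses your rows to their last columns), and point fileWrite at a new write stream each time a part is full instead of calling fs.appendFile for every row. As others have mentioned, the Unix split command would also be a good option. I tested this with your sample data saved in file.csv, using 10 lines per part.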

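const fs = require('fs');
// csv-parse, not csv-parser. csv-parse v5+ exports { parse };
// with csv-parse v4 use `const parse = require('csv-parse');` instead.
const { parse } = require('csv-parse');
//File containing unprocessed addresses
let fileName = ("file.csv");
//Country we are looking at address of
let country = "UK";

// 10 lines per part for testing with the sample data; use 50000 for the real file
const LINES_PER_FILE = 10;

let fileRead;
let fileWrite;
let fileNum = 0;

// e.g. partName(1) -> "./file_part1.csv"
function partName(num) {
  return `./${fileName.split('.')[0]}_part${num}.${fileName.split('.')[1]}`;
}

// Re-quote fields that contain a comma, quote or line break, so that
// "Todd Campus, West of Scotland ..." stays a single column in the output
function toCsvLine(fields) {
  return fields
    .map((f) => (/[",\r\n]/.test(f) ? `"${f.replace(/"/g, '""')}"` : f))
    .join(',');
}

DivideFile();

async function DivideFile() {
  let lineNum = 0;

  fileRead = fs.createReadStream(`./${fileName}`)
  .pipe(parse())
  //Indicate start of reading
  .on('resume', () => {
    console.log("Processing file");
  })
  .on('data', (data) => {
    // Open the next part only when there is a row to put in it,
    // so no empty file is left over at the end
    if (lineNum === 0) {
      fileNum++;
      fileWrite = fs.createWriteStream(partName(fileNum));
    }
    // csv-parse emits each record as an array of field strings
    fileWrite.write(toCsvLine(data) + '\n');
    lineNum++;

    if (lineNum === LINES_PER_FILE) {
      fileWrite.end(); // close the full part before starting the next one
      lineNum = 0;
    }
  })
  .on('end', () => {
    if (lineNum > 0) fileWrite.end(); // close the last, partly filled part
    // `for (var file in fileNum)` never runs (a number has no enumerable keys),
    // so count through the part numbers instead
    for (let file = 1; file <= fileNum; file++) {
      // If RunFunc reads the part, wait for that stream's 'finish' event first
      // RunFunc(partName(file));
    }
  });
}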
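
The Unix split command splits purely by line count (for example, split -l 50000 Refinitiv_Address_GBR_10000.csv part_ writes part_aa, part_ab and so on). If you want the same behaviour inside Node with the _partN naming from the question and no CSV library at all, here is a sketch using only the core fs, os and path modules. splitFileByLines is a hypothetical name; the approach assumes that no quoted field contains a line break (true of the sample data), and it reads the whole file into memory, which is fine for 10000 lines but not for very large files.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes <name>_part1<ext>, <name>_part2<ext>, ... next to
// filePath, each holding at most linesPerFile lines, and returns their paths in order.
// Assumes no quoted CSV field contains a line break, so one line = one record.
// Line endings are normalised to \n in the output.
function splitFileByLines(filePath, linesPerFile) {
  const { dir, name, ext } = path.parse(filePath);
  const lines = fs.readFileSync(filePath, 'utf8').split(/\r?\n/);
  if (lines[lines.length - 1] === '') lines.pop(); // ignore the empty piece after a final newline

  const partPaths = [];
  for (let start = 0; start < lines.length; start += linesPerFile) {
    const partPath = path.join(dir, `${name}_part${partPaths.length + 1}${ext}`);
    fs.writeFileSync(partPath, lines.slice(start, start + linesPerFile).join('\n') + '\n');
    partPaths.push(partPath);
  }
  return partPaths;
}

// Demo on a temporary 25-line file, split 10 lines per part as in the answer's test
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'split-demo-'));
const demoFile = path.join(demoDir, 'file.csv');
const demoRows = Array.from({ length: 25 }, (_, i) => `row${i + 1},GBR`);
fs.writeFileSync(demoFile, demoRows.join('\n') + '\n');

const parts = splitFileByLines(demoFile, 10);
console.log(parts.map((p) => path.basename(p))); // [ 'file_part1.csv', 'file_part2.csv', 'file_part3.csv' ]
```

Because each line is copied through untouched, quoted values such as "Todd Campus, West of Scotland Science Park,Maryhill Road" keep their quotes, unlike re-joining parsed fields with Object.values(data).toString(). The returned paths can be handed straight to the processing function once the call returns.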

