
CSV-Parser doesn't seem to parse line break data properly

Yes, I know I should have better data, and if nothing works I will go about fixing my data, but I was wondering if there was any way I could get csv-parser to parse

"United States of
America",140640,17987,2398,286,Local transmission,0

into

{
Country: United States of America
... blah blah
... blah blah
... blah blah
... blah blah
}

fs.createReadStream("./csv/03312020.csv")
    .pipe(
        csv([
            "Country",
            "Total",
            "TotalNew"
        ])
    )
    .on("data", row => {
        console.log(row.Country);
        let result = contains(row.Country);
        if (result !== undefined) {
            row.Date = today;
            row.id = result + "-" + today;
            if (db.dates.get(row.id) === undefined) db.dates.create(row);
        }
    })
    .on("end", () => {
        console.log("CSV file successfully processed for", today);
    });

I thought csv-parser would see the quotation mark and treat everything up to the closing quote as one "column", but apparently it doesn't. Is there a better way to parse this data other than rewriting my CSV file itself?

What you can do is split the file into lines, then join the lines that have an odd number of " characters.

My script also handles the case where the \n character appears multiple times in a single row of data.
This is based on the fact that only the first and the last line of a multi-line row will have an odd number of " characters. Any line in between either sits entirely inside a quoted field or closes one quoted field and opens another, so its count is even.
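To make the parity rule concrete, here is a minimal sketch that classifies the physical lines of the sample row. The helper name `hasOddQuoteCount` and the third, ordinary row are made up for illustration; neither comes from csv-parser or from the original data.

```javascript
// Hypothetical helper for illustration: returns true when a line contains an
// odd number of double-quote characters, meaning it opens or closes a quoted
// field that spans a line break.
function hasOddQuoteCount(line) {
    const quoteMatches = line.match(/"/g);
    return quoteMatches !== null && quoteMatches.length % 2 === 1;
}

const sampleLines = [
    '"United States of',                                    // opens a quoted field
    'America",140640,17987,2398,286,Local transmission,0',  // closes it
    'Example,1,2,3,4,Local transmission,0'                  // made-up ordinary row
];

const parity = sampleLines.map(hasOddQuoteCount);
console.log(parity);
```

Escaped quotes inside a field (written as two consecutive " characters) add two to the count, so they do not change the parity.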

You can reformat your file using my script and then feed the result into your CSV parser.

const example1 = `"United States of
America",140640,17987,2398,286,Local transmission,0`;
console.log(reformatCsv(example1));

const example2 = `"United States of
America",140640,17987,2398,286,"Local
transmission",0`;
console.log(reformatCsv(example2));

// @param file: string, the entire contents of the CSV file
function reformatCsv(file) {
    const lines = file.split('\n');
    const reformattedRows = [];
    const parts = [];

    for (const line of lines) {
        const quoteMatches = line.match(/"/g);
        const isEvenNumberOfQuotes = !quoteMatches || quoteMatches.length % 2 === 0;
        const noPartialRowsYet = !parts.length;

        if (noPartialRowsYet) {
            if (isEvenNumberOfQuotes) {
                // normal row
                reformattedRows.push(line);
            } else {
                // this is a partial row
                parts.push(line);
            }
        } else {
            // continuation of a partial row
            parts.push(line);
            if (!isEvenNumberOfQuotes) {
                // we got all of the parts, so join them
                // I replace \n with a space character, but you don't have to
                reformattedRows.push(parts.join(' '));
                // reset the buffer so the next multi-line row starts clean
                parts.length = 0;
            }
        }
    }

    return reformattedRows.join('\n');
}
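For the "reformat, then feed it into your parser" step, something along these lines might work. It is only a sketch: the wrapper name `reformatCsvFile` and the output path in the usage note are assumptions, and the reformatting function is passed in as a parameter so the snippet stands on its own.

```javascript
const fs = require("fs");

// Hypothetical wrapper: reads a CSV file as text, applies a reformatting
// function (for example the reformatCsv function from the script above),
// and writes the result to a second file.
function reformatCsvFile(inputPath, outputPath, reformat) {
    const original = fs.readFileSync(inputPath, "utf8");
    fs.writeFileSync(outputPath, reformat(original), "utf8");
    return outputPath;
}
```

Calling `reformatCsvFile("./csv/03312020.csv", "./csv/03312020.fixed.csv", reformatCsv)` would write a cleaned copy (the `.fixed.csv` name is a placeholder), which can then be opened with `fs.createReadStream` and piped into `csv(...)` exactly as in the question. Reading the whole file into memory is fine for small daily reports; a very large file would need a streaming variant.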
