Parsing .csv files with CR LF EOL structure

I'm trying to parse a CSV file, and getline() is reading the entire file as one line. On the assumption that getline() wasn't getting the delimiter it expected, I tried \r, \n, \n\r, \r\n, and \0 as arguments, with no luck.

I took a look at the EOL characters and am seeing CR and then LF. Is getline() just ignoring this, or am I missing something? Also, what's the fix here?

The goal is a general-purpose CSV parsing function that stores the data as a 2D vector of strings. Although advice on that front is welcome, I'm only looking for a way to fix this issue.

vector<vector<string>> Parse::parseCSV(string file)
{
    // input fstream instance
    ifstream inFile;
    inFile.open(file);

    // check for error
    if (inFile.fail()) { cerr << "Cannot open file" << endl; exit(1); }

    vector<vector<string>> data;
    string line;

    while (getline(inFile, line))
    {
        stringstream inputLine(line);
        char delimeter = ',';
        string word;
        vector<string> brokenLine;
        while (getline(inputLine, word, delimeter)) {
            word.erase(remove(word.begin(), word.end(), ' '), word.end());      // remove all white spaces
            brokenLine.push_back(word);
        }
        data.push_back(brokenLine);
    }

    inFile.close();

    return data;

};

Here's the hexdump. I'm not sure what exactly this is showing.

0000000 55 4e 49 58 20 54 49 4d 45 2c 54 49 4d 45 2c 4c
0000010 41 54 2c 4c 4f 4e 47 2c 41 4c 54 2c 44 49 53 54
0000020 2c 48 52 2c 43 41 44 2c 54 45 4d 50 2c 50 4f 57
0000030 45 52 0d 31 34 32 34 31 30 35 38 30 38 2c 32 30
0000040 31 35 2d 30 32 2d 31 36 54 31 36 3a 35 36 3a 34
0000050 38 5a 2c 34 33 2e 38 39 36 34 2c 31 30 2e 32 32
0000060 34 34 34 2c 30 2e 38 37 2c 30 2c 30 2c 30 2c 4e
0000070 6f 20 44 61 74 61 2c 4e 6f 20 44 61 74 61 0d 31
0000080 34 32 34 31 30 35 38 38 35 2c 32 30 31 35 2d 30
0000090 32 2d 31 36 54 31 36 3a 35 38 3a 30 35 5a 2c 34
00000a0 33 2e 39 30 31 33 35 2c 31 30 2e 32 32 30 34 31
00000b0 2c 31 2e 30 32 2c 30 2e 36 33 39 2c 30 2c 30 2c
00000c0 4e 6f 20 44 61 74 61 2c 4e 6f 20 44 61 74 61 0d
00000d0 31 34 32 34 31 30 35 38 38 38 2c 32 30 31 35 2d
00000e0 30 32 2d 31 36 54 31 36 3a 35 38 3a 30 38 5a 2c
00000f0 34 33 2e 39 30 31 34 38 2c 31 30 2e 32 32 30 31
0000100

The first two lines of the file:

UNIX TIME,TIME,LAT,LONG,ALT,DIST,HR,CAD,TEMP,POWER
1424105808,2015-02-16T16:56:48Z,43.8964,10.22444,0.87,0,0,0,No Data,No Data

UPDATE: Looks like it was \r. I'm not sure why it didn't work earlier, but I learned a few things while exploring. Thanks for the help, guys.

A simple fix would be to write your own getline, for example one that skips any combination of \n and \r at the beginning of a line and breaks on either character.
That will work on any platform, but it won't preserve empty lines.
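
A minimal sketch of such a getline replacement is below. The names getlineAnyEol and splitLines are made up for illustration; they are not standard library functions, and this is only one way to write it under the behaviour described above (skip leading \r and \n, stop at the next one).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line, treating '\r' and '\n' alike.
// Leading '\r'/'\n' characters are skipped, so empty lines are NOT preserved.
std::istream& getlineAnyEol(std::istream& in, std::string& line)
{
    line.clear();
    char c;

    // Skip line-break characters left over from the previous line,
    // then keep the first real character.
    while (in.get(c)) {
        if (c != '\r' && c != '\n') {
            line.push_back(c);
            break;
        }
    }

    // Read up to (and consume) the next '\r' or '\n', or stop at end of input.
    while (in.get(c)) {
        if (c == '\r' || c == '\n') {
            break;
        }
        line.push_back(c);
    }

    // A last line with no trailing line break is still a line: drop the
    // failbit that get() set at end of input, but keep eofbit.
    if (!line.empty() && in.eof()) {
        in.clear(in.rdstate() & ~std::ios_base::failbit);
    }
    return in;
}

// Hypothetical convenience wrapper: splits a string into its non-empty lines.
std::vector<std::string> splitLines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getlineAnyEol(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

In the question's parseCSV, the outer loop would then read while (getlineAnyEol(inFile, line)) instead of calling getline directly. The same loop handles CR, LF, and CR LF files.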

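After looking at the hex dump, the delimiter is 0d (\r). Each row ends with a lone 0d byte and there is no 0a (\n) anywhere in the dump, which is why getline() with its default \n delimiter returns the whole file as one line.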
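
As a rough sketch of what that means for the parser, the outer getline can be given '\r' as its delimiter. The function name parseCrCsv is hypothetical, and unlike the question's code this version takes a stream, does not strip spaces from fields, and skips empty lines; those are simplifications for the example, not requirements.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: parse comma-separated rows terminated by '\r'.
// Also tolerates "\r\n" by dropping the '\n' left at the start of the
// following line. Empty lines are skipped.
std::vector<std::vector<std::string>> parseCrCsv(std::istream& in)
{
    std::vector<std::vector<std::string>> data;
    std::string line;

    // Third argument tells getline to split rows on CR instead of LF.
    while (std::getline(in, line, '\r')) {
        if (!line.empty() && line.front() == '\n') {
            line.erase(0, 1);  // leftover LF from a CR LF pair
        }
        if (line.empty()) {
            continue;
        }

        std::istringstream fields(line);
        std::string word;
        std::vector<std::string> row;
        while (std::getline(fields, word, ',')) {
            row.push_back(word);
        }
        data.push_back(row);
    }
    return data;
}
```

Passing an ifstream opened on the file to this function should yield one row per record for the data shown in the hex dump.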

Have you tried switching the order from \r\n to \n\r?
