
CSV parser works on Windows, not Linux

I'm parsing a CSV file that looks like this:

E1,E2,E7,E8,,,
E2,E1,E3,,,,
E3,E2,E8,,,
E4,E5,E8,E11,,,

I store the first entry of each line in a string, and the rest go in a vector of strings:

while (getline(file_input, line)) {
    stringstream tokenizer; 
    tokenizer << line;
    getline(tokenizer, roomID, ',');
    vector<string> aVector;
    while (getline(tokenizer, adjRoomID, ',')) {
        if (!adjRoomID.empty()) {
            aVector.push_back(adjRoomID);
        }
    }
    Room aRoom(roomID, aVector);
    rooms.addToTail(aRoom);
}

On Windows this works fine, but on Linux the first entry of each vector mysteriously loses its first character. For example, in the first iteration of the while loop:

roomID would be E1 and aVector would be 2 E7 E8

Then in the second iteration, roomID would be E2 and aVector would be 1 E3

Notice the missing E's in the first entry of aVector.

When I put in some debugging code, it appears that the entry is initially stored correctly in the vector, but then something overwrites it. Kudos to whoever figures this one out. Seems bizarre to me.

EDIT: Thank you, Erik. I finally understand. On Windows, the lines my program sees just end with \n (the text-mode stream translates \r\n to \n). On Unix/Linux no such translation happens, so the lines end in \r\n. Thus, when getline reads a line, it reads everything into the string, including the \r. I was not accounting for this \r and it was tripping me up. The problem wasn't that the E was missing. It was that I had an extra entry in the vector containing a single \r character, and my other classes couldn't handle that entry.
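One way to account for the \r is to strip it from each line before tokenizing. The following is only a minimal sketch, assuming C++11 or later; stripTrailingCR and parseRoomLine are hypothetical helper names, not part of the original code, and the Room class from the question is left out.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: drop one trailing '\r' left over from a "\r\n" line ending.
void stripTrailingCR(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Hypothetical helper: split one CSV line into (roomID, adjacent room IDs),
// skipping empty fields the same way the loop in the question does.
std::pair<std::string, std::vector<std::string>> parseRoomLine(std::string line) {
    stripTrailingCR(line);
    std::stringstream tokenizer(line);

    std::string roomID;
    std::getline(tokenizer, roomID, ',');

    std::vector<std::string> adjacent;
    std::string field;
    while (std::getline(tokenizer, field, ',')) {
        if (!field.empty()) {
            adjacent.push_back(field);
        }
    }
    return {roomID, adjacent};
}
```

Inside the original while loop, each line read by getline(file_input, line) could be passed to parseRoomLine, and the resulting pair used to construct the Room. Because the \r is removed before splitting, no extra single-character entry ends up in the vector, whether the file uses \n or \r\n line endings.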

Oops: I misread your question and thought it was about the code not working on Windows. I'm leaving the answer here in case anyone who needs it stumbles upon this, but I don't think it will help you (the asker) in this case.

If you're on MSVC6, you could be encountering this bug with the getline function. There's a fix in the link.

For posterity, here's the info from the link:

SYMPTOM: "The Standard C++ Library template getline function reads an extra character after encountering the delimiter. Please refer to the sample program in the More Information section for details."

Modify the getline member function, which can be found in the system header file string, as follows:

else if (_Tr::eq((_E)_C, _D))
            {_Chg = true;
          //  _I.rdbuf()->snextc(); /* Remove this line and add the line below.*/ 
              _I.rdbuf()->sbumpc();
            break; }

Note: Because the resolution involves modifying a system header file, extreme care should be taken to ensure that nothing else is changed in the header file. Microsoft is not responsible for any problems resulting from unwanted changes to the system header file.

I suspect that the \r in the Windows \r\n line ending could mess up the code doing your printing.

If you change to this if statement, does the problem disappear?

if (!adjRoomID.empty() && (adjRoomID[0] != '\r'))

EDIT: Fixed typo

Try some cout debugging. Print out the values as you read them in:

if (!adjRoomID.empty()) {
    cout << '"' << adjRoomID << '"' << endl;
    aVector.push_back(adjRoomID);
}

That will tell you if your strings are being read correctly from the get-go, and will also probably tell you if you're reading in extra weird characters from the file.
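One caveat with plain cout debugging: on most terminals a printed \r moves the cursor back to the start of the line, so later output can overwrite what was already shown, which may make characters appear to go missing. A small sketch of a helper that spells out such characters instead (visible is a hypothetical name, and C++11 is assumed):

```cpp
#include <string>

// Hypothetical helper: return a quoted copy of s in which \r, \n and \t
// are written out as escape sequences so they show up in debug output.
std::string visible(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '\r': out += "\\r"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:   out += c;     break;
        }
    }
    out += '"';
    return out;
}
```

Writing cout << visible(adjRoomID) << endl; in the debugging snippet would then show a stray carriage return as "\r" rather than letting the terminal act on it.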

