
std::vector<string> odd behavior

I have a weird issue I cannot figure out. The code below takes a text file, reads it line by line into a vector<string>, and then compares each element to the string "--". When I run it, it never seems to make it to the comparison stage.

Furthermore, in convert_file(), the string m inside the for loop behaves strangely. With string m = "1"; m += "--"; (the "--" coming from the vector) m += "2"; the console shows 2-- instead of 1--2. The 2 replaces the first character, the 1, which makes it look like the vector is bugged.

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

vector<string> get_file(const char* file){
  int SIZE=256, ln=0;
  char str[SIZE];
  vector<string> strs;
  ifstream in(file, ios::in);
  if(!in){
    return strs;
  } else {
    while(in.getline(str,SIZE)){
      strs.push_back(string(str));
      ln++;
    }
  }
  in.close();
  return strs;
}

void convert_file(const char* file){
  vector<string> s = get_file(file);

  vector<string> d;
  int a, b;
  bool t = false;
  string comp = "--";

  for(int i=0; i<s.size(); i++){
    string m = "1";
    m+= string(s.at(i));
    m+= "2";
    cout << m << endl;
    if(s.at(i) == comp){
      cout << "s[i] == '--'" << endl;
    }
  }
}

int main(){
  convert_file("test.txt");
  return 0;
}

Now, when I run a test file to check a similar program:

#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main(){
  vector<string> s;
  s.push_back("--");
  s.push_back("a");

  for(int i=0; i<s.size(); i++){
    cout << "1" << s.at(i) << "2" << endl;
    if(s.at(i) == "--"){
      cout << i << "= --" << endl;
    }
  }
  return 0;
}

This prints 1--2, 0= --, and 1a2. It works: it prints properly and does the comparison. This leads me to think something is happening when I pull the line into a string.

Windows 7, cygwin64
g++ version 4.9.3
compile: D:\projects\test>g++ -o a -std=c++11 test.cpp

Based on the behavior and the discussion, the lines in the file are terminated with a "\r\n" sequence. The easiest way to deal with the leftover '\r' is to remove it after reading a line. For example:

for (std::string line; std::getline(file, line); ) {
    if (!line.empty() && line.back() == '\r') {
        line.resize(line.size() - 1u);
    }
    strs.push_back(line);
}
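The symptom in the question is consistent with a stray '\r' at the end of each line. One way to check this is to make the hidden character visible before printing. The following is only a small sketch; show_hidden is a hypothetical helper name, not something from the original code.

```cpp
#include <string>

// Hypothetical helper: returns a copy of s in which every carriage return
// is spelled out as the two visible characters '\' and 'r'.
std::string show_hidden(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '\r') {
            out += "\\r";   // make the carriage return visible
        } else {
            out += c;
        }
    }
    return out;
}
```

Printing show_hidden(m) inside the loop from the question would reveal a line such as 1--\r2. When the raw string is written to a terminal, the '\r' moves the cursor back to the start of the line, so the 2 overwrites the 1. The same extra character is why "--\r" does not compare equal to "--".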

If you insist on reading into char arrays, you can use file.gcount() to determine the number of characters read and so find the end of the string quickly. Note, however, that the count includes the newline character, i.e., you'd want to check str[file.gcount() - 2] and potentially set it to '\0' (if the count is greater than or equal to 2, of course).

As answered by Dietmar Kühl already, the problem is with the \r\n line endings.

However, you should not need to modify your source code. The default behaviour in C++ is supposed to be to open files in text mode. Text mode means that whenever a line ending is found, where "line ending" depends on the platform you're using, it gets translated so that your program just sees a single \n. You're supposed to explicitly request "binary mode" from your program to disable this line ending translation. This has been long-standing practice on Windows systems, is behaviour supported by the C++ standard, and is the expected behaviour with native Windows compilers. For compatibility with POSIX and with existing Unix programs that do not bother setting the file mode properly, however, Cygwin ignores this and defaults to opening files in binary mode unless a custom Cygwin-specific text mode is explicitly requested.

This is covered in the Cygwin FAQ. The first solutions provided there (using O_TEXT or "t", depending on how you open your file) are non-standard, so they break your code in other environments, and they are not as easy to use with C++ <fstream> file access.
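The difference between the two modes can be probed with a small experiment. This is only a sketch; the helper name first_line and whatever file path is passed to it are assumptions made for illustration. It writes "--\r\n" byte for byte, then reads the first line back using the open mode it is given.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes "--\r\n" to path without translation,
// then reads the first line back using the requested open mode.
std::string first_line(const char* path, std::ios_base::openmode mode) {
    {
        // Binary mode for writing, so the bytes land in the file unchanged.
        std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
        out << "--\r\n";
    }   // the stream is closed here, before the file is reopened for reading
    std::ifstream in(path, mode);
    std::string line;
    std::getline(in, line);
    return line;
}
```

With std::ios::in | std::ios::binary the result keeps the carriage return ("--\r"). With plain std::ios::in the result depends on the platform: a native Windows compiler would be expected to return "--", while Cygwin's default, as described above, leaves the '\r' in place.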

However, the next solutions provided there do work even for C++ programs:

You can also avoid to change the source code at all by linking an additional object file to your executable. Cygwin provides various object files in the /usr/lib directory which, when linked to an executable, changes the default open modes of any file opened within the executed process itself. The files are

binmode.o      - Open all files in binary mode.
textmode.o     - Open all files in text mode.
textreadmode.o - Open all files opened for reading in text mode.
automode.o     - Open all files opened for reading in text mode,
                 all files opened for writing in binary mode.

And indeed, after changing your compiler and linker invocation from g++ -oa -std=c++11 test.cpp to g++ -oa -std=c++11 test.cpp /usr/lib/textmode.o, your program works without changes to your source code. Linking with textmode.o basically means that your I/O will work the way it already should work by default.
