
C++ stringstream read in fixed length string to char array

Given data format as "int,int,...,int,string,int", is it possible to use stringstream (only) to properly decode the fields?

[Code]

int main(int c, char** v)
{
    std::string line = "0,1,2,3,4,5,CT_O,6";
    char delimiter[7];
    int id, ag, lid, cid, fid, did, j = -12345;
    char dcontact[4]; // <- The size of <string-field> is known and fixed
    std::stringstream ssline(line);
    ssline >> id >> delimiter[0]
    >> ag >> delimiter[1]
    >> lid >> delimiter[2]
    >> cid >> delimiter[3]
    >> fid >> delimiter[4]
    >> did >> delimiter[5]  // <- should I do something here?
    >> dcontact >> delimiter[6]
    >> j;
    std::cout << id << ":" << ag << ":" << lid << ":" << cid << ":" << fid << ":" << did << ":";
    std::cout << dcontact << "\n";
}

[Output] 0:1:2:3:4:5:CT_6,0:-45689. The CT_6,0 part (bold in the original) shows that the stringstream did not stop after reading 4 chars into dcontact. dcontact actually holds more than 4 chars, leaving j with garbage data.

Yes. There is no specific overload of operator>>(istream&, char[N]) for a given N, but there is one for char*, so the compiler sees that as the best match. (C++20 replaced it with an overload taking a reference to the array, but that one also stops only at white space.) The overload for char* reads up to the next whitespace character, so it doesn't stop at the comma.
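To see the whitespace rule in isolation, here is a minimal sketch that uses std::string instead of a char array, so nothing can overflow. The helper name read_word is made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extract one "word" from text with operator>>.
std::string read_word(const std::string& text)
{
    std::istringstream in(text);
    std::string word;
    in >> word;   // stops at white space (or end of input), not at ','
    return word;
}
```

With the input "CT_O,6" the whole remainder ends up in the word, which is what happened to dcontact in the question.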

You could wrap your dcontact in a struct and write a specific overload that reads into your struct. Otherwise you could use read, although it breaks your lovely chain of >> operators.

ssline.read( dcontact, 4 );

will work at that point.
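One way the struct idea could look is sketched below. The names FixedField and parse_contact are assumptions made for this example, not something from the question, and the shortened "int,field,int" format is only there to keep the sketch small.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper type: exactly N characters, no null terminator.
template <std::size_t N>
struct FixedField
{
    char data[N];
    std::string str() const { return std::string(data, N); }
};

template <std::size_t N>
std::istream& operator>>(std::istream& in, FixedField<N>& field)
{
    // read() is unformatted: it does not skip white space, and it sets
    // failbit and eofbit if fewer than N characters are available.
    return in.read(field.data, N);
}

// Parses "<int>,<4 chars>,<int>" and returns the 4-character field,
// or an empty string if any extraction failed.
std::string parse_contact(const std::string& line)
{
    std::istringstream in(line);
    int before = 0, after = 0;
    char comma1 = 0, comma2 = 0;
    FixedField<4> contact;
    if (in >> before >> comma1 >> contact >> comma2 >> after)
        return contact.str();
    return std::string();
}
```

Because the wrapper has its own operator>>, the chain of >> operators stays intact, and the array never needs room for a terminator.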

To read up to a delimiter, incidentally, you can use getline (get will also work, but the getline free function writing to a std::string means you don't have to guess the length).

(Note that other people have suggested get rather than read, but this will fail in your case, as you do not have an extra byte at the end of your dcontact array for a null terminator. If you want dcontact to be null-terminated, make it 5 characters and use get, and the null will be appended for you.)
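The two delimiter-based options can be sketched as follows. The function names next_field and next_fixed_field are invented for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads one comma-terminated field into a std::string.
// The free function std::getline consumes the ',' but does not store it.
std::string next_field(std::istream& in)
{
    std::string field;
    std::getline(in, field, ',');
    return field;
}

// Reads at most 4 characters into a 5-element buffer with get(), which
// appends the '\0' and leaves the ',' delimiter in the stream.
std::string next_fixed_field(std::istream& in)
{
    char buffer[5];
    in.get(buffer, sizeof buffer, ',');
    return std::string(buffer);
}
```

Note the difference in what happens to the delimiter: after getline the next extraction starts at the following field, while after get the comma still has to be consumed.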

Slightly more robust (handles the ',' delimiter correctly):

#include <iostream>
#include <sstream>
#include <string>

// Manipulator: extracts one character and fails the stream unless it is D.
template <char D>
std::istream& delim(std::istream& in)
{
  char c;
  if (in >> c && c != D) in.setstate(std::ios_base::failbit);
  return in;
}

int main()
{
  std::string line = "0,1,2,3,4,5,CT_O,6";
  int id, ag, lid, cid, fid, did, j = -12345;
  char dcontact[5]; // <- The size of <string-field> is known and fixed (4 chars + '\0')
  std::stringstream ssline(line);
  (ssline >> id >> delim<','>
          >> ag >> delim<','>
          >> lid >> delim<','>
          >> cid >> delim<','>
          >> fid >> delim<','>
          >> did >> delim<','> >> std::ws
          ).get(dcontact, 5, ',') >> delim<','>
          >> j;
  std::cout << id << ":" << ag << ":" << lid << ":"
            << cid << ":" << fid << ":" << did << ":"
            << dcontact << "\n";
}

Try this:

#include <iostream>
#include <sstream>
#include <string>

int main(int c, char** v) {
    std::string line = "0,1,2,3,4,5,CT_O,6";
    char delimiter[7];
    int id, ag, lid, cid, fid, did, j = -12345;
    char dcontact[5]; // <- The size of <string-field> is known and fixed (4 chars + '\0')

    std::stringstream ssline(line);

    ssline >> id >> delimiter[0]
           >> ag >> delimiter[1]
           >> lid >> delimiter[2]
           >> cid >> delimiter[3]
           >> fid >> delimiter[4]
           >> did >> delimiter[5];

    // get() stores at most 4 characters and appends the terminating '\0'.
    ssline.get(dcontact, 5);

    ssline >> delimiter[6]
           >> j;
    std::cout << id << ":" << ag << ":" << lid << ":" << cid << ":" << fid << ":" << did << ":";
    std::cout << dcontact << "\n" << j << "\n";
}

The problem is that the >> operator for a string (std::string or a C-style string) actually implements the semantics for a word, with a particular definition of word. The decision is arbitrary (I would have made it a line), but since a string can represent many different things, they had to choose something.

The solution, in general, is never to use >> on a string. Define the class you want (here, probably something like Symbol), and define an operator>> for it which respects its semantics. Your code will be a lot clearer for it, and you can add various invariant checks as appropriate. If you know that the field is always exactly four characters, you can do something simple like:

#include <cctype>
#include <istream>
#include <string>

class DContactSymbol
{
    char myName[ 4 ];
public:
    //  ...
    friend std::istream&
    operator>>( std::istream& source, DContactSymbol& dest );
    //  ...
};

std::istream&
operator>>( std::istream& source, DContactSymbol& dest )
{
    //  The sentry skips leading white space if skipws is set.
    std::istream::sentry guard( source );
    if ( guard ) {
        std::string tmp;
        std::streambuf* sb = source.rdbuf();
        int ch = sb->sgetc();
        while ( source && ch != std::istream::traits_type::eof()
                && (std::isalnum( ch ) || ch == '_') ) {
            tmp += static_cast< char >( ch );
            if ( tmp.size() > sizeof( dest.myName ) ) {
                source.setstate( std::ios_base::failbit );
            }
            //  Consume the current character and look at the next one.
            ch = sb->snextc();
        }
        if ( ch == std::istream::traits_type::eof() ) {
            source.setstate( std::ios_base::eofbit );
        }
        if ( tmp.size() != sizeof( dest.myName ) ) {
            source.setstate( std::ios_base::failbit );
        }
        if ( source ) {
            tmp.copy( dest.myName, sizeof( dest.myName ) );
        }
    }
    return source;
}

(Note that unlike some of the other suggestions, for example using std::istream::read, this one maintains all of the usual conventions, like skipping leading white space dependent on the skipws flag.)

Of course, if you can't guarantee 100% that the symbol will always be 4 characters, you should use std::string for it, and modify the >> operator accordingly.
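A variable-length version might look roughly like this. It is only a sketch: the class name Symbol, the name() accessor and the rule that a symbol consists of alphanumerics and '_' are assumptions made for the example.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical variable-length symbol: one or more of [A-Za-z0-9_].
class Symbol
{
    std::string myName;
public:
    const std::string& name() const { return myName; }
    friend std::istream& operator>>(std::istream& source, Symbol& dest);
};

std::istream& operator>>(std::istream& source, Symbol& dest)
{
    std::istream::sentry guard(source);   // honours the skipws flag
    if (guard) {
        const int eof = std::istream::traits_type::eof();
        std::string tmp;
        std::streambuf* sb = source.rdbuf();
        int ch = sb->sgetc();
        while (ch != eof && (std::isalnum(ch) || ch == '_')) {
            tmp += static_cast<char>(ch);
            ch = sb->snextc();            // consume, then peek at the next
        }
        if (ch == eof)
            source.setstate(std::ios_base::eofbit);
        if (tmp.empty())
            source.setstate(std::ios_base::failbit);  // nothing to extract
        else
            dest.myName = tmp;
    }
    return source;
}
```

The extraction stops at the first character that cannot belong to a symbol, so the following ',' is left in the stream for the next >>.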

And BTW, you seem to want to read four characters into dcontact, although it's only large enough for three (since >> will insert a terminating '\0'). If you read any more than three into it, you have undefined behavior.

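Since the length of the string is known, you can use std::setw. The width includes the terminating '\0', so reading four characters needs char dcontact[5] and std::setw(5), as in

ssline >> std::setw(5) >> dcontact >> delimiter[6];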
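As a minimal sketch of the width-limited extraction (the helper name read_four is made up here): with a width of n, operator>> stores at most n - 1 characters plus the terminating '\0', so a four-character field needs a five-element buffer and a width of 5.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Extracts a field of at most 4 characters using a width-limited >>.
std::string read_four(std::istream& in)
{
    char buffer[5] = {};             // 4 characters + '\0'
    in >> std::setw(5) >> buffer;    // stores at most 5 - 1 = 4 characters
    return std::string(buffer);
}
```

The extraction still stops early at white space, and the width is reset to 0 afterwards, so it has to be set again before each such field.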
