
C++ - Remove or skip quote char in reading a file line by tokenizer

I have a csv file that has records like:

837478739*"EP" 1 "3FB2B464BD5003B55CA6065E8E040A2A"*"F"*21*15*"NH"*"N" 0 *-1*"-1"*0*0**-1*223944*-1*"23" 1 "-1" "-1" "78909" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "-1" "74425" "26" "-1"*"-1"*1*1*69*23.58*0*0*0*0*"MC"

The file has lots of records, so I need a fast method to break down each line and push_back each of its parts into a vector. The main reason I chose tokenizer is that I have heard a lot about its performance. I have a function:

void break(){
   //using namespace boost;
   string s = "This is a , test '' file";
   boost::tokenizer<> tok(s);
   vector<string> line;
   for(boost::tokenizer<>::iterator beg=tok.begin();beg!=tok.end();++beg){
       line.push_back(*beg);
   }
   cout << line[3] << "  and  " << line[5] << endl;
}

With that I can get each part of the sentence and ignore everything that is not a letter. Does the tokenizer have the ability to read the records that I have, parse them by the "*" delimiter, and remove the quotes from the strings? There won't be any special characters between the quotes; I just need to remove the quote marks. I tried to read the tokenizer documentation, but I could not find an answer there.

You can use regex_replace.
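A minimal sketch of what this suggestion might look like, assuming the standard-library std::regex_replace from <regex> (the comment does not say whether it means the std or the Boost version). The function names strip_quotes and strip_quotes_erase are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Remove every double-quote character from a line using regex_replace.
// Assumes, as the question states, that quotes never carry meaning
// (no escaped quotes and no '*' inside quoted fields).
std::string strip_quotes(const std::string& line) {
    static const std::regex quote("\"");  // compiled once, reused per call
    return std::regex_replace(line, quote, "");
}

// Same result without a regex: the erase-remove idiom on a copy.
std::string strip_quotes_erase(std::string line) {
    line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
    return line;
}
```

Calling strip_quotes on a line before tokenizing leaves only the "*" delimiters to deal with. Since the file is large, the erase-remove variant is probably worth measuring against the regex version, because regular expressions tend to be slower for a single-character removal like this.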

"break" is a keyword. You shouldn't use it as a function name.

You need to assign another TokenizerFunc to your Tokenizer to parse the string differently; the default splits on whitespace and punctuation.

http://www.boost.org/doc/libs/1_37_0/libs/tokenizer/tokenizerfunction.htm
