C++ String splitting but escaping all delimiters in quotations

Using C++, I would like to split the rows of a string (a CSV file in this case) where some of the fields may contain delimiters that are escaped (by enclosing the field in double quotes) and should be treated as literals. I have looked at the various questions already posted but have not found a direct answer to my problem.

Example of CSV file data:

Header1,Header2,Header3,Header4,Header5
Hello,",,,","world","!,,!,",","

Desired string vector after splitting:

["Hello"],[",,,"],["world"],["!,,!,"],[","]

Note: The CSV is only valid if the number of data columns equals the number of header columns.

Would prefer a non-boost / third-party solution. Efficiency is not a priority.

EDIT: Code below implementing regex from @ClasG at least satisfies the scenario above. I am drafting fringe test cases but would love to hear when / where it breaks down...

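#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Same row as the CSV example above (the stray trailing quote is removed).
    std::string s = "Hello,\",,,\",\"world\",\"!,,!,\",\",\"";
    std::string rx_string = "(\"[^\"]*\"|[^,]*)(?:,|$)";
    std::regex e(rx_string);
    std::regex_iterator<std::string::iterator> rit(s.begin(), s.end(), e);
    std::regex_iterator<std::string::iterator> rend;

    while (rit != rend)
    {
        // Group 1 is the field (quotes kept); the full match also includes the trailing comma.
        // Note: the iterator can report one extra empty match at the very end of the input.
        std::cout << (*rit)[1].str() << std::endl;
        ++rit;
    }
}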

This is not a complete (C++) solution, but a regex that might nudge you in the right direction.

A regex like

("[^"]*"|[^,]*)(?:,|$)

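will match the individual columns: each match is either a quoted field or a run of non-comma characters, followed by a comma or the end of the input. (Note that it doesn't handle escaped quotes inside a quoted field.)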

See it here at regex101.

This is not an answer, but it's too long to put as a comment IMHO.

CSV is one of those seemingly-simple-but-actually-quite-fiendish storage formats.

The droid you're looking for is Boost.Spirit.

The Spirit Master's name (on Stack Overflow) is @sehe.

See his answer here: https://stackoverflow.com/a/18366335/2015579

Please credit sehe, not me.
