Using C++, I would like to split each row of a string (a CSV file in this case) into fields, where some fields may contain delimiters that are escaped (by enclosing the field in double quotes, "") and should be treated as literals. I have looked at the various questions already posed but have not found a direct answer to my problem.
Example of CSV file data:
Header1,Header2,Header3,Header4,Header5
Hello,",,,","world","!,,!,",","
Desired string vector after splitting:
["Hello"],[",,,"],["world"],["!,,!,"],[","]
Note: The CSV is only valid if the number of data columns equals the number of header columns.
I would prefer a solution that does not use Boost or any other third-party library. Efficiency is not a priority.
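For the no-Boost, no-third-party requirement, a plain character-by-character state machine is often enough. The following is a minimal sketch, assuming the common CSV convention (as in RFC 4180) that a doubled quote "" inside a quoted field stands for one literal quote, and that a row contains no embedded line breaks; the question specifies neither. The names `split_csv_row` and `has_header_column_count` are hypothetical. The sketch is deliberately lenient: an unterminated quote simply runs to the end of the row instead of being reported as an error.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV row into fields. A field enclosed in double quotes may
// contain commas, and "" inside quotes stands for one literal quote.
// Assumption: the row contains no embedded newlines.
std::vector<std::string> split_csv_row(const std::string& row)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;

    for (std::size_t i = 0; i < row.size(); ++i) {
        const char c = row[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < row.size() && row[i + 1] == '"') {
                    field += '"';       // "" -> one literal quote
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                field += c;             // commas inside quotes are literal
            }
        } else if (c == '"') {
            in_quotes = true;           // opening quote
        } else if (c == ',') {
            fields.push_back(field);    // delimiter ends the current field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);            // the last field has no trailing comma
    return fields;
}

// Hypothetical helper for the validity rule in the question: a data row is
// accepted only if it has as many fields as the header row.
bool has_header_column_count(const std::string& row, const std::string& header)
{
    return split_csv_row(row).size() == split_csv_row(header).size();
}
```

Applied to the sample row, `split_csv_row` returns exactly the five-element vector the question asks for, and `has_header_column_count` accepts it against the five-column header.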
EDIT: The code below, which implements the regex from @ClasG, at least satisfies the scenario above. I am drafting edge-case tests but would love to hear when and where it breaks down.
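#include <iostream>
#include <regex>
#include <string>
int main()
{
    std::string s = "Hello,\",,,\",\"world\",\"!,,!,\",\",\"";
    // Group 1: a quoted field or a run of non-comma characters,
    // followed by a comma or the end of the string.
    std::string rx_string = "(\"[^\"]*\"|[^,]*)(?:,|$)";
    std::regex e(rx_string);
    std::sregex_iterator rit(s.begin(), s.end(), e);
    std::sregex_iterator rend;
    while (rit != rend)
    {
        std::cout << rit->str(1) << std::endl; // field only, without the trailing comma
        ++rit;
    }
}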
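One way to probe where this pattern breaks down is to collect capture group 1 of every match and inspect the result. The sketch below does that; `regex_fields` is a hypothetical helper name. Three weaknesses show up: quoted fields keep their surrounding quotes; because `$` can match an empty string at the end of the input, `std::sregex_iterator` yields one extra empty match after the last field; and a doubled quote followed by a comma inside a quoted field (for example `"a "",b"`) defeats the first alternative, so the field is split at the inner comma.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects capture group 1 of every match of the pattern from the question,
// i.e. the field text without the comma that follows it.
std::vector<std::string> regex_fields(const std::string& s)
{
    static const std::regex e("(\"[^\"]*\"|[^,]*)(?:,|$)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), e), last; it != last; ++it)
        out.push_back(it->str(1));
    return out;
}
```

For the sample row this returns six elements rather than five (the quoted fields still carry their quotes, and the last element is empty). For the input `"a "",b",c` it returns `"a ""`, `b"`, `c` and an empty string.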
This is not a complete C++ solution, but a regex that might nudge you in the right direction.
A regex like
("[^"]*"|[^,]*)(?:,|$)
will match the individual columns. (Note that it doesn't handle escaped quotes, i.e. a doubled "" inside a quoted field.)
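A possible extension of this regex, sketched under the assumption that escaped quotes are written as "" inside a quoted field (the usual CSV convention), replaces `[^"]*` with `(?:[^"]|"")*`. Instead of `std::sregex_iterator`, the sketch calls `std::regex_search` with `std::regex_constants::match_continuous`, so each field must start exactly where the previous one ended and no trailing empty match is produced. The name `split_csv_regex` and the choice to return an empty vector for a malformed row are assumptions for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Group 1: a quoted field in which "" stands for an escaped quote, or an
// unquoted run of characters that are neither commas nor quotes.
// Group 2: the delimiter after the field (a comma, or empty at end of input).
// Returns an empty vector if the row is malformed (hypothetical convention).
std::vector<std::string> split_csv_regex(const std::string& row)
{
    static const std::regex field_rx("(\"(?:[^\"]|\"\")*\"|[^,\"]*)(,|$)");
    static const std::regex doubled_quote("\"\"");
    std::vector<std::string> fields;
    std::smatch m;
    auto it = row.cbegin();
    while (true) {
        // match_continuous: the field must begin exactly at `it`.
        if (!std::regex_search(it, row.cend(), m, field_rx,
                               std::regex_constants::match_continuous))
            return {};
        std::string f = m.str(1);
        if (!f.empty() && f.front() == '"') {
            f = f.substr(1, f.size() - 2);                  // strip enclosing quotes
            f = std::regex_replace(f, doubled_quote, "\""); // "" -> "
        }
        fields.push_back(f);
        if (m.length(2) == 0)   // matched end of input: that was the last field
            return fields;
        it = m[0].second;       // continue right after the comma
    }
}
```

Called on the sample row, this gives the same five fields as the desired vector; on `"a "",b",c` it gives `a ",b` and `c`.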
This is not an answer, but it's too long to put as a comment IMHO.
CSV is one of those seemingly-simple-but-actually-quite-fiendish storage formats.
The droid you're looking for is Boost.Spirit.
The Spirit Master's name (on Stack Overflow) is @sehe.
See his answer here: https://stackoverflow.com/a/18366335/2015579
Please credit sehe, not me.