
Splitting a string on two delimiters in C++

I have a file, cities.txt, containing:

Hayward - San Lorenzo
San Lorenzo - Oakland
Dublin - San Jose
San Mateo - Hayward
San Francisco - Daly City
San Mateo - Oakland
San Francisco - Oakland
Freemont - Hayward
San Lorenzo - Dublin
San Jose - San Mateo
Daly City - San Raphael

I read the contents of the file with:

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>



int main( ) {
    std::ifstream infile( "cities.txt" ) ;
    if ( infile ) {
        std::string fileData( ( std::istreambuf_iterator<char> ( infile ) ) ,
        std::istreambuf_iterator<char> ( ) ) ;
        infile.close( );
        std::cout << fileData <<"\n\n";
        return 0 ;
   }
   else {
      std::cout << "Where is cities.txt?\n" ;
      return 1 ;
   }
}

and save the contents in the fileData string. I need to break that string into a list of strings that only contain the names of the cities. Something like this:

list = {"Hayward","San Lorenzo", "San Lorenzo", "Oakland"......}

I was going to turn the string into a char* and use strtok, but that seems like a lot of work for something that can probably be done using standard string functions. Is there a way to do it that is both fast and terse?

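I would probably use std::getline, specifying '-' as the separator between elements (here i is the input stream and cities is a container of strings, such as a std::vector<std::string>):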

std::string city;
while (std::getline(i, city, '-'))
    cities.push_back(city);

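One minor detail: this will leave white space intact, so if leading and/or trailing white space is a problem, you'll have to trim it separately. Note also that a newline is not a separator here, so the last city of one line and the first city of the next end up in the same element unless you first read the input line by line.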
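A minimal sketch of how the getline approach could handle both delimiters: read the input line by line first, then split each line on '-' and trim the pieces. The names trim and read_cities are made up for this illustration, and the sketch assumes no city name contains a hyphen.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: remove leading and trailing spaces, tabs and
// line-ending characters.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";    // nothing but white space
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: read "A - B" lines from `in` and collect the
// trimmed city names in the order they appear.
std::vector<std::string> read_cities(std::istream& in) {
    std::vector<std::string> cities;
    std::string line;
    while (std::getline(in, line)) {              // outer split on '\n'
        std::istringstream fields(line);
        std::string city;
        while (std::getline(fields, city, '-')) { // inner split on '-'
            city = trim(city);
            if (!city.empty()) cities.push_back(city);
        }
    }
    return cities;
}
```

Passing the std::ifstream from the question straight to read_cities avoids loading the whole file into one string first.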

You can do this in a couple of steps.

  1. Split the content of the file into a vector of strings, so that each element of the vector contains a single row of the file

  2. Split each row into two elements (the two cities in the row)

  3. Trim the content

The split function can be implemented like this:

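// Split str at every occurrence of seq; the separators themselves are dropped.
vector<string> split (string str, string seq) {
    vector<string> ret {};
    if (seq.empty ()) {            // find ("") always returns 0 and would loop forever
        ret.push_back (str);
        return ret;
    }
    size_t pos {};

    while ((pos = str.find (seq)) != string::npos) {
        ret.push_back (str.substr (0, pos));    // the piece before the separator
        str = str.substr (pos+seq.size ());     // keep only what follows it
    }
    ret.push_back (str);                        // the remainder after the last separator

    return ret;
}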

Trimming functions can be implemented like this:

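// Note: std::ptr_fun and std::not1 were removed in C++17, so lambdas are used.
// These functions need <algorithm> and <cctype>.

// Remove leading white space.
string ltrim (string s) {
    s.erase (s.begin (), find_if (s.begin (), s.end (), [] (unsigned char ch) { return !isspace (ch); }));
    return s;
}

// Remove trailing white space.
string rtrim (string s) {
    s.erase (find_if (s.rbegin (), s.rend (), [] (unsigned char ch) { return !isspace (ch); }).base (), s.end ());
    return s;
}

// Remove white space from both ends.
string trim (string s) {
    return ltrim (rtrim (s));
}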

So, basically you have all you need. Let's prepare a result function.

vector<string> result (vector<string>&& content) {
    vector<string> ret {};
    for (const auto& c : content) { 
        auto vec = split (c, "-"); // (2)
        for (const auto& v : vec) { 
            ret.push_back (trim (v));
        }

    }
    return ret;
}

void show (const vector<string>& vec) { 
    for (const auto& v : vec) { 
        cout << "|" << v << "|" << endl;
    }
}

Usage looks like this, assuming that the content of your file is in the content object.

auto vec = result (split (content, "\n")); // (1)
show (vec);

Now, some explanation is needed. Look at (1): we take the whole content of the file (I left out reading it from the file) and create a vector of strings. In this case it is a vector of rows, because the separator sequence is "\n". So we pass the vector of rows to the result function. Next, at (2), each row has to be split into two strings (the cities), and this time the separator sequence is "-". The call at (2) produces a vector of strings containing the city names. All that is left is to add these names to the vector ret, which will be returned, trimming each one first to remove white space from the left and right sides.

The result is:

|Hayward|
|San Lorenzo|
|San Lorenzo|
|Oakland|
|Dublin|
|San Jose|
|San Mateo|
|Hayward|
|San Francisco|
|Daly City|
|San Mateo|
|Oakland|
|San Francisco|
|Oakland|
|Freemont|
|Hayward|
|San Lorenzo|
|Dublin|
|San Jose|
|San Mateo|
|Daly City|
|San Raphael|

You can work with string::find, string::erase and string::substr.

Use a while loop with something like found = input.find("-"); while(found != string::npos){... }

Inside the loop, take the substring holding the city name, then erase it from the whole string with .erase(position, length).
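A rough sketch of that find/substr idea. It deviates in two ways that are my own choices: it uses find_first_of so that '-' and '\n' are both treated as separators, and it advances a start index instead of calling erase, which avoids repeatedly shifting the string. The function name split_cities is made up for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `input` at every '-' or '\n' and return the
// trimmed, non-empty pieces in order.
std::vector<std::string> split_cities(const std::string& input) {
    std::vector<std::string> cities;
    const std::string delims = "-\n";
    const std::string ws = " \t\r";
    std::string::size_type start = 0;
    while (start <= input.size()) {
        auto found = input.find_first_of(delims, start);
        if (found == std::string::npos) found = input.size();  // last piece
        // Trim the piece [start, found) by skipping white space at both ends.
        const auto first = input.find_first_not_of(ws, start);
        if (first != std::string::npos && first < found) {
            const auto last = input.find_last_not_of(ws, found - 1);
            cities.push_back(input.substr(first, last - first + 1));
        }
        start = found + 1;  // continue after the separator
    }
    return cities;
}
```

Calling split_cities(fileData) on the string from the question should give the flat list of names. As with any split on '-', a city name that itself contains a hyphen would be cut in two.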

You may use boost regex_split. I have modified your code to demonstrate it; it is pasted below:

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>
#include <vector>



int main( ) {
    std::ifstream infile( "cities.txt" ) ;
    if ( infile ) {
        std::string fileData( ( std::istreambuf_iterator<char> ( infile ) ) ,
        std::istreambuf_iterator<char> ( ) ) ;
        infile.close( );
        std::cout << fileData <<"\n\n";
        std::vector<std::string> out;

        // Delimiter regular expression
        boost::regex delims("\\s+-\\s+|\n|\r");

        boost::regex_split(std::back_inserter(out), fileData, delims);
        for (auto &city : out) {
            std::cout << city << std::endl;
        }
   }
   else {
      std::cout << "Where is cities.txt?\n" ;
      return 1 ;
   }
}
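If Boost is not available, roughly the same thing can be done with the standard <regex> header (C++11 and later). The sketch below swaps in std::sregex_token_iterator for regex_split; the function name split_regex and the slightly different pattern are my own choices, not part of the code above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `data` on " - " style separators and on line
// breaks, returning the non-empty pieces in order.
std::vector<std::string> split_regex(const std::string& data) {
    // A dash with optional blanks around it, or a run of line-break characters.
    static const std::regex delims(R"([ \t]*-[ \t]*|[\r\n]+)");
    std::vector<std::string> out;
    // Submatch index -1 selects the text between the matches.
    std::sregex_token_iterator it(data.begin(), data.end(), delims, -1), end;
    for (; it != end; ++it) {
        if (it->length() > 0) out.push_back(it->str());  // skip empty pieces
    }
    return out;
}
```

Blanks at the very start or end of a line are not removed by this pattern, so an extra trim step may still be needed for untidy input.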

