
Split a string of paths that may contain spaces

I am writing a program that should receive three parameters from the user: file_upload "local_path" "remote_path"

code example:

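#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split str into fields at every occurrence of delimiter.
std::vector<std::string> split(const std::string& str, char delimiter) {
   std::vector<std::string> v;
   std::stringstream src(str);
   std::string buf;

   while (std::getline(src, buf, delimiter)) {
       v.push_back(buf);
   }
   return v;
}

void function() {
   std::string input;
   std::getline(std::cin, input);
   // user input like this: file_upload /home/Space Dir/file c:\dir\file
   std::vector<std::string> v_input = split(input, ' ');

   // the code will do something like this
   // (check the size first so v_input[1] and v_input[2] exist)
   if (v_input.size() >= 3 && v_input[0] == "file_upload") {
     FILE *file = std::fopen(v_input[1].c_str(), "rb");
     send_upload_dir(v_input[2].c_str());  // my own function, not shown here
     // bla bla bla
   }
}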

My question is: the second and third parameters are directories, so their names can contain spaces. How can I make the split function keep the spaces inside the second and third parameters?

I thought about putting the directories in quotes and writing a function to recognize them, but that does not work 100% because the program has other functions that take only two parameters, not three. Can anyone help?

EDIT: /home/user/Space Dir/file.out <-- a path whose directory name contains a space.

If this happens, the vector has more elements than expected and the directory path is broken into pieces, which must not happen.

The vector will contain something like this:

vector[1] = /home/user/Space

vector[2] = Dir/file.out

and what I want is this:

vector[1] = /home/user/Space Dir/file.out

I had a similar problem a few days ago and solved it like this:

First I create a copy of the string. Then I replace the quoted substrings in the copy with padding characters so they no longer contain white space. Finally, I split the original string at the white-space positions found in the copy.

Here is my full solution:

You may also want to remove the double quotes, trim the original string, and so on:

#include <sstream>
#include<iostream>
#include<vector>
#include<string>
using namespace std;


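// Return a string made of len copies of pad.
string padString(size_t len, char pad)
{
    ostringstream ostr;
    ostr.fill(pad);
    ostr.width(len);
    ostr<<"";
    return ostr.str();
}

// Split s on white space, keeping each "..." run inside a single argument.
// Quoted runs are masked in a copy (res) so their spaces are not treated as
// separators; the argument text itself is taken from the original s.
void splitArgs(const string& s, vector<string>& result)
{
    size_t pos1=0,pos2=0,len;
    string res = s;

    // Mask every complete "..." pair (quotes included) with 'X' padding.
    pos1 = res.find_first_of("\"");
    while(pos1 != string::npos && pos2 != string::npos){
        pos2 = res.find_first_of("\"",pos1+1);
        if(pos2 != string::npos ){
            len = pos2-pos1+1;
            res.replace(pos1,len,padString(len,'X'));
            pos1 = res.find_first_of("\"");
        }
    }

    // Split at the white space still visible in the masked copy. Looping on
    // pos1 alone also works when an unmatched quote left pos2 == npos above.
    pos1=res.find_first_not_of(" \t\r\n",0);
    while(pos1 != string::npos){
        pos2 = res.find_first_of(" \t\r\n",pos1+1);
        if(pos2 == string::npos ){
            pos2 = res.length();
        }
        len = pos2-pos1;
        result.push_back(s.substr(pos1,len));
        pos1 = res.find_first_not_of(" \t\r\n",pos2+1);
    }
}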

int main()
{
    string s = "234 \"5678 91\" 8989"; 
    vector<string> args;
    splitArgs(s,args);
    cout<<"original string:"<<s<<endl;
    for(size_t i=0;i<args.size();i++)
       cout<<"arg "<<i<<": "<<args[i]<<endl;
    return 0;
}

and this is the output:

original string:234 "5678 91" 8989
arg 0: 234
arg 1: "5678 91"
arg 2: 8989
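As the output shows, splitArgs keeps the double quotes around arg 1. A minimal sketch of the "remove the double quotes" step mentioned above, assuming only arguments wrapped entirely in one pair of quotes should be unwrapped; stripQuotes and stripAllQuotes are hypothetical helper names, not part of the answer:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: if arg starts and ends with a double quote,
// return the text between them; otherwise return arg unchanged.
std::string stripQuotes(const std::string& arg) {
    if (arg.size() >= 2 && arg.front() == '"' && arg.back() == '"') {
        return arg.substr(1, arg.size() - 2);
    }
    return arg;
}

// Hypothetical helper: apply stripQuotes to every argument from splitArgs().
void stripAllQuotes(std::vector<std::string>& args) {
    for (std::string& a : args) {
        a = stripQuotes(a);
    }
}
```

Calling stripAllQuotes(args) after splitArgs(s, args) would turn arg 1 into 5678 91. A token with quotes in the middle, such as Space" "Dir, is left untouched by this sketch.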

Since you need to accept three values from a single string input, this is a problem of encoding.

Encoding is sometimes done by imposing fixed-width requirements on some or all fields, but that's clearly not appropriate here, since we need to support variable-width file system paths, and the first value (which appears to be some kind of mode specifier) may be variable-width as well. So that's out.

This leaves 4 possible solutions for variable-width encoding:


1: Unambiguous delimiter.

If you can select a separator character that is guaranteed never to show up in the delimited values, then you can split on that. For example, if NUL is guaranteed never to be part of the mode value or the path values, then we can do this:

std::vector<std::string> v_input = split(input,'\0');

Or maybe the pipe character:

std::vector<std::string> v_input = split(input,'|');

Hence the input would have to be given like this (for the pipe character):

file_upload|/home/user/Space Dir/file.out|/home/user/Other Dir/blah
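The delimiter only works if it is really guaranteed to be absent from the values, so whoever builds the line has to enforce that. A minimal sketch of the sending side, assuming C++17 for std::optional; joinFields is a hypothetical name:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sender-side helper: join fields with delimiter, but return
// std::nullopt if any field contains the delimiter, because the receiver
// could not split such a line unambiguously.
std::optional<std::string> joinFields(const std::vector<std::string>& fields,
                                      char delimiter) {
    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (fields[i].find(delimiter) != std::string::npos) {
            return std::nullopt;  // delimiter would be ambiguous
        }
        if (i > 0) out.push_back(delimiter);
        out += fields[i];
    }
    return out;
}
```

With '|' this produces exactly the input line shown above; a path containing '|' is rejected instead of being silently split in the wrong place.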

2: Escaping.

You can write the code to iterate through the input line and properly split it on unescaped instances of the separator character. Escaped instances will not be considered separators. You can parameterize the escape character. For example:

std::vector<std::string> escapedSplit(std::string str, char delimiter, char escaper ) {
    std::vector<std::string> res;
    std::string cur;
    for (size_t i = 0; i < str.size(); ++i) {
        if (str[i] == delimiter) {
            res.push_back(cur);
            cur.clear();
        } else if (str[i] == escaper) {
            ++i;
            if (i == str.size()) break;
            cur.push_back(str[i]);
        } else {
            cur.push_back(str[i]);
        } // end if
    } // end for
    if (!cur.empty()) res.push_back(cur);
    return res;
} // end escapedSplit()

std::vector<std::string> v_input = escapedSplit(input,' ','\\');

With input as:

file_upload /home/user/Space\ Dir/file.out /home/user/Other\ Dir/blah
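The sender must produce that escaped form. A minimal sketch of the encoding side, assuming the same delimiter and escape character passed to escapedSplit; escapeField is a hypothetical name. Note that with '\\' as the escape character, Windows paths like the c:\dir\file from the question need every backslash doubled:

```cpp
#include <string>

// Hypothetical sender-side counterpart to escapedSplit(): put the escape
// character in front of every delimiter and every escape character in the
// field, so escapedSplit(line, delimiter, escaper) reads it back unchanged.
std::string escapeField(const std::string& field, char delimiter, char escaper) {
    std::string out;
    for (char c : field) {
        if (c == delimiter || c == escaper) {
            out.push_back(escaper);
        }
        out.push_back(c);
    }
    return out;
}
```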

3: Quoting.

You can write the code to iterate through the input line and properly split it on unquoted instances of the separator character. Quoted instances will not be considered separators. You can parameterize the quote character.

A complication of this approach is that it is not possible to include the quote character itself inside a quoted extent unless you introduce an escaping mechanism, similar to solution #2. A common strategy is to allow repetition of the quote character to escape it. For example:

std::vector<std::string> quotedSplit(std::string str, char delimiter, char quoter ) {
    std::vector<std::string> res;
    std::string cur;
    for (size_t i = 0; i < str.size(); ++i) {
        if (str[i] == delimiter) {
            res.push_back(cur);
            cur.clear();
        } else if (str[i] == quoter) {
            ++i;
            for (; i < str.size(); ++i) {
                if (str[i] == quoter) {
                    if (i+1 == str.size() || str[i+1] != quoter) break;
                    ++i;
                    cur.push_back(quoter);
                } else {
                    cur.push_back(str[i]);
                } // end if
            } // end for
        } else {
            cur.push_back(str[i]);
        } // end if
    } // end for
    if (!cur.empty()) res.push_back(cur);
    return res;
} // end quotedSplit()

std::vector<std::string> v_input = quotedSplit(input,' ','"');

With input as:

file_upload "/home/user/Space Dir/file.out" "/home/user/Other Dir/blah"

Or even just:

file_upload /home/user/Space" "Dir/file.out /home/user/Other" "Dir/blah
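The sending side of the quoting scheme can be sketched the same way, assuming the "repeat the quote character to escape it" rule that quotedSplit understands; quoteField is a hypothetical name:

```cpp
#include <string>

// Hypothetical sender-side counterpart to quotedSplit(): wrap the field in
// the quote character and double every quote character inside it.
std::string quoteField(const std::string& field, char quoter) {
    std::string out(1, quoter);
    for (char c : field) {
        if (c == quoter) {
            out.push_back(quoter);  // a doubled quote stands for one literal quote
        }
        out.push_back(c);
    }
    out.push_back(quoter);
    return out;
}
```

For example, quoteField("say \"hi\"", '"') yields "say ""hi""" (with the outer quotes), which quotedSplit turns back into say "hi".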

4: Length-value.

Finally, you can write the code to take a length before each value, and only grab that many characters. We could require a fixed-width length specifier, or skip a delimiting character following the length specifier. For example (note: light on error checking):

#include <cstdlib>
#include <string>
#include <vector>

std::vector<std::string> lengthedSplit(std::string str) {
    std::vector<std::string> res;
    size_t i = 0;
    while (i < str.size()) {
        // parse the decimal length that starts at position i
        size_t len = std::atoi(str.c_str() + i);
        if (len == 0) break;
        i += std::to_string(len).size() + 1; // skip the length's digits, then the delimiter
        if (i > str.size()) break;
        res.push_back(str.substr(i,len));
        i += len;
    } // end while
    return res;
} // end lengthedSplit()

std::vector<std::string> v_input = lengthedSplit(input);

With input as:

11:file_upload29:/home/user/Space Dir/file.out25:/home/user/Other Dir/blah
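The fixed-width length specifier mentioned above avoids parsing a variable number of digits and needs no delimiter after the length. A minimal sketch, assuming exactly four zero-padded digits per length (an arbitrary choice that caps each field at 9999 characters); kWidth, encodeFixed and decodeFixed are hypothetical names:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed width of every length prefix (4 digits => max 9999).
constexpr std::size_t kWidth = 4;

// Write each field as a zero-padded 4-digit length followed by the field.
std::string encodeFixed(const std::vector<std::string>& fields) {
    std::ostringstream out;
    for (const std::string& f : fields) {
        if (f.size() > 9999) throw std::length_error("field too long");
        // setw applies to the length only; the field itself is written as-is
        out << std::setw(static_cast<int>(kWidth)) << std::setfill('0') << f.size() << f;
    }
    return out.str();
}

// Read back fields written by encodeFixed, rejecting truncated input.
std::vector<std::string> decodeFixed(const std::string& str) {
    std::vector<std::string> res;
    std::size_t i = 0;
    while (i < str.size()) {
        if (str.size() - i < kWidth) throw std::invalid_argument("truncated length");
        std::size_t len = std::stoul(str.substr(i, kWidth));
        i += kWidth;
        if (str.size() - i < len) throw std::invalid_argument("truncated value");
        res.push_back(str.substr(i, len));
        i += len;
    }
    return res;
}
```

With this variant the example input would be written as 0011file_upload0029/home/user/Space Dir/file.out0025/home/user/Other Dir/blah.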
