
How can I reimplement a split function to work with string_view?

I wrote this split function, but I can't find an easy way to split on a string_view of several separator characters. My function:

size_t split(std::vector<std::string_view>& result, std::string_view in, char sep) {
    result.reserve(std::count(in.begin(), in.end(), in.find(sep) != std::string::npos) + 1);
    for (auto pfirst = in.begin();; ++pfirst) {
        auto pbefore = pfirst;
        pfirst = std::find(pfirst, in.end(), sep);
        result.emplace_back(q, pfirst-pbefore);
        if (pfirst == in.end())
            return result.size();
    }
}

I want to call this split function with a string_view separator. For example:

str = "apple, phone, bread\n keyboard, computer"
split(result, str, "\n,")
Result:['apple', 'phone', 'bread', 'keyboard', 'computer']

My question is: how can I implement this function so that it runs as fast as possible?

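First, you are using std::count() incorrectly. It counts the elements that compare equal to a single value, and the value you pass is the bool result of in.find(sep) != std::string::npos, so it never counts occurrences of sep.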
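To make the difference concrete, here is a minimal sketch. The helper names count_char and count_any_of are hypothetical and appear in neither the question nor the answer. std::count fits a single separator character, while a set of separators needs a predicate and therefore std::count_if.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of occurrences of one separator character.
std::size_t count_char(std::string_view in, char sep) {
    // std::count compares every element with the value `sep`.
    return static_cast<std::size_t>(std::count(in.begin(), in.end(), sep));
}

// Hypothetical helper: number of characters that match any character in `seps`.
std::size_t count_any_of(std::string_view in, std::string_view seps) {
    // std::count_if takes a predicate, which is what a set of separators needs.
    return static_cast<std::size_t>(
        std::count_if(in.begin(), in.end(), [&](char ch) {
            return seps.find(ch) != std::string_view::npos;
        }));
}
```

Either count plus one gives an upper bound on the number of tokens, which is all that reserve() needs.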

Second, std::string_view has its own find_first_of() and substr() methods, which you can use in this situation instead of iterators. find_first_of() allows you to specify multiple characters to search for.
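As a small illustration of those two members working together, the sketch below cuts off the first token of a view. The helper name first_token is hypothetical and only serves this example.

```cpp
#include <string_view>

// Hypothetical helper: the part of `in` before the first character found in `seps`.
std::string_view first_token(std::string_view in, std::string_view seps) {
    // find_first_of returns npos when no separator is present;
    // substr(0, npos) then yields the whole view.
    return in.substr(0, in.find_first_of(seps));
}
```

For instance, first_token("apple, phone", "\n,") yields "apple", and a view that contains no separator at all is returned unchanged. No characters are copied in either case, because substr() on a string_view only produces another view.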

Try something more like this:

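#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

std::size_t split(std::vector<std::string_view>& result, std::string_view in, std::string_view seps) {
    // Reserve one slot per separator character found, plus one for the last token.
    result.reserve(std::count_if(in.begin(), in.end(), [&](char ch){ return seps.find(ch) != std::string_view::npos; }) + 1);
    std::string_view::size_type start = 0, end;
    while ((end = in.find_first_of(seps, start)) != std::string_view::npos) {
        result.push_back(in.substr(start, end-start));
        // Skip any spaces that follow the separator.
        start = in.find_first_not_of(' ', end+1);
    }
    // start is npos when nothing but spaces followed the last separator.
    if (start != std::string_view::npos)
        result.push_back(in.substr(start));
    return result.size();
}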


This is my take on splitting a string view. It loops just once over all the characters in the string view and returns a vector of string_views, so no data is copied. The calling code can still use words.size() to get the size if needed. (I use the C++20 std::set::contains function.)

Live demo here: https://onlinegdb.com/tHfPIeo1iM

#include <iostream>
#include <set>
#include <string_view>
#include <vector>

auto split(const std::string_view& string, const std::set<char>& separators)
{
    std::vector<std::string_view> words;
    auto word_begin{ string.data() };
    std::size_t word_len{ 0ul };

    for (const auto& c : string)
    {
        if (!separators.contains(c))
        {
            word_len++;
        }
        else
        {
            // we found a word, not a repeated separator
            if (word_len > 0)
            {
                words.emplace_back(word_begin, word_len);
                word_begin += word_len;
                word_len = 0;
            }

            word_begin++;
        }
    }
    
    // a string_view has no terminating zero and the input
    // may not end with a separator, so if there is still
    // a word in the "pipeline" add it too
    if (word_len > 0)
    {
        words.emplace_back(word_begin, word_len);
    }

    return words;
}

int main()
{
    std::set<char> separators{ ' ', ',', '.', '!', '\n' };
    auto words = split("apple, phone, bread\n keyboard, computer", separators);
    
    bool comma = false;
    std::cout << "[";
    for (const auto& word : words)
    {
        if (comma) std::cout << ", ";
        std::cout << word;
        comma = true;
    }
    std::cout << "]\n";

    return 0;
}
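Since the question asks for speed, one possible variation replaces the std::set lookup with a table indexed by the character value. This is an assumption on my part rather than something measured: it has not been benchmarked here, the name split_table is hypothetical, and the 256-entry table assumes 8-bit chars.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant: same splitting rule as above (empty words are skipped),
// but separator membership is a table lookup instead of std::set::contains.
std::vector<std::string_view> split_table(std::string_view text, std::string_view separators)
{
    std::array<bool, 256> is_separator{};  // value-initialized: all false
    for (unsigned char c : separators)
        is_separator[c] = true;

    std::vector<std::string_view> words;
    std::size_t word_begin = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (is_separator[static_cast<unsigned char>(text[i])])
        {
            // only store non-empty words, so repeated separators are skipped
            if (i > word_begin)
                words.push_back(text.substr(word_begin, i - word_begin));
            word_begin = i + 1;
        }
    }

    // add the last word if the text does not end with a separator
    if (word_begin < text.size())
        words.push_back(text.substr(word_begin));

    return words;
}
```

Called as split_table("apple, phone, bread\n keyboard, computer", " ,\n"), it yields the same five words as the std::set version. Whether it is actually faster depends on the input and the compiler, so measure before relying on it.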

I do not know about performance, but this code seems a lot simpler. Note that it splits on a single delimiter character and copies every token into a std::string.

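    #include <sstream>
    #include <string>
    #include <vector>

    std::vector<std::string> ParseDelimited(
        const std::string &l, char delim )
    {
        std::vector<std::string> token;
        std::stringstream sst(l);
        std::string a;
        // read up to the next delimiter until the stream is exhausted
        while (std::getline(sst, a, delim))
            token.push_back(a);
        return token;
    }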
