How do I compare a section of a string without copying?

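I have a long string that I am iterating through, and at each iteration I compare a section of the string to a constant and store some parts of the string. In my actual code, this code runs millions of times and is the main bottleneck. I think this is due to the excessive use of std::string::substr.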

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::string str("0=My,1=comma,2=separated,3=string,0=with,3=repeated,7=IDs");
    std::vector<std::string> out0;
    std::map<std::string, std::string> out;

    size_t pos = str.find(',');

    // loop over the string, collecting "key=value" pairs
    while (pos < str.size() - 1) {
        if (str.substr(pos + 1, 2) == "0=") {
            auto newPos = str.find(',', pos + 3);
            out0.push_back(str.substr(pos + 3, newPos - pos - 3));
            pos = newPos;
        } else {
            size_t eqPos = str.find('=', pos + 1);
            auto newPos = str.find(',', eqPos + 1);
            out[str.substr(pos + 1, eqPos - pos - 1)] = str.substr(eqPos + 1, newPos - eqPos - 1);
            pos = newPos;  // advance to the next comma; without this the loop never terminates
        }
    }

    // print out the data structures (this doesn't happen in my actual code)
    std::cout << "out0:";
    for (auto& entry : out0) {
        std::cout << ' ' << entry;
    }
    std::cout << std::endl;

    std::cout << "out:";
    for (const auto& it : out) {
        std::cout << ' ' << it.first << '=' << it.second;
    }
}

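Here are my questions:

  • How can I compare a section of the string without making a copy, and without writing out a comparison for each character, e.g. str[pos + 1] == '0' && str[pos + 2] == '=' && ...?
  • How can I store references to the substrings instead of making a copy every time I add to out0 and out?

This might be a good case for using char *, but I have never used it before.

Edit:

Unfortunately, I only have C++11; otherwise std::string_view would be the best answer. Is there a way to store the references without std::string_view?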

If you have C++17, you can use string_view (untested code):

string_view sv{str.data() + pos, 2};
if (sv == "0=") ...

No copy is made. Or even, in one go:

if (string_view{str.data() + pos, 2} == "0=") ...

If you don't have string_view, you can use char_traits:

if (std::char_traits<char>::compare(str.data() + pos, "0=", 2) == 0) ...
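
The one-liner above assumes that at least two characters remain after pos. A minimal C++11 sketch that adds this bounds check is shown below; the helper name matchesAt and its parameters are hypothetical and not part of the original answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the answer above): returns true when the
// `len` characters of `str` starting at `pos` equal the first `len`
// characters of `token`. No temporary std::string is created.
bool matchesAt(const std::string& str, std::size_t pos,
               const char* token, std::size_t len) {
    // Reject positions where fewer than `len` characters remain, so the
    // comparison never reads past the end of `str`.
    if (pos > str.size() || str.size() - pos < len) {
        return false;
    }
    return std::char_traits<char>::compare(str.data() + pos, token, len) == 0;
}
```

In the question's loop this would be called as matchesAt(str, pos + 1, "0=", 2) in place of str.substr(pos + 1, 2) == "0=".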

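Use std::string_view instead of std::string as the keys and values of out. A std::string_view holds a pointer to the string and the size of the string, so it is very lightweight. This lets you extract the information you need without copying any characters of the string, and without any of the memory allocations that creating those strings might incur.

What you need to do is obtain a string_view from the std::string, and then use that string_view to get all the substrings you need.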
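
Since the question is limited to C++11, the "pointer plus size" idea can be sketched without std::string_view. The Slice type and makeSlice function below are hypothetical names used only for illustration; unlike std::string_view::substr, this substr clamps an out-of-range position instead of throwing.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C++11 stand-in for std::string_view: a pointer plus a length.
// It never owns or copies characters; it only refers to someone else's buffer.
struct Slice {
    const char* data;
    std::size_t size;

    // Returns a narrower view of the same buffer. Only the pointer and the
    // length change; no characters are copied. `pos` is clamped to the end
    // and `count` to what remains.
    Slice substr(std::size_t pos, std::size_t count) const {
        if (pos > size) pos = size;
        if (count > size - pos) count = size - pos;
        return Slice{data + pos, count};
    }

    // Compares against a NUL-terminated literal: length first, then content.
    bool equals(const char* literal) const {
        const std::size_t n = std::strlen(literal);
        return n == size && std::memcmp(data, literal, n) == 0;
    }
};

// Builds a view over a whole std::string. The view is only valid while
// `s` is alive and unmodified.
Slice makeSlice(const std::string& s) {
    return Slice{s.data(), s.size()};
}
```

For example, makeSlice(str).substr(0, 2).equals("0=") tests the first key without allocating. The same lifetime rule applies to std::string_view: the original string must outlive every view taken from it.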

Since people have already posted std::string_view, here is the plain old C pointer version.

(Not tested, but it should give you the idea.)

See below:

std::string str("0=My,1=comma,2=separated,3=string,0=with,3=repeated,7=IDs");
std::string substr("test");
// ... inside some function ...
const char *str_p = str.c_str();        // String you want to compare with a substring
const char *substr_p = substr.c_str();  // Your substring
size_t str_len = str.length();
size_t substr_len = substr.length();
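bool comparison_result = (substr_len == 0);  // an empty substring trivially matches
for(size_t i = 0; i + substr_len <= str_len; i++) {  // "<=" also checks the last position and avoids unsigned underflow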
    for(size_t j = 0; j < substr_len; j++) {
        if(*(str_p + i + j) != *(substr_p + j)) {
            comparison_result = false;
            break;
        }
        if (j == substr_len - 1) { // We can only reach here when substring is hit
            comparison_result = true;
            i = str_len - substr_len;
            break;
        }
    }
}
return comparison_result;

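Edit:

Following the suggestion from @Toby Speight in the comments (which I think is a very good one), I am also implementing a std::memcmp() version. In that case, the inner loop is replaced by a single call: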

// ... inside some function (std::memcmp requires #include <cstring>) ...
const char *str_p = str.c_str();        // String you want to compare with a substring
const char *substr_p = substr.c_str();  // Your substring
size_t str_len = str.length();
size_t substr_len = substr.length();
bool comparison_result = false;
for(size_t i = 0; i + substr_len <= str_len; i++) {  // "<=" also checks the last position and avoids unsigned underflow
    if(std::memcmp(str_p + i, substr_p, substr_len) == 0) {
        comparison_result = true;
        break;
    }
}
return comparison_result;

Edit:

We received another request, this time from @Alexander Zhang, so let's implement it:

// ... inside some function (std::memcmp requires #include <cstring>) ...
const char *str_p = str.c_str();        // String you want to compare with a substring
const char *substr_p = substr.c_str();  // Your substring
size_t str_len = str.length();
size_t substr_len = substr.length();
bool comparison_result = false;
for(size_t i = 0; i + substr_len <= str_len; i++) {  // "<=" also checks the last position and avoids unsigned underflow
    if(std::memcmp(&str_p[i], &substr_p[0], substr_len) == 0) {
        comparison_result = true;
        break;
    }
}
return comparison_result;
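
Note that the loops above look for substr anywhere inside str. std::string::find performs that same search without copying, so a sketch of the equivalent check could be as short as the following; the wrapper name containsSubstring is hypothetical.

```cpp
#include <string>

// Hypothetical wrapper: true when `needle` occurs anywhere in `haystack`.
// std::string::find scans the existing buffer and creates no substrings.
// An empty `needle` is reported as found, because find returns 0 for it.
bool containsSubstring(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

When the position to test is already known, as in the question, no search is needed at all: a single comparison at that offset is enough.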

std::string has compare() methods that take a const char* substring as input. You don't need to use std::string::substr() to compare substrings, for example:

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::string str("0=My,1=comma,2=separated,3=string,0=with,3=repeated,7=IDs");
    std::vector<std::string> out0;
    std::map<std::string, std::string> out;

    size_t startPos = 0, delimPos, nameStart, nameEnd, valueStart, valueEnd;

    // loop over the string, collecting "key=value" pairs
    while (startPos < str.size()){
        nameStart = startPos;
        delimPos = str.find_first_of("=,", startPos, 2);
        if (delimPos == std::string::npos) {
            nameEnd = valueStart = valueEnd = str.size();
        }
        else {
            nameEnd = delimPos;
            if (str[delimPos] == '=') {
                valueStart = nameEnd + 1;
                valueEnd = str.find(',', valueStart);
                if (valueEnd == std::string::npos) {
                    valueEnd = str.size();
                }
            }
            else {
                valueStart = valueEnd = nameEnd;
            }
        }

        // TODO: if needed, adjust name(Start|End) and value(Start|End) to
        // ignore leading/trailing whitespace around the name and value
        // substrings...

        if (str.compare(nameStart, nameEnd - nameStart, "0", 1) == 0) {
            out0.push_back(str.substr(valueStart, valueEnd - valueStart));
        } else {
            out[str.substr(nameStart, nameEnd - nameStart)] = str.substr(valueStart, valueEnd - valueStart);
        }

        startPos = valueEnd + 1;
    }

    // print out the data structures
    std::cout << "out0:";
    for (auto& entry : out0) {
        std::cout << ' ' << entry;
    }
    std::cout << std::endl;

    std::cout << "out:";
    for (const auto& it : out) {
        std::cout << ' ' << it.first << '=' << it.second;
    }
}

Output:

out0: My with
out: 1=comma 2=separated 3=repeated 7=IDs

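You can take this a step further and eliminate the use of substr() entirely by not storing std::string values in the std::vector and std::map, and storing std::pair<const char*, size_t> instead: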

#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <utility>

using StrView = std::pair<const char*, size_t>;

StrView makeStrView(const char *str, size_t size) {
    return std::make_pair(str, size);
}

struct compareStrView {
    bool operator()(const StrView &lhs, const StrView &rhs) const {
        if (lhs.second == rhs.second)
            return (std::char_traits<char>::compare(lhs.first, rhs.first, lhs.second) < 0);
        return (lhs.second < rhs.second);
    }
};

std::ostream& operator<<(std::ostream &os, const StrView &rhs) {
    return os.write(rhs.first, rhs.second);
}

int main() {
    std::string str("0=My,1=comma,2=separated,3=string,0=with,3=repeated,7=IDs");
    std::vector<StrView> out0;
    std::map<StrView, StrView, compareStrView> out;

    size_t startPos = 0, delimPos, nameStart, nameEnd, valueStart, valueEnd;

    // loop over the string, collecting "key=value" pairs
    while (startPos < str.size()){
        nameStart = startPos;
        delimPos = str.find_first_of("=,", startPos, 2);
        if (delimPos == std::string::npos) {
            nameEnd = valueStart = valueEnd = str.size();
        }
        else {
            nameEnd = delimPos;
            if (str[delimPos] == '=') {
                valueStart = nameEnd + 1;
                valueEnd = str.find(',', valueStart);
                if (valueEnd == std::string::npos) {
                    valueEnd = str.size();
                }
            }
            else {
                valueStart = valueEnd = nameEnd;
            }
        }

        // TODO: if needed, adjust nameStart/End and valueStart/End to
        // ignore leading/trailing whitespace around the name and value
        // substrings...

        if (str.compare(nameStart, nameEnd - nameStart, "0", 1) == 0) {
            out0.push_back(makeStrView(&str[valueStart], valueEnd - valueStart));
        } else {
            out[makeStrView(&str[nameStart], nameEnd - nameStart)] = makeStrView(&str[valueStart], valueEnd - valueStart);
        }

        startPos = valueEnd + 1;
    }

    // print out the data structures
    std::cout << "out0:";
    for (auto& entry : out0) {
        std::cout << ' ' << entry;
    }
    std::cout << std::endl;

    std::cout << "out:";
    for (auto &it : out) {
        std::cout << ' ' << it.first << '=' << it.second;
    }
}

Output:

out0: My with
out: 1=comma 2=separated 3=repeated 7=IDs
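
Because each StrView only points into str, the stored views become dangling if str is modified or destroyed. A minimal sketch of two helpers that may be useful alongside the code above: one copies a view into an owning std::string when it has to outlive str, the other compares a view with a literal. The names toString and equalsLiteral are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

using StrView = std::pair<const char*, std::size_t>;  // same alias as in the code above

// Hypothetical helper: copies the viewed characters into an owning string.
// Use it only for the few values that must outlive the source buffer.
std::string toString(const StrView& v) {
    return std::string(v.first, v.second);
}

// Hypothetical helper: compares a view with a NUL-terminated literal,
// checking the length first and then the characters.
bool equalsLiteral(const StrView& v, const char* literal) {
    const std::size_t n = std::strlen(literal);
    return v.second == n &&
           std::char_traits<char>::compare(v.first, literal, n) == 0;
}
```

For example, toString(out0.front()) would yield an independent copy of the first stored value that stays valid after str goes away.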

In C++17, you can use std::string_view instead:

#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <string_view>

int main() {
    std::string str("0=My,1=comma,2=separated,3=string,0=with,3=repeated,7=IDs");
    std::string_view sv(str);
    std::vector<std::string_view> out0;
    std::map<std::string_view, std::string_view> out;

    size_t startPos = 0, delimPos, nameStart, nameEnd, valueStart, valueEnd;

    // loop over the string, collecting "key=value" pairs
    while (startPos < sv.size()){
        nameStart = startPos;
        delimPos = sv.find_first_of("=,", startPos, 2);
        if (delimPos == std::string_view::npos) {
            nameEnd = valueStart = valueEnd = sv.size();
        }
        else {
            nameEnd = delimPos;
            if (sv[delimPos] == '=') {
                valueStart = nameEnd + 1;
                valueEnd = sv.find(',', valueStart);
                if (valueEnd == std::string_view::npos) {
                    valueEnd = sv.size();
                }
            }
            else {
                valueStart = valueEnd = nameEnd;
            }
        }

        // TODO: if needed, adjust nameStart/End and valueStart/End to
        // ignore leading/trailing whitespace around the name and value
        // substrings...

        if (sv.compare(nameStart, nameEnd - nameStart, "0", 1) == 0) {
            out0.push_back(sv.substr(valueStart, valueEnd - valueStart));
        } else {
            out[sv.substr(nameStart, nameEnd - nameStart)] = sv.substr(valueStart, valueEnd - valueStart);
        }

        startPos = valueEnd + 1;
    }

    // print out the data structures
    std::cout << "out0:";
    for (auto& entry : out0) {
        std::cout << ' ' << entry;
    }
    std::cout << std::endl;

    std::cout << "out:";
    for (auto &it : out) {
        std::cout << ' ' << it.first << '=' << it.second;
    }
}

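You could try using a regex to split the key/value pairs.

I haven't tested whether it is faster, though.

This expression should do the trick; just collect all the matches (all the pairs):

(?:(\\d)+=(?:([^,]*))?)*?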

https://regex101.com/r/PDZMq0/1
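
As a rough sketch of how such a regex could be applied with std::regex (available in C++11): the pattern below is a simplified one chosen for this example rather than the expression quoted above, and the function name parsePairs is hypothetical. Note that each sub-match is still copied into a std::string, so this does not remove the copies the question is about, and std::regex is often slower than hand-written parsing.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits "key=value" pairs with std::regex. Values whose
// key is "0" go to `out0`; all others go to `out` (later duplicates overwrite).
void parsePairs(const std::string& str,
                std::vector<std::string>& out0,
                std::map<std::string, std::string>& out) {
    // Simplified pattern (an assumption, not the expression quoted above):
    // group 1 is the numeric key, group 2 is everything up to the next comma.
    static const std::regex pairRe("(\\d+)=([^,]*)");

    for (std::sregex_iterator it(str.begin(), str.end(), pairRe), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].str() == "0") {
            out0.push_back(m[2].str());
        } else {
            out[m[1].str()] = m[2].str();
        }
    }
}
```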
