How to find the first unique chars of a list<string> in C++?

Question

There is a collection (vector, list, etc.) of directories:

Example 1:

/a/ab/bc/de
/a/ab/cc/fw
/a/ab/dd
/a/ab/ee/fg

Find /a/ab

Example 2:

/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg

Find

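What is the best way to find the common path of all the directories?

PS: The end goal is to copy only the relative paths. For example 1, /a/ab needs to be removed so that what is left is: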

bc/de
cc/fw
dd
ee/fg

Answer 1

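Here is a first-order approach (too bad, I could not find anything useful in <filesystem>). Note that it compares character by character, so the result can end in the middle of a directory name.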

#include <string>
#include <vector>
#include <iostream>

std::string get_common_path(const std::string& lhs, const std::string& rhs)
{
    auto lhs_it = lhs.begin();
    auto rhs_it = rhs.begin();

    // as long as characters match move to right (but not past end of either string)
    while ((lhs_it != lhs.end()) && (rhs_it != rhs.end()) && (*lhs_it == *rhs_it))
    {
        ++lhs_it;
        ++rhs_it;
    }

    return std::string{ lhs.begin(),lhs_it };
}

std::string common_path(const std::vector<std::string>& values)
{
    if (values.empty()) return std::string{};
    if (values.size() == 1) return values.front();

    // get first string, that is now most common path
    auto it = values.begin();
    std::string retval = *it;
    ++it;
    
    // loop over all values
    while ((it != values.end()) && (!retval.empty()))
    {
        // the overlap is the existing overlap combined with the next string
        // in the vector.
        retval = get_common_path(retval, *it);
        ++it;
    }
    
    return retval;
}


int main()
{
    std::vector<std::string> paths
    {
        "/a/ab/bc/de",
        "/a/ab/cc/fw",
        "/a/ab/dd",
        "/a/ab/ee/fg"
    };

    auto result = common_path(paths);
    std::cout << result;
    
    return 0;
}
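
As a possible alternative to the character-wise comparison above, the paths can be compared one directory component at a time with std::filesystem::path, which also gives the relative paths the question asks for. This is only a sketch assuming C++17; the names common_dir and relative_to_common are hypothetical and not part of the answer.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: longest common leading part of two paths,
// compared component by component rather than character by character.
fs::path common_dir(const fs::path& lhs, const fs::path& rhs)
{
    auto mis = std::mismatch(lhs.begin(), lhs.end(), rhs.begin(), rhs.end());
    fs::path result;
    for (auto it = lhs.begin(); it != mis.first; ++it) result /= *it;
    return result;
}

// Fold the pairwise helper over the whole collection.
fs::path common_dir(const std::vector<std::string>& paths)
{
    if (paths.empty()) return {};
    fs::path result = paths.front();
    for (const auto& p : paths) result = common_dir(result, fs::path{p});
    return result;
}

// Express every path relative to the common directory (purely lexical,
// the file system itself is never touched).
std::vector<std::string> relative_to_common(const std::vector<std::string>& paths)
{
    const fs::path base = common_dir(paths);
    std::vector<std::string> out;
    for (const auto& p : paths)
        out.push_back(fs::path{p}.lexically_relative(base).generic_string());
    return out;
}
```

Because whole components are compared, "/a/ab" and "/a/ac" share only "/a" here, whereas the character-wise version would report "/a/a".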

Answer 2

First, sort the vector of paths.

std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());

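Compare the first and last paths of the sorted vector to find the position of the mismatch. The code calls them shortest and longest, but after std::sort they are the lexicographically smallest and largest strings; every other path lies between them, so the prefix they share is shared by all.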

const auto& shortest = paths.front();
const auto& longest = paths.back();
auto mis = std::mismatch(shortest.cbegin(), shortest.cend(), longest.cbegin(), longest.cend());

Now make a copy of the common substring.

auto common = std::string(shortest.cbegin(), mis.first);

Here is the complete source code, tested in VS2022. It printed "/a/ab/" and "/a/" for your examples. I am sure you can work out how to remove the trailing "/".

#include <algorithm>
#include <exception>
#include <iostream>
#include <string>
#include <vector>

int main() {
  try {
    std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd",
                                      "/a/ab/ee/fg"};

    std::sort(paths.begin(), paths.end());

    const auto& shortest = paths.front();
    const auto& longest = paths.back();
    auto mis = std::mismatch(shortest.cbegin(), shortest.cend(),
                             longest.cbegin(), longest.cend());

    auto common = std::string(shortest.cbegin(), mis.first);
    std::cout << common << std::endl;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return -1;
  }

  return 0;
}
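
For the trailing "/" mentioned above, and for the relative paths the question ultimately wants, a minimal sketch could look like the following. The helpers trim_to_directory and strip_directory are hypothetical names, not part of the answer, and the sketch assumes '/' as the separator and that every input names an entry below the common directory, as in the question's examples.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: cut a character-wise common prefix back to the last
// complete directory component and drop the trailing '/'.
std::string trim_to_directory(const std::string& prefix)
{
    const auto pos = prefix.rfind('/');
    if (pos == std::string::npos) return {};  // no separator: no common directory
    if (pos == 0) return "/";                 // only the root is shared
    return prefix.substr(0, pos);
}

// Hypothetical helper: remove dir, and the separator after it, from path.
std::string strip_directory(const std::string& path, const std::string& dir)
{
    // Leave the path untouched if it does not start with dir.
    if (path.compare(0, dir.size(), dir) != 0) return path;
    std::size_t start = dir.size();
    if (start < path.size() && path[start] == '/') ++start;
    return path.substr(start);
}
```

Cutting at the last '/' also repairs a prefix that ends in the middle of a name: "/a/a", the character-wise prefix of "/a/ab" and "/a/ac", becomes "/a".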

Answer 3

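First define "best" and the size of the data set. The paths form a tree, so you can insert them into a tree and then walk down from the root until you reach a node with more than one child; the path to that node is the common path of all entries.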
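
The tree idea could be sketched roughly as follows. This is only an illustration under assumptions: the paths are absolute and '/'-separated, and Node, insert and common_prefix are hypothetical names. The walk also stops at a node where one of the input paths ends, so that a path which is a prefix of another is not overshot.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: one node per directory component.
struct Node {
    std::map<std::string, std::unique_ptr<Node>> children;
    bool is_end = false;  // true if some input path ends at this node
};

// Insert one '/'-separated path into the tree rooted at root.
void insert(Node& root, const std::string& path)
{
    Node* node = &root;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/')) {
        if (part.empty()) continue;  // skip the empty piece before a leading '/'
        auto& child = node->children[part];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
    }
    node->is_end = true;
}

// Walk down while there is exactly one child and no input path ends here.
std::string common_prefix(const std::vector<std::string>& paths)
{
    Node root;
    for (const auto& p : paths) insert(root, p);

    std::string result;
    const Node* node = &root;
    while (node->children.size() == 1 && !node->is_end) {
        const auto& entry = *node->children.begin();
        result += '/' + entry.first;
        node = entry.second.get();
    }
    return result;
}
```

Building the tree costs time proportional to the total length of all paths, so for a one-off query it offers little over a direct comparison; it may pay off when the same set is queried repeatedly or grows over time.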

Answer 4

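There is a very simple solution.

You can analyze the data and make the following observation.

If you view the std::vector<std::string> as a two-dimensional array of characters, you can compare the characters column by column.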

/a/ab/bc/de
/a/b/cc/fw      
/a/ab/dd
/a/ab/ee/fg
||||
||||
|||+--- Not all characters are the same
||+---- All characters in this column are the same
|+----- All characters in this column are the same
+------ All characters in this column are the same

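Starting with column 0, you check whether all characters in that column are the same, then move on to the next column, and so on.

As soon as we find a difference in a column, we know that we have found the end of the common prefix.

Then we can output the common prefix and the remaining suffixes.

All of this takes only a few lines of conventional code.

An example of one potential solution: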

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

std::vector<std::string> paths = { "/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg" };

int main() {
    // Sanity check
    if (not paths.empty()) {

        // Of course we will only compare to the smallest string size
        size_t minSize = std::min_element(paths.begin(), paths.end(), [](const std::string& s1, const std::string& s2) {return s1.size() < s2.size(); })->size();
        size_t cont{ 1 }, col{ 0 };

        // Double nested loop to find resulting column
        for (size_t row{ 1 }; cont and col < minSize; col += cont, row = 1)
            for (auto c{ paths.front()[col] }; cont and row < paths.size(); row += cont)
                cont = ((c == paths[row][col]) * 1);

        // Show result as debug output
        std::cout << "Common prefix: " << paths.front().substr(0, col) << "\n\n";
        for (std::string& s : paths) std::cout << "Resulting path: " << s.substr(col) << '\n';
    }
}
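
The column scan described above can also be written with an explicit early return, which some readers may find easier to follow than the counter-driven loops. This is a sketch only; common_columns is a hypothetical helper name, and the bounds check replaces the separate search for the shortest string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of leading columns in which every string
// holds the same character.
std::size_t common_columns(const std::vector<std::string>& rows)
{
    if (rows.empty()) return 0;
    const std::string& first = rows.front();
    for (std::size_t col = 0; col < first.size(); ++col) {
        const char c = first[col];
        // A row that is too short counts as a mismatch in this column.
        const bool same = std::all_of(rows.begin(), rows.end(),
            [&](const std::string& s) { return col < s.size() && s[col] == c; });
        if (!same) return col;
    }
    return first.size();
}
```

With the returned column count n, rows.front().substr(0, n) is the common prefix and s.substr(n) is the remainder of each row s, just as in the program above.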

4 answers

Answer 1 1 2022-07-31 05:41:24

Answer 2 1 2022-07-31 06:02:28

Answer 3 0 2022-07-31 04:32:57

Answer 4 0 2022-07-31 07:49:06
