
How to find the first unique char in a list<string> in C++?

There is a collection (vector, list, etc.) of directories:

Example 1:

/a/ab/bc/de
/a/ab/cc/fw
/a/ab/dd
/a/ab/ee/fg

Find /a/ab

Example 2:

/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg

Find /a

What is the best way to find the common path to all the directories?

PS: The end goal is to copy only the relative paths. In example 1, /a/ab needs to be removed so that all that is left is:

bc/de
cc/fw
dd
ee/fg
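Once the common directory is known, stripping it could look like the sketch below. It assumes C++17 and uses std::filesystem::path::lexically_relative, which works purely on the text of the paths and never touches the disk. The helper name strip_common_directory is made up for illustration.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: returns each path relative to `common`.
// Assumes `common` really is a leading directory of every entry in `paths`.
std::vector<std::string> strip_common_directory(const std::vector<std::string>& paths,
                                                const std::string& common)
{
    std::vector<std::string> relative;
    relative.reserve(paths.size());
    for (const auto& p : paths) {
        // lexically_relative is a pure string operation; generic_string()
        // keeps '/' as the separator on every platform.
        relative.push_back(
            std::filesystem::path(p).lexically_relative(common).generic_string());
    }
    return relative;
}
```

Called with the four paths of example 1 and "/a/ab", this should give bc/de, cc/fw, dd and ee/fg. A path equal to the common directory itself comes back as ".".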

This is a first-order approach (too bad I couldn't find any useful functions in <filesystem>). It compares character by character, so the result can end in a trailing '/' or in the middle of a directory name:

#include <string>
#include <vector>
#include <iostream>

std::string get_common_path(const std::string& lhs, const std::string& rhs)
{
    auto lhs_it = lhs.begin();
    auto rhs_it = rhs.begin();

    // as long as characters match move to right (but not past end of either string)
    while ((lhs_it != lhs.end()) && (rhs_it != rhs.end()) && (*lhs_it == *rhs_it))
    {
        ++lhs_it;
        ++rhs_it;
    }

    return std::string{ lhs.begin(),lhs_it };
}

std::string common_path(const std::vector<std::string>& values)
{
    if (values.empty()) return std::string{};
    if (values.size() == 1) return values.front();

    // start with the first string as the candidate common path
    auto it = values.begin();
    std::string retval = *it;
    ++it;
    
    // loop over all values
    while ((it != values.end()) && (!retval.empty()))
    {
        // the overlap is the existing overlap combined with the next string
        // in the vector.
        retval = get_common_path(retval, *it);
        ++it;
    }
    
    return retval;
}


int main()
{
    std::vector<std::string> paths
    {
        "/a/ab/bc/de",
        "/a/ab/cc/fw",
        "/a/ab/dd",
        "/a/ab/ee/fg"
    };

    auto result = common_path(paths);
    std::cout << result;
    
    return 0;
}
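A possible variation on the code above compares whole path components instead of single characters, so the result can never stop in the middle of a directory name. This is only a sketch, assuming C++17: it relies on std::filesystem::path being iterable component by component and on the four-iterator overload of std::mismatch. The names common_components and common_directory are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: the leading components shared by two paths.
fs::path common_components(const fs::path& lhs, const fs::path& rhs)
{
    // Iterating a path yields its components ("/", "a", "ab", ...).
    const auto lhs_end =
        std::mismatch(lhs.begin(), lhs.end(), rhs.begin(), rhs.end()).first;

    fs::path result;
    for (auto it = lhs.begin(); it != lhs_end; ++it) {
        result /= *it;  // re-assemble the shared components
    }
    return result;
}

// Hypothetical helper: fold common_components over the whole collection.
std::string common_directory(const std::vector<std::string>& values)
{
    if (values.empty()) return {};

    fs::path result = values.front();
    for (std::size_t i = 1; i < values.size() && !result.empty(); ++i) {
        result = common_components(result, fs::path(values[i]));
    }
    return result.generic_string();
}
```

For the two examples in the question this should yield /a/ab and /a, without a trailing separator. With a single input path the whole path is returned, as in the character-based version.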

Sort the vector of paths first.

std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());

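Compare the first and last paths of the sorted vector to find the position where they first differ. After a lexicographic sort these are the smallest and largest strings, not necessarily the shortest and longest, but any prefix they share is shared by every path in between.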

const auto& shortest = paths.front();
const auto& longest = paths.back();
auto mis = std::mismatch(shortest.cbegin(), shortest.cend(), longest.cbegin(), longest.cend());

Now copy the substring up to the mismatch position.

auto common = std::string(shortest.cbegin(), mis.first);

Here's the full source code, tested in VS2022. It printed "/a/ab/" and "/a/" for your examples. I believe you can work out how to remove the trailing '/'.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
  try {
    std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd",
                                      "/a/ab/ee/fg"};

    std::sort(paths.begin(), paths.end());

    const auto& shortest = paths.front();
    const auto& longest = paths.back();
    auto mis = std::mismatch(shortest.cbegin(), shortest.cend(),
                             longest.cbegin(), longest.cend());

    auto common = std::string(shortest.cbegin(), mis.first);
    std::cout << common << std::endl;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return -1;
  }

  return 0;
}
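Removing the trailing '/' is not quite the whole story: a character-level prefix can also stop inside a directory name (for "/a/ab/bc" and "/a/ac/d" it is "/a/a"). One way to handle both cases is sketched below. The helper trim_to_directory is made up for illustration and assumes that every entry in paths really starts with prefix.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: cut a character-level common prefix back to a
// directory boundary. Precondition: every path starts with `prefix`.
std::string trim_to_directory(std::string prefix,
                              const std::vector<std::string>& paths)
{
    // The prefix is already a whole directory if it does not end in '/' and,
    // in every path, it is followed by '/' or by the end of the string.
    bool on_boundary = !prefix.empty() && prefix.back() != '/';
    for (const auto& p : paths) {
        if (p.size() > prefix.size() && p[prefix.size()] != '/') {
            on_boundary = false;
            break;
        }
    }

    if (!on_boundary) {
        // Drop the partial component (if any) together with the last '/'.
        const auto pos = prefix.rfind('/');
        prefix.erase(pos == std::string::npos ? 0 : pos);
    }
    return prefix;
}
```

With this sketch "/a/ab/" becomes "/a/ab" and "/a/a" becomes "/a". If the paths share nothing but the root, the result is an empty string, which the caller would have to treat as "/".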

Define "best" and the size of the data set. The paths form a tree, so you could insert them into a tree and then traverse from the root until you find a node with more than one child; the path to that node is the common path for all entries.
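A minimal sketch of that tree idea, assuming C++17 and absolute paths separated by '/'. The Node type and the function name common_path_via_tree are made up for illustration. The walk also stops at a node where one of the input paths ends, so that "/a/ab" and "/a/ab/dd" give "/a/ab".

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node: one node per directory name.
struct Node {
    std::map<std::string, std::unique_ptr<Node>> children;
    bool terminal = false;  // true if some input path ends exactly here
};

std::string common_path_via_tree(const std::vector<std::string>& paths)
{
    if (paths.empty()) return {};

    // Build the tree: split every path on '/' and insert its components.
    Node root;
    for (const auto& path : paths) {
        Node* node = &root;
        std::istringstream in(path);
        std::string part;
        while (std::getline(in, part, '/')) {
            if (part.empty()) continue;  // leading or doubled slash
            auto& child = node->children[part];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->terminal = true;
    }

    // Walk down while there is exactly one way to go and no path ends here.
    std::string result;
    const Node* node = &root;
    while (node->children.size() == 1 && !node->terminal) {
        const auto& [name, child] = *node->children.begin();
        result += '/' + name;
        node = child.get();
    }
    return result;
}
```

For a handful of paths this is more machinery than the prefix comparisons in the other answers; it would mainly pay off if the tree is needed for something else as well.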

There is a very easy solution.

You can analyze the data and make the following observation.

If you view the std::vector<std::string> as a 2-dimensional array of characters, you can compare the characters column-wise.

/a/ab/bc/de
/a/b/cc/fw      
/a/ab/dd
/a/ab/ee/fg
||||
||||
|||+--- Not all characters in this column are the same
||+---- All characters in this column are the same
|+----- All characters in this column are the same
+------ All characters in this column are the same

Starting with column 0, you can check whether all characters in that column are the same, then move on to the next column, and so on.

As soon as we find a difference in a column, we know that we have found the end of the common prefix.

Then we can output the common prefix and also the remaining suffixes.

All this takes only a few lines of conventional code.

Example for one potential solution:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

std::vector<std::string> paths = { "/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg" };

int main() {
    // Sanity check
    if (not paths.empty()) {

        // Of course we will only compare to the smallest string size
        size_t minSize = std::min_element(paths.begin(), paths.end(), [](const std::string& s1, const std::string& s2) {return s1.size() < s2.size(); })->size();
        size_t cont{ 1 }, col{ 0 };

        // Double nested loop to find the first column that differs
        for (size_t row{ 1 }; cont and col < minSize; col += cont, row = 1)
            for (auto c{ paths.front()[col] }; cont and row < paths.size(); row += cont)
                cont = ((c == paths[row][col]) * 1);

        // Show result as debug output
        std::cout << "Common prefix: " << paths.front().substr(0, col) << "\n\n";
        for (std::string& s : paths) std::cout << "Resulting path: " << s.substr(col) << '\n';
    }
}
