
How to find the first unique char in a list<string> in C++?

There is a collection (vector, list, etc.) of directories:

example 1:

/a/ab/bc/de
/a/ab/cc/fw
/a/ab/dd
/a/ab/ee/fg

Expected result: /a/ab

example 2:

/a/ab/bc/de
/a/b/cc/fw
/a/ab/dd
/a/ab/ee/fg

Expected result: /a

What is the best way to find the common path to all the directories?

P.S. The end goal is to copy only the relative paths. For example 1, /a/ab needs to be removed so that all that is left is:

bc/de
cc/fw
dd
ee/fg

This is a first-order approach (too bad I couldn't find any useful functions in <filesystem>):

#include <string>
#include <vector>
#include <iostream>

std::string get_common_path(const std::string& lhs, const std::string& rhs)
{
    auto lhs_it = lhs.begin();
    auto rhs_it = rhs.begin();

    // as long as characters match move to right (but not past end of either string)
    while ((lhs_it != lhs.end()) && (rhs_it != rhs.end()) && (*lhs_it == *rhs_it))
    {
        ++lhs_it;
        ++rhs_it;
    }

    return std::string{ lhs.begin(),lhs_it };
}

std::string common_path(const std::vector<std::string>& values)
{
    if (values.empty()) return std::string{};
    if (values.size() == 1) return values.front();

    // get first string, that is now most common path
    auto it = values.begin();
    std::string retval = *it;
    ++it;
    
    // loop over all values
    while ((it != values.end()) && (!retval.empty()))
    {
        // the overlap is the existing overlap combined with the next string
        // in the vector.
        retval = get_common_path(retval, *it);
        ++it;
    }
    
    return retval;
}


int main()
{
    std::vector<std::string> paths
    {
        "/a/ab/bc/de",
        "/a/ab/cc/fw",
        "/a/ab/dd",
        "/a/ab/ee/fg"
    };

    auto result = common_path(paths);
    std::cout << result;
    
    return 0;
}

Sort the vector of paths first.

std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg"};
std::sort(paths.begin(), paths.end());

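Compare the lexicographically first and last paths to find the position where they first differ. After sorting, these are not necessarily the shortest and longest paths, but any prefix they share is shared by every path that sorts between them.

const auto& first_path = paths.front();
const auto& last_path = paths.back();
auto mis = std::mismatch(first_path.cbegin(), first_path.cend(), last_path.cbegin(), last_path.cend());

Now copy the common prefix into a new string.

auto common = std::string(first_path.cbegin(), mis.first);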

Here's the full source code, tested in VS2022. It printed "/a/ab/" and "/a/" for your two examples. I believe you can figure out how to remove the trailing '/'.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
  try {
    std::vector<std::string> paths = {"/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd",
                                      "/a/ab/ee/fg"};

    std::sort(paths.begin(), paths.end());

    // After sorting, the prefix shared by the first and last paths
    // (lexicographically, not by length) is shared by all of them.
    const auto& first_path = paths.front();
    const auto& last_path = paths.back();
    auto mis = std::mismatch(first_path.cbegin(), first_path.cend(),
                             last_path.cbegin(), last_path.cend());

    auto common = std::string(first_path.cbegin(), mis.first);
    std::cout << common << std::endl;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return -1;
  }

  return 0;
}

Define "best", and tell us the size of the data set. The paths form a tree, so you could insert them into a tree and then walk down from the root until you reach a node with more than one child (or a node where one of the paths ends); that node is the common path for all entries.

There is a very easy solution.

You can analyze the data and make the following observation.

If you view the std::vector<std::string> as a 2-dimensional array of characters, you can compare the characters column-wise.

/a/ab/bc/de
/a/b/cc/fw      
/a/ab/dd
/a/ab/ee/fg
||||
||||
|||+--- Not all characters are the same
||+---- All characters in this column are the same
|+----- All characters in this column are the same
+------ All characters in this column are the same

Starting with column 0, check whether all characters in this column are the same, then move on to the next column, and so on.

As soon as we find a difference in a column, we know that we have found the end of the common prefix.

Then we can output the common prefix and the remaining suffixes.

All of this takes only a few lines of conventional code.

An example of one potential solution:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

std::vector<std::string> paths = { "/a/ab/bc/de", "/a/b/cc/fw", "/a/ab/dd", "/a/ab/ee/fg" };

int main() {
    // Sanity check
    if (not paths.empty()) {

        // Of course we will only compare up to the smallest string size
        size_t minSize = std::min_element(paths.begin(), paths.end(),
            [](const std::string& s1, const std::string& s2) { return s1.size() < s2.size(); })->size();
        size_t cont{ 1 }, col{ 0 };

        // Double nested loop to find the resulting column: the outer loop advances col
        // while cont stays 1; the inner loop compares every row against row 0 in that column
        for (size_t row{ 1 }; cont and col < minSize; col += cont, row = 1)
            for (auto c{ paths.front()[col] }; cont and row < paths.size(); row += cont)
                cont = ((c == paths[row][col]) * 1);

        // Show result as debug output
        std::cout << "Common prefix: " << paths.front().substr(0, col) << "\n\n";
        for (std::string& s : paths) std::cout << "Resulting path: " << s.substr(col) << '\n';
    }
}
