Extract an autonomous chunk of the dependency graph of a huge CPP project?

Consider the Chromium codebase. It's huge, around 4 GB of pure code, if I'm not mistaken. But however humongous it may be, it's still modular in nature, and it implements a lot of interesting features in its internals.

What I mean is that, for example, I'd like to extract the websocket implementation from the sources, but that's not easy to do by hand. OK, if we go to https://github.com/chromium/chromium/tree/main/net/websockets we'll see lots of header files. To compile the code as a "library" we're going to need them plus their implementations in .cpp files. But the catch is that these header files include other header files from other directories of the Chromium project. And those in turn include others...

BUT if there are no circular dependencies, we should be able to get to the root of this tree, where header files don't include anything (or include only already-compiled libraries). That should mean all the files needed for this dependency subtree are in place, so we can compile a chunk of the original codebase separately from the rest of it.

That's the idea. At least in theory.
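To make the idea concrete, here is a minimal sketch of the traversal in Python. The include graph below is made up for illustration (none of these file names come from Chromium), and the function name `reachable` is hypothetical. It collects every file that can be reached from a starting header by following include edges. Because it tracks visited files, it would also terminate if the graph did contain cycles.

```python
from collections import deque

def reachable(includes, roots):
    """Return every file reachable from `roots` by following include edges."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in includes.get(current, ()):
            if dep not in seen:  # visit each file once, so cycles cannot loop forever
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical include graph: file -> files it includes directly.
includes = {
    "websocket.h": ["socket.h", "url.h"],
    "socket.h": ["base.h"],
    "url.h": ["base.h"],
    "base.h": [],
    "unrelated.h": ["base.h"],
}

needed = reachable(includes, ["websocket.h"])
print(sorted(needed))  # ['base.h', 'socket.h', 'url.h', 'websocket.h']
```

Note that `unrelated.h` is left out even though it shares a dependency with the files that were kept.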

Does anyone know how it could be done? I've found this repo and this repo, but they only show the dependency graph and do not have the functionality to extract a tree from it.

There should be a tool for this already, I suppose. It's just hard to put into words for Google. Or perhaps I'm mistaken and this approach wouldn't really work?

Your compiler is almost surely capable of extracting this dependency information, so that it can be used to help the build system figure out incremental builds. In gcc, for instance, we have the -MMD flag.

Suppose we have four compilation units: ball.cpp, football.cpp, basketball.cpp, and hockey.cpp. Each source file includes a header file of the same name. Also, football.h and basketball.h each include ball.h.

If we run

g++ -MMD   -c -o football.o football.cpp
g++ -MMD   -c -o basketball.o basketball.cpp
g++ -MMD   -c -o hockey.o hockey.cpp
g++ -MMD   -c -o ball.o ball.cpp

then, in addition to the object files, this will produce files with names like basketball.d that contain dependency information such as

basketball.o: basketball.cpp basketball.h ball.h

It's simple enough to read these into, say, a Python script, and then just take the union of all the dependencies of the files you want to include.
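A minimal sketch of such a script might look like the following. The function names `parse_depfile` and `union_of_deps` are made up for this example. It assumes the .d files contain plain Makefile-style rules, joins backslash-continued lines (which gcc emits when the dependency list is long), and does not handle paths containing spaces or colons.

```python
import glob

def parse_depfile(text):
    """Parse Makefile-style rules ("target: dep dep ...") into {target: set(deps)}."""
    # A trailing backslash continues the rule on the next line; join those first.
    joined = text.replace("\\\n", " ")
    rules = {}
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, deps = line.partition(":")
        rules.setdefault(target.strip(), set()).update(deps.split())
    return rules

def union_of_deps(depfile_texts, wanted):
    """Union the dependencies of every target whose name contains `wanted`."""
    result = set()
    for text in depfile_texts:
        for target, deps in parse_depfile(text).items():
            if wanted in target:
                result |= deps
    return result

if __name__ == "__main__":
    texts = []
    for path in glob.glob("*.d"):  # every .d file in the current directory
        with open(path) as f:
            texts.append(f.read())
    for dep in sorted(union_of_deps(texts, "ball")):
        print(dep)
```

Run in the directory holding the .d files, this prints one dependency per line for every object file with "ball" in its name.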


EDIT: In fact, Python may even be overkill. In the situation above, if you wanted to get all dependencies for anything containing the word "ball", you could do something like

$ cat *.d | awk -F: '$1 ~ "ball" { print $2 }' | xargs -n 1 echo | sort | uniq

which will output

ball.cpp
ball.h
basketball.cpp
basketball.h
football.cpp
football.h

If you're not used to reading UNIX pipelines, this:

  • Concatenates all the *.d files in the current directory;
  • Goes through them line by line, splitting each line into fields delimited by : characters;
  • Prints out the second field (i.e. the list of dependencies) for any line where the first field (i.e. the target) matches the regex "ball";
  • Splits the results into individual lines;
  • Sorts the resulting lines; and
  • Throws out any duplicates.

You can see that this produced a list of everything the ball-related files depend on, but skipped hockey.cpp and hockey.h, which aren't dependencies of any file with "ball" in its name. (Of course, in your case you might use "websockets" instead of "ball", and if there is some directory structure instead of everything being in the root directory, you may have to do a bit of work to compensate for that.)
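For a project with a real directory structure, one way to "compensate" is to copy the collected files into a fresh tree while keeping their relative paths, so the include directives still resolve. The following is only a sketch under that assumption. The function name `copy_preserving_layout` and its parameters are hypothetical, and any dependency that resolves to a path outside the source root is simply skipped.

```python
import shutil
from pathlib import Path

def copy_preserving_layout(files, src_root, dst_root):
    """Copy each listed file from src_root into dst_root at the same relative path."""
    src_root = Path(src_root).resolve()
    dst_root = Path(dst_root)
    copied = []
    for name in files:
        src = (src_root / name).resolve()
        try:
            rel = src.relative_to(src_root)
        except ValueError:
            # The path lies outside the source tree; leave it where it is.
            continue
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)  # recreate intermediate directories
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied
```

Here `files` would be the dependency list gathered from the .d files, `src_root` the top of the original checkout, and `dst_root` an empty directory for the extracted chunk.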
