Extract an autonomous chunk of the dependency graph of a huge C++ project?
Consider the Chromium codebase. It's huge, around 4 GB of pure code, if I'm not mistaken. But however humongous it may be, it's still modular in nature, and it implements a lot of interesting features in its internals.
What I mean is, for example, I'd like to extract the websocket implementation out of the sources, but it's not easy to do by hand. OK, if we go to https://github.com/chromium/chromium/tree/main.net/websockets we'll see lots of header files. To compile the code as a "library" we're going to need them plus their implementations in .cpp files. But the trick is that these header files include other header files in other directories of the chromium project. And those in turn include others...
BUT if there are no circular dependencies, we should be able to get to the leaves of this tree, where header files won't include anything (or will only include headers of already compiled libraries). That should mean that all the files needed for this dependency subtree are in place, so we can compile a chunk of the original codebase separately from the rest of it.
That's the idea. At least in theory.
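To make the idea concrete, here is a minimal sketch of walking an include graph from a set of starting files. The graph and its file names are invented for illustration; a real one would have to come from the compiler or a scanner. Because visited files are tracked, a circular include would not make the walk loop forever.

```python
from collections import deque

def reachable(graph, roots):
    """Return every file reachable from roots by following include edges."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, ()):
            if dep not in seen:  # the seen-set also makes cycles harmless
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical include graph: file -> files it includes directly.
graph = {
    "websocket.h": ["url.h", "socket.h"],
    "socket.h": ["base.h"],
    "url.h": ["base.h"],
    "base.h": [],
    "unrelated.h": ["base.h"],
}

print(sorted(reachable(graph, ["websocket.h"])))
# ['base.h', 'socket.h', 'url.h', 'websocket.h']
```

Note that unrelated.h is left out: nothing reachable from websocket.h includes it.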
Does anyone know how it could be done? I've found this repo and this repo, but they only show the dependency graph and do not have the functionality to extract a tree from it.
There should be a tool already, I suppose. It's just hard to phrase it for Google. Or perhaps I'm mistaken and this approach wouldn't really work?
Your compiler is almost surely capable of extracting this dependency information so that it can be used to help the build system figure out incremental builds. In gcc, for instance, we have the -MMD flag.
Suppose we have four compilation units, ball.cpp, football.cpp, basketball.cpp, and hockey.cpp. Each source file includes a header file of the same name. Also, football.h and basketball.h each include ball.h.
If we run
g++ -MMD -c -o football.o football.cpp
g++ -MMD -c -o basketball.o basketball.cpp
g++ -MMD -c -o hockey.o hockey.cpp
g++ -MMD -c -o ball.o ball.cpp
then this will produce, in addition to the object files, some files with names like basketball.d that contain dependency information like
basketball.o: basketball.cpp basketball.h ball.h
It's simple enough to read these into, say, a Python script, and then just take the union of all the dependencies of the files you want to include.
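A minimal sketch of such a script might look like the following. The function names are made up for this example, and it assumes the .d files contain plain Makefile-style rules with no colons or escaped spaces in the paths. Long rules are usually wrapped with a trailing backslash, so those lines are joined first.

```python
import glob

def parse_depfile(text):
    """Parse Makefile-style rules ("target: dep dep ...") into a dict."""
    rules = {}
    # Join lines that were wrapped with a trailing backslash.
    for line in text.replace("\\\n", " ").splitlines():
        target, sep, deps = line.partition(":")
        if sep and target.strip():
            rules.setdefault(target.strip(), set()).update(deps.split())
    return rules

def union_of_deps(depfile_texts, wanted):
    """Union the dependencies of every target for which wanted(target) is true."""
    result = set()
    for text in depfile_texts:
        for target, deps in parse_depfile(text).items():
            if wanted(target):
                result |= deps
    return result

def read_text(path):
    with open(path) as handle:
        return handle.read()

if __name__ == "__main__":
    # Assumes the .d files sit in the current directory.
    texts = [read_text(path) for path in glob.glob("*.d")]
    for name in sorted(union_of_deps(texts, lambda target: "ball" in target)):
        print(name)
```

The predicate passed as wanted is where you would choose which targets count as part of the chunk you are extracting.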
EDIT: In fact, Python may even be overkill. In the situation above, if you wanted to get all dependencies for anything containing the word "ball," you could do something like
$ cat *.d | awk -F: '$1 ~ "ball" { print $2 }' | xargs -n 1 echo | sort | uniq
which will output
ball.cpp
ball.h
basketball.cpp
basketball.h
football.cpp
football.h
If you're not used to reading UNIX pipelines, this concatenates all the .d files in the current directory; splits each line into fields separated by : characters; prints the second field (the list of dependencies) of every line whose first field (the target) matches "ball"; puts each file name on its own line; and finally sorts the result and throws out duplicates. You can see that this produced a list of everything the ball-related files depend on, but skipped hockey.cpp and hockey.h, which aren't dependencies of any file with "ball" in its name. (Of course, in your case you might use "websockets" instead of "ball," and if there is some directory structure instead of everything being in the root directory, you may have to do a bit of work to compensate for that.)
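For a tree with real directory structure, one possible way to compensate is to copy the collected files into a fresh directory while preserving their paths relative to the source root, so that the include paths keep resolving. The sketch below is only an illustration: copy_subset is a made-up helper, and it assumes the dependency paths are either relative to the source root or absolute.

```python
import shutil
from pathlib import Path

def copy_subset(files, src_root, dst_root):
    """Copy files into dst_root, keeping their paths relative to src_root."""
    src_root = Path(src_root).resolve()
    dst_root = Path(dst_root)
    copied = []
    for name in sorted(files):
        source = (src_root / name).resolve()
        try:
            relative = source.relative_to(src_root)
        except ValueError:
            continue  # outside the source tree (e.g. a system header); skip it
        if not source.is_file():
            continue  # missing or generated file; nothing to copy
        target = dst_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(relative.as_posix())
    return copied
```

Called with the set of dependencies gathered from the .d files, it returns the relative paths it actually copied, which makes it easy to spot anything that was skipped.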