Checking out a single subdirectory from a minimal repository using JGit

Question

I'm using JGit 6.5.x with Java 17. I have a remote repository that is huge (gigabytes), but I only need temporary access to a single subdirectory (e.g. foo/bar/) for processing. That subdirectory is really small (hundreds of kilobytes). A shallow, bare clone is relatively small as well:

try (final Git git = Git.cloneRepository()
    .setURI(REMOTE_REPOSITORY_URI.toASCIIString())
    .setDirectory(LOCAL_REPOSITORY_PATH.toFile())
    .setBare(true)
    .setDepth(1)
    .call()) {
  System.out.println("cloned shallow, bare repository");
}

Is there a way to clone a shallow, bare repository like that (or any other minimal version of the repository), and then temporarily check out just the single subdirectory foo/bar to some other directory, so that I can process those files using the normal Java file system API?

Note that I have only just gotten the clone above to work, and haven't yet started looking into how I might check out a single subdirectory from this bare repository.

Answer 1

Try the following solution:

Note: Before applying any Git changes, make sure you have a backup of any files you need.

Use the Git object's repository to create a TreeWalk, which lets you traverse the tree of the HEAD commit from the repository root and find the subdirectory you're interested in. A bare repository has no working tree, so instead of checking the files out with checkout(), copy the contents of each file under that path into a temporary directory:

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevTree;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.treewalk.TreeWalk;
import org.eclipse.jgit.treewalk.filter.PathFilter;
import org.eclipse.jgit.util.FileUtils;

try (Git git = Git.open(LOCAL_REPOSITORY_PATH.toFile());
        RevWalk revWalk = new RevWalk(git.getRepository());
        TreeWalk treeWalk = new TreeWalk(git.getRepository())) {
    Repository repository = git.getRepository();

    // Get the tree for the repository's HEAD commit
    RevCommit commit = revWalk.parseCommit(repository.resolve(Constants.HEAD));
    RevTree tree = commit.getTree();

    // Walk the tree from the repository root; in recursive mode the walk descends
    // into subdirectories and returns only files, here restricted to foo/bar
    treeWalk.addTree(tree);
    treeWalk.setRecursive(true);
    treeWalk.setFilter(PathFilter.create("foo/bar"));

    // A bare repository has no working tree, so checkout() cannot be used here;
    // instead, copy each file's content out of the object database
    Path temporaryDirectory = Files.createTempDirectory("subdirectory");
    boolean found = false;
    while (treeWalk.next()) {
        found = true;
        // getPathString() is relative to the repository root, e.g. "foo/bar/x.txt"
        Path target = temporaryDirectory.resolve(treeWalk.getPathString());
        Files.createDirectories(target.getParent());
        ObjectLoader loader = repository.open(treeWalk.getObjectId(0));
        try (OutputStream out = Files.newOutputStream(target)) {
            loader.copyTo(out);
        }
    }
    if (!found) {
        throw new IllegalStateException("Subdirectory not found");
    }

    // Now you can use the Java file system API to process the files
    // in temporaryDirectory.resolve("foo/bar")

    // Clean up the temporary directory when you're done
    FileUtils.delete(temporaryDirectory.toFile(), FileUtils.RECURSIVE);
}

In the code above, we use a TreeWalk to traverse the tree of the HEAD commit, restricted by a PathFilter to the subdirectory you specified (foo/bar). Because the walk is recursive, each entry it returns is a file under that subdirectory, and we copy that file's content from the object database into a temporary directory, keeping its path relative to the repository root. You can then use the Java file system API to process the files in that directory. Don't forget to clean up the temporary directory when you're done.

Note that the code assumes you have the necessary JGit and Java IO imports in place.
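As a variant of the tree-walking approach, you can look up the tree of foo/bar directly with TreeWalk.forPath and then walk only that subtree, so the files are written relative to foo/bar rather than to the repository root. This is a sketch, not part of the original answer: the class SubdirectoryExport and the method exportSubdirectory are hypothetical names, and it assumes the repository (bare or not) has already been cloned locally, for example with the shallow, bare clone from the question. It exports regular files only and skips symlinks and submodule entries.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.FileMode;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.treewalk.TreeWalk;

/** Hypothetical helper: copies one directory of a commit's tree to the file system. */
public final class SubdirectoryExport {

    private SubdirectoryExport() {
    }

    /**
     * Writes every regular file under dirPath (e.g. "foo/bar") at the given revision
     * into target, with paths relative to dirPath. Reads from the object database,
     * so it also works on bare repositories. Returns the number of files written.
     */
    public static int exportSubdirectory(Repository repository, String revision,
            String dirPath, Path target) throws IOException {
        ObjectId commitId = repository.resolve(revision);
        if (commitId == null) {
            throw new IllegalArgumentException("Unknown revision: " + revision);
        }
        int written = 0;
        try (ObjectReader reader = repository.newObjectReader();
                RevWalk revWalk = new RevWalk(reader)) {
            RevCommit commit = revWalk.parseCommit(commitId);

            // Position a walk directly on dirPath; forPath returns null if it does not exist
            try (TreeWalk dirWalk = TreeWalk.forPath(reader, dirPath, commit.getTree())) {
                if (dirWalk == null || !dirWalk.isSubtree()) {
                    throw new IllegalStateException("Subdirectory not found: " + dirPath);
                }
                ObjectId subtreeId = dirWalk.getObjectId(0);

                // Walk only the subtree, descending into nested directories
                try (TreeWalk fileWalk = new TreeWalk(reader)) {
                    fileWalk.addTree(subtreeId);
                    fileWalk.setRecursive(true);
                    while (fileWalk.next()) {
                        // Export regular files only; skip symlinks and submodule (gitlink) entries
                        if ((fileWalk.getRawMode(0) & FileMode.TYPE_MASK) != FileMode.TYPE_FILE) {
                            continue;
                        }
                        Path file = target.resolve(fileWalk.getPathString());
                        Files.createDirectories(file.getParent());
                        try (OutputStream out = Files.newOutputStream(file)) {
                            reader.open(fileWalk.getObjectId(0), Constants.OBJ_BLOB).copyTo(out);
                        }
                        written++;
                    }
                }
            }
        }
        return written;
    }
}
```

For example, calling exportSubdirectory(git.getRepository(), "HEAD", "foo/bar", Files.createTempDirectory("subdirectory")) on the opened bare clone writes the contents of foo/bar into the new temporary directory and returns how many files it wrote.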

Answer 2

Inspired by another answer, I was able to get a single-depth clone and check out only a single path without needing a bare clone, while using a similarly small amount of file system space. The benefit of this approach is that only a single top-level directory is needed; the bare repository approach, on the other hand, requires a manual traversal and saving the files to a separate directory.

The key is to use setNoCheckout(true) (in addition to setDepth(1)), and then, after cloning, manually perform a separate checkout specifying the requested path. Note that you must specify setStartPoint("HEAD") or a commit hash as the start point: without a start point, the checkout copies paths from the index, which is still empty because nothing has been checked out yet.

import org.eclipse.jgit.api.Git;

try (final Git git = Git.cloneRepository()
    .setURI(REMOTE_REPOSITORY_URI.toASCIIString())
    .setDirectory(LOCAL_REPOSITORY_PATH.toFile())
    .setNoCheckout(true)
    .setDepth(1)
    .call()) {

  // Check out only foo/bar from the cloned HEAD commit
  git.checkout()
    .setStartPoint("HEAD")
    .addPath("foo/bar")
    .call();

}

This seems to work very nicely! I would imagine it uses something similar to Satyajit Bhatt's answer under the hood.
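The steps from this answer can be wrapped into one self-contained method that also shows the processing step from the question. This is a sketch assuming JGit 6.5 on the classpath; the class SparseSubdirectory and the method cloneAndList are hypothetical names, and listing the files with Files.walk stands in for whatever processing you actually need.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;

/** Hypothetical helper wrapping the shallow, no-checkout clone plus a path checkout. */
public final class SparseSubdirectory {

    private SparseSubdirectory() {
    }

    /**
     * Shallow-clones uri into localDir without populating the working tree, then
     * checks out only subPath from HEAD and returns the regular files under it.
     */
    public static List<Path> cloneAndList(String uri, Path localDir, String subPath)
            throws GitAPIException, IOException {
        try (Git git = Git.cloneRepository()
                .setURI(uri)
                .setDirectory(localDir.toFile())
                .setNoCheckout(true) // fetch objects, but leave the working tree and index empty
                .setDepth(1)         // only the tip commit (shallow clone)
                .call()) {

            // Without a start point, the checkout would restore paths from the empty index
            git.checkout()
                    .setStartPoint("HEAD")
                    .addPath(subPath)
                    .call();
        }

        // The checked-out files are ordinary files now; process them with java.nio.file
        try (Stream<Path> paths = Files.walk(localDir.resolve(subPath))) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }
}
```

For example, cloneAndList(REMOTE_REPOSITORY_URI.toASCIIString(), LOCAL_REPOSITORY_PATH, "foo/bar") returns the files under foo/bar. Note that the objects for the rest of the tip commit are still downloaded into .git; they just are not written to the working tree.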

Answer 1: score 2, 2023-05-31 18:07:38

Answer 2: score 1, accepted, 2023-06-01 15:09:49