How to synchronize branching between independent git repositories?

Question

I have two independent git repositories: one holds the code for my data analysis project, and the other holds the output of running the code. The layout looks like this:

.
|-- project_output
|   |-- .git
|   |-- output_sample1
|   |-- output_sample2
|   `-- output_sample3
`-- project_code
    |-- .git
    |-- code
    |   `-- all_my_scripts.sh
    `-- output -> ../project_output

The output data consists of very large text-based files, and I keep it in project_output. The project itself is open source on GitHub and is tracked in project_code. I use git to track changes in both.

When I want to add a new feature to project_code, or debug or alter an old feature, I make a branch:

project_code$ git checkout -b fix-some-bug
project_code$ # make some changes, run the new code
project_code$ # save output in output -> ../project_output

Now I can review changes to the output in project_output:

project_output$ git status
project_output$ git diff

If I want to keep the new output, I would make a commit:

project_output$ git add -u; git add .
project_output$ git commit -m "Update results from project_code/fix-some-bug branch"

However, it gets tedious and difficult to track the output from different branches of project_code in project_output like this. I think it would be much easier if there were a system where creating and changing branches in project_code would be mirrored in project_output. For example:

project_code$ git checkout -b fix-some-bug # project_output/fix-some-bug is created
project_code$ git checkout master # project_output switches to master branch as well
project_code$ git merge fix-some-bug # project_output merges fix-some-bug to master as well

I guess it sounds a lot like I want the benefits of having a single repository, while maintaining the security of keeping sensitive data from ever touching my public repo.

Answer 1

To summarize, you'd kind of like to have one public repository that has code-only, and a second private repository that gets its code from the public repository, but then adds on data. It turns out that this is really easy to do (although it's also easy to accidentally publish your private data, if you are not careful).

Git is very much like the Star Trek Borg collective: it likes to take another repository's technological distinctiveness (i.e., new commits) and add it to its own. In fact, this is exactly what git fetch does.

To use git fetch, you tell your Git to call up some other Git, usually via Internet-phone at some URL. Your Git then gets a list of all their references: branch and tag names mainly, but also other things. (More precisely, your Git gets whatever their Git is willing to show you, but by default, they show you everything.) These reference names point to specific commits.¹ Your Git then asks for any commits you don't already have, plus any other objects needed to complete them.
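If you want to watch that first step of the conversation yourself, git ls-remote asks another Git for its references and prints each name with the hash ID it points to, without downloading any objects. Below is a small self-contained sketch, assuming a POSIX shell and Git 2.28 or newer (for git init -b); the repository name theirs, the scratch directory, and the demo identity are made up for illustration.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Scratch directory (left in place so you can poke around afterwards).
tmp=$(mktemp -d)
cd "$tmp"

# "Their" repository: one commit on master, plus a tag.
git init -q -b master theirs
git -C theirs commit -q --allow-empty -m "Add scripts"
git -C theirs tag v2.3

# Ask their Git which reference names it shows us, and which hash ID
# each one points to. No objects are downloaded by this step.
git ls-remote ./theirs > refs.txt
cat refs.txt
```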

Since the direction of git fetch is "from them, to us", all the transfers work that way. (The nearest thing to an opposite of git fetch is git push, where we instruct our Git to call up another Git and send, to them, our technological distinctiveness. Obviously you won't want to do that from your private repository.) Once our Git has all the objects, it can either stop there or set up names to remember the objects.

The names we get, if we tell our Git to save names, are our names, not theirs. When we copy commits using their branch names as starting points to find the commits, though, we normally have our Git save these via remote-tracking branch names in our own repository. For instance, if their master had commit deadbee that we did not have, we copy their deadbee into our repository, and then make our origin/master remember this hash ID deadbee.² (If the parent commit of deadbee is ac0ffee, we have our Git take their ac0ffee too, unless we already have it, and so on.)
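A quick way to convince yourself of the renaming is a throwaway pair of repositories. This sketch (POSIX shell, Git 2.28+, hypothetical repository names theirs and ours) fetches two commits and shows that their master lands in our origin/master, parent included, while no branch name of our own is created.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# "Their" repository: a parent commit and a child commit on master.
git init -q -b master theirs
git -C theirs commit -q --allow-empty -m "parent"
git -C theirs commit -q --allow-empty -m "child"

# "Our" repository starts empty and records theirs as the remote "origin".
git init -q -b master ours
cd ours
git remote add origin ../theirs
git fetch -q origin

# Their master is remembered under our remote-tracking name origin/master,
# and its parent came along too.
git log --oneline origin/master
# Prints nothing: our own branch names are untouched.
git for-each-ref refs/heads
```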

If we have our Git pick up tag-named commits (and/or tag objects as in footnote 1), we have our Git store them under our own tag names, not as "remote tags", so if they added a tag named v2.3, we set a new tag for ourselves also named v2.3. The fancy renaming is only for branches, by default. But this is under your own control: it's your repository, so you control everything.
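The same kind of throwaway setup shows the difference between branches and tags. In this sketch (hypothetical repositories theirs and ours, Git 2.28+), their tag v2.3 arrives under our own refs/tags/v2.3, while their master arrives renamed as refs/remotes/origin/master. It uses an annotated tag, since by default git fetch also follows tags that point into the history being fetched.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits and tags work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# "Their" repository: one commit on master, tagged v2.3.
git init -q -b master theirs
git -C theirs commit -q --allow-empty -m "release"
git -C theirs tag -a -m "Release 2.3" v2.3

git init -q -b master ours
cd ours
git remote add origin ../theirs
git fetch -q origin

# List every reference name we now have: the branch shows up renamed
# under refs/remotes/origin/, the tag keeps its own name under refs/tags/.
git for-each-ref --format='%(refname)' > ../names.txt
cat ../names.txt
```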

In any case, you can instruct your Git not to set up your own names at all. If you do that, you rely on something git fetch has done since the Dim Time, which is that, by default, it saves every name it got in .git/FETCH_HEAD. A normal git fetch overwrites the previous FETCH_HEAD, so you must extract the commit IDs from that file, and do something to remember them, before you run git fetch again.
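Here is a sketch of the no-names route, assuming the same throwaway setup (hypothetical repositories theirs and ours, Git 2.28+). Fetching by path with a bare branch name creates no remote-tracking names, so the only record of the fetched commit is .git/FETCH_HEAD; git branch then saves it under a name of our own (saved-feature, made up here) before a later fetch can overwrite the file.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# "Their" repository with a branch named feature.
git init -q -b master theirs
git -C theirs commit -q --allow-empty -m "on master"
git -C theirs checkout -q -b feature
git -C theirs commit -q --allow-empty -m "on feature"

git init -q -b master ours
cd ours
# Fetch by path, with no remote name and no destination in the refspec:
# no refs/remotes/* names are created, but .git/FETCH_HEAD records the result.
git fetch -q ../theirs feature
cat .git/FETCH_HEAD

# Remember that commit under a name of our own before the next
# git fetch overwrites FETCH_HEAD.
git branch saved-feature FETCH_HEAD
```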

Meanwhile, though, whether or not you have set your own names for their commits, you have all their commits (well, all the ones you instructed your git fetch to copy). Your Git has, Borg-like, added their technological distinctiveness to your own.

Hence, all you have to do is set up your public repository as a named remote in your private repository and run git fetch:

~/repos/private$ git remote add public https://github.com/...

or:

~/repos/private$ git remote add public file://~me/repos/public

or whatever URL you like. After that, running:

~/repos/private$ git fetch public

will have your Git call up another Git (perhaps on your own machine!³) using the saved URL, and download into your private repository any new commits found in "their" (your other) repository. It will name "their" branches public/master and so on, i.e., rename their branch X to public/X, because the name we used with git remote add to create this "remote" was public.
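Applied to the layout in the question, the recipe might look like the sketch below. It is only an illustration, assuming Git 2.28+ and a POSIX shell: public stands in for project_code, private is a single repository holding code plus output, and the output/ directory and file contents are invented. The point is that each public branch, such as fix-some-bug, shows up in the private repository as public/fix-some-bug, and private commits are built on top of it.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the public code repository (project_code) with a bug-fix branch.
git init -q -b master public
cd public
echo 'echo v1' > all_my_scripts.sh
git add all_my_scripts.sh
git commit -q -m "Initial code"
git checkout -q -b fix-some-bug
echo 'echo v2' > all_my_scripts.sh
git commit -q -am "Fix some bug"
cd ..

# The private repository: code arrives by fetch, output is committed locally.
git init -q -b master private
cd private
git remote add public ../public
git fetch -q public
# Every public branch X shows up as public/X.
git branch -r

# Start a private branch at the public fix, run the code, commit its output.
git checkout -q -b fix-some-bug public/fix-some-bug
mkdir -p output
sh all_my_scripts.sh > output/output_sample1
git add output
git commit -q -m "Update results from public/fix-some-bug"
```

When fix-some-bug is later merged into master in the public repository, another git fetch public moves public/master, and you can git merge public/master into the corresponding private branch. Because the private commits always sit on top of the public ones and nothing flows back, the public repository never sees the output.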

Just be careful not to push your private commits up to your public repository. Git, like the Borg, is really happy to add new things, but will fight you to the death about removing things. Well, perhaps not death, exactly. :-) But once data have escaped like this, anyone could clone it, and even if you manage to scrub it away from the public repository quickly, it might have been copied and widely distributed.
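One common safeguard (a convention people use, not a Git feature designed for this) is to give the public remote a push URL that cannot work, so that git push public fails while git fetch public keeps using the real URL. This sketch assumes Git 2.28+ and hypothetical repositories public and private; the bogus URL DISABLED-do-not-push is arbitrary.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

git init -q -b master public
git -C public commit -q --allow-empty -m "code"

git init -q -b master private
cd private
git remote add public ../public
git fetch -q public
git checkout -q -b with-data public/master
echo 'sensitive' > data.txt
git add data.txt
git commit -q -m "Private data"

# Give the remote a bogus *push* URL; fetching still uses ../public.
git remote set-url --push public DISABLED-do-not-push

# A push to "public" now fails instead of publishing the private commit.
if git push -q public with-data 2>/dev/null; then
    echo "push unexpectedly succeeded" >&2
    exit 1
fi
echo "push to public refused"

# Fetching is unaffected.
git fetch -q public
```

This only guards the remote name public: a command that names the URL explicitly, such as git push ../public with-data, bypasses it.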

¹ Tag names can point to any of the four object types (commit, tree, blob, and tag). Often they will point to an annotated tag object, and the tag object then points to a commit, but sometimes tag names just point directly to commits anyway. Branch names may only point to commits.
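To see these object types for yourself, this sketch (Git 2.28+, hypothetical repository name repo and tag names plain and data-tag) creates a lightweight tag, an annotated tag, and a tag on a blob, asks git cat-file -t what each name points to, and checks that git branch refuses to point a branch name at a blob.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commit and tag work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

git init -q -b master repo
cd repo
git commit -q --allow-empty -m "release"

git tag plain                      # lightweight tag: name -> commit directly
git tag -a -m "Release 2.3" v2.3   # annotated tag: name -> tag object -> commit
blob=$(echo 'just data' | git hash-object -w --stdin)
git tag data-tag "$blob"           # a tag name may even point at a blob

for t in plain v2.3 data-tag; do
    echo "$t -> $(git cat-file -t "$t")"
done
# Peeling the annotated tag (^{}) follows the tag object to its commit.
echo "v2.3 peels to: $(git cat-file -t 'v2.3^{}')"

# A branch name, by contrast, must point at a commit.
if git branch not-a-commit "$blob" 2>/dev/null; then
    echo "unexpected: branch created on a blob" >&2
    exit 1
fi
```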

² This is where fetch and push are different: when we send them a commit from our master, we usually ask them to set their master. They don't have a "push-tracking branch" for us. However, if we are using pull requests, we make this "please set your master" request in an even more roundabout way, by sending our master to a name they can recognize as "please look at this, then decide whether you like it" rather than the more automatic "please automatically take this as long as it fits easily".

In other words, pull requests are the push equivalent of remote-tracking branches: a "safe place" to store things that you don't entirely trust yet, so that you can look at, and decide about, these new objects before incorporating them. Because their names tend to be terrible (pull requests are usually numbered, and there's no obvious connection between "PR#1234" and, e.g., "I'd like you to incorporate this into feature/california-bear"), some people do this differently. They push to a public repository of their own, then announce, e.g., by email: "I have new stuff for you in my feature/beetle, why don't you go git fetch from my public repository." This serves exactly the same purpose: you make your commit cafedad available by some name, at some URL. It's then up to the other person, using either the same URL or a different one, to retrieve your commits, whether they sit at that URL under some weird name like pull/1234/head or at your URL under the name feature/beetle.
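Both routes can be simulated locally. In this sketch (Git 2.28+, hypothetical repositories upstream, contributor and maintainer), the hosting site's numbered name is faked by fetching the contributor's feature/beetle into upstream's refs/pull/1234/head; a real site creates such refs itself, so treat that step as an assumption. The maintainer then fetches the same commit either by the numbered name from the upstream URL or by feature/beetle from the contributor's URL, each into a local branch of their choosing.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# Upstream project, and a contributor's own public clone of it.
git init -q -b master upstream
git -C upstream commit -q --allow-empty -m "base"
git clone -q upstream contributor
git -C contributor checkout -q -b feature/beetle
git -C contributor commit -q --allow-empty -m "beetle work"

# Simulate a hosting site filing the offer under a numbered name.
# (Paths after -C are relative to that directory, so ../contributor works.)
git -C upstream fetch -q ../contributor feature/beetle:refs/pull/1234/head

# The maintainer retrieves the same commit either way, into local
# review branches whose names they choose.
git clone -q upstream maintainer
cd maintainer
git fetch -q origin refs/pull/1234/head:pr-1234     # numbered name, upstream URL
git fetch -q ../contributor feature/beetle:beetle   # branch name, contributor's URL
git log --oneline -1 pr-1234
git log --oneline -1 beetle
```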

³ When copying commits from one repository on your own machine to another repository on your own machine (i.e., using path URLs or file:// URLs), your Git may wind up playing the role of both "your Git" and "their Git" in fetch and push conversations. From a high-level view, however, the effect is the same as if there were two separate Gits, working in two separate repositories, exchanging data through a relatively narrow channel, as if over an Internet connection.
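A small check of that "same effect" claim, under the same throwaway assumptions (Git 2.28+, hypothetical repositories public and private): fetching master by plain path and by file:// URL should leave the same commit hash in FETCH_HEAD.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commit works without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

git init -q -b master public
git -C public commit -q --allow-empty -m "code"

git init -q -b master private
cd private
# The same repository, named two ways: a plain path and a file:// URL.
git fetch -q ../public master
by_path=$(git rev-parse FETCH_HEAD)
git fetch -q "file://$tmp/public" master
by_url=$(git rev-parse FETCH_HEAD)
echo "path: $by_path"
echo "url:  $by_url"
```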

solution1
1 2017-02-24 20:15:29
