Fork a file within a Git repository

Question

I'm working on an R project which currently has the following directory layout:

proj1
  |-- file.r

file.r is used to build a statistical model specific to Project 1 (hence proj1).

During the course of development, we will be building numerous models for numerous projects:

Work
  |-- proj1
  |     └-- file.r
  |-- proj2
  |     └-- file.r
  :
  └-- projn
        └-- file.r

file.r will be 90% similar between each of the projects, but there will be differences. My question is, is there a way to create a master file.r file and simply fork it for each project? That way, a bugfix/enhancement to the master can simply be rebased down to the forks, and the file-specific changes will be simply applied on top. My first thought was to use submodules, but I'm not certain how to apply that here. Thanks!

Answer 1

Use a "topic branch" for each project:

git checkout master
git add file.r  # this is your master template, on which the others are based
git commit -m "Committed the master file"

Then for each project:

git checkout -b <project> master  # create and check out the <project> branch (-B would reset an existing branch back to master)
# hack away on file.r, commit when you want
git push origin <project>  # to share <project> with others

So in practice you end up with master, upon which, say, project1, project2, project3 and so forth are based. Should do exactly what you want and keep it all quite sane.
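To show how a fix on master reaches a project branch, here is a minimal sketch in a throwaway repository. The branch name proj1, the R function names and the contents of file.r are made up for illustration. It assumes the fix and the project-specific lines touch different parts of the file; otherwise the rebase stops and asks you to resolve a conflict.

```shell
#!/bin/sh
# Sketch: carry a fix from master down to a project branch with rebase.
set -e
work=$(mktemp -d)                        # throwaway directory for the demo
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master  # use "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

# 1. master holds the shared template (contents are hypothetical)
cat > file.r <<'EOF'
# shared model template
fit_model <- function(d) lm(y ~ x, data = d)

# shared helpers
report <- function(m) summary(m)

# project-specific code goes below this line
EOF
git add file.r
git commit -q -m "Add master template"

# 2. proj1 branches off master and appends its own code
git checkout -q -b proj1 master
echo 'report(fit_model(proj1_data))' >> file.r
git commit -q -am "proj1: fit the project 1 data"

# 3. a fix lands on master, in the shared part of the file
git checkout -q master
sed 's/data = d)/data = d, na.action = na.omit)/' file.r > file.tmp
mv file.tmp file.r
git commit -q -am "Fix: drop rows with missing values"

# 4. replay proj1's commit on top of the fixed master
git checkout -q proj1
git rebase -q master
cat file.r                               # shows the fix plus the proj1 line
```

If a project branch has already been pushed and others have pulled it, rebasing rewrites its history, so `git merge master` on the project branch is the gentler alternative.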

Advantages of this solution over others that encourage multiple repositories:

Easier to manage. You've only got one repository that in practice has, what, 20-30 branches at most? Sounds like a lot, but with clear labels it's simple to know where you are, particularly if you're only managing a small file set.
Easy diffs if you're lazy (as I am). You can see the differences in the file between two projects' file.r with git diff projectA projectB -- file.r. You could do the same with multiple repositories, but it requires a repository specification like git diff projectA/master projectB/master -- file.r. Could get confusing if you have 20-30 project repositories or use submodules.
Easy updates. Grabbing updates is as simple as issuing git fetch origin and watching the output.
Easy clones. When setting up a new local repo, you clone a single remote. No need to clone origin, then git remote add <project> repositories until you've got them all.
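The "easy diffs" point can be tried out with a small sketch like the one below. It builds a scratch repository with two project branches (projectA and projectB, as in the diff command above; the file contents are invented) and compares their copies of file.r without switching branches.

```shell
#!/bin/sh
# Sketch: compare file.r across two project branches in one repository.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master  # use "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

# shared template on master (contents are hypothetical)
echo 'fit_model <- function(d) lm(y ~ x, data = d)' > file.r
git add file.r
git commit -q -m "Add master template"

# two project branches, each adding one project-specific line
for p in projectA projectB; do
    git checkout -q -b "$p" master
    echo "fit_model(${p}_data)" >> file.r
    git commit -q -am "$p: project-specific call"
done

# differences between the two projects' versions of the file
git --no-pager diff projectA projectB -- file.r

# print one project's version without checking that branch out
git show projectB:file.r
```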

Disadvantages (an incomplete list):

This method relies on you paying close attention to your checked-out branch. Nothing about the directory structure will clue you in, so it might not be obvious which file.r you're viewing at any given moment. That might be a deal breaker. I dunno. I suppose it depends on your workflow.
As KurzedMetal points out in comments, this could get messy fast if you ever need to merge all the projects into one. As such, I wouldn't recommend it for source code. For distinct R projects, however, this might be less of a concern.
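One possible way to soften the first disadvantage, not part of the answer above, is git worktree, which checks out several branches of the same repository into separate directories. This sketch assumes Git 2.5 or newer; the directory and branch names are made up to mirror the layout in the question.

```shell
#!/bin/sh
# Sketch: one directory per project branch via git worktree (assumes Git >= 2.5).
set -e
work=$(mktemp -d)
git init -q "$work/master"
cd "$work/master"
git symbolic-ref HEAD refs/heads/master  # use "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

# shared template on master (contents are hypothetical)
echo 'fit_model <- function(d) lm(y ~ x, data = d)' > file.r
git add file.r
git commit -q -m "Add master template"

# create each project branch and check it out in its own sibling directory
for p in proj1 proj2; do
    git branch "$p" master
    git worktree add "$work/$p" "$p" >/dev/null 2>&1
done

# each directory now has its own file.r on its own branch
ls "$work"
git -C "$work/proj1" rev-parse --abbrev-ref HEAD
```

With this layout the directory name tells you which project's file.r you are editing, while all branches still live in a single repository.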

Answer 2

IMO the best way to achieve this is:

Create a library, in its own repo, for your shared code.
Create a repo for each project.
Use git submodule to integrate the shared code into each project.
Import the R library and add the project-specific code.
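The four steps above might look roughly like this. It is a sketch with local throwaway repositories: the names shared, proj1 and model.r are hypothetical, and in real use the submodule URL would point at your hosted shared repository. The protocol.file.allow setting is only there because recent Git versions refuse local-path submodules by default.

```shell
#!/bin/sh
# Sketch: shared code in its own repository, pulled into a project as a submodule.
set -e
work=$(mktemp -d)

# 1. repository holding the shared code (hypothetical name: shared)
git init -q "$work/shared"
cd "$work/shared"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo 'fit_model <- function(d) lm(y ~ x, data = d)' > model.r
git add model.r
git commit -q -m "Add shared model code"

# 2. one repository per project
git init -q "$work/proj1"
cd "$work/proj1"
git config user.name "Demo User"
git config user.email "demo@example.com"

# 3. add the shared repository as a submodule in the directory "shared"
git -c protocol.file.allow=always submodule add "$work/shared" shared >/dev/null 2>&1

# 4. project-specific code loads the shared code and builds on it
cat > file.r <<'EOF'
source("shared/model.r")
fit_model(proj1_data)
EOF
git add file.r
git commit -q -m "Add proj1 model on top of the shared code"
```

Each project pins a specific commit of the shared repository, so a fix to the shared code only reaches a project once its submodule is updated and that change is committed.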

Answer 3

There are ways described in the other answers.

For example,

use object-oriented patterns or templates to increase reuse and reduce duplicated code.
use git branch
use git submodule

Finally, when nothing else works, I add a comment to the file header noting that it is a fork of another file.

  Date       | Author | Description
  ---------- | ------ | ------------------------
  05/18/2018 | You    | Forked from Other/file.r

solution1
3 2012-07-06 16:58:04

solution2
3 2012-07-06 17:00:44

solution3
0 2018-07-07 14:41:06
