
Prevent large text file from being added to commit when using GitHub

We want to prevent:

  • Very large text files (> 50MB per file) from being committed to git instead of git-lfs , as they inflate git history.
  • Problem is, 99% of them are < 1MB, and should be committed for better diffing.
  • The reason for the variance in size: these are YAML files, which support binary serialization via base64 encoding.
  • The reason we can't reliably prevent binary serialization: this is a Unity project, binary serialization is needed for various reasons.

Given:

  • GitHub hosting's lack of pre-receive hook support.
  • git-lfs's lack of file size attribute support.

Questions:

  1. How can we reliably prevent large files from being added to commit?
  2. Can this be done through a config file in repo so all users follow this rule gracefully?
  3. If not, can this be done by bash command aliasing so trusted users can see a warning message when they accidentally git add a large file and it's not processed by git-lfs ?

(Our environment is macOS. I have looked at many solutions and so far none satisfy our needs)

  • How can we reliably prevent large files from being added to commit?
  • Can this be done through a config file in the repo so all users follow this rule gracefully?

Since GitHub doesn't support server-side hooks, you can use client-side hooks. As you are probably aware, those hooks can be bypassed or disabled without difficulty (for example with git commit --no-verify), but this is still a good way to do it.

core.hooksPath

Git v2.9 added the core.hooksPath setting, which lets client-side hooks live in any folder you choose. Prior to that, the hooks had to be placed inside the .git folder (in .git/hooks), which is not tracked by the repo.

This will allow you to write scripts and put them anywhere. I assume you know what hooks are but if not feel free to ask.


How to do it?

Usually, you place the hooks inside your repo (or any other common folder).

# Set the hooks path. git config writes to --local by default,
# so this setting applies to the current repository only.
git config core.hooksPath .githooks

Alright, with help from CodeWizard and this SO answer , I managed to create a good guide myself:

First, set up your repo's core.hooksPath with:

git config core.hooksPath .githooks

Second, create this pre-commit file inside the .githooks folder, so it can be tracked ( gist link ), then remember to give it execute permission with chmod +x .

#!/bin/bash
#
# Pre-commit hook: verify what is about to be committed.
# Called by "git commit" with no arguments. The hook exits with
# non-zero status after issuing an appropriate message if it wants
# to stop the commit.
#
# Requires bash (a here-string is used below), hence the shebang.

# Redirect output to stderr.
exec 1>&2

FILE_SIZE_LIMIT_KB=1024
COLOR='\033[01;33m'
NOCOLOR='\033[0m'
HAS_ERROR=""
COUNTER=0

# generate file extension filter from gitattributes for git-lfs tracked files
# (empty if .gitattributes is missing or tracks nothing with git-lfs)
filter=$(grep 'filter=lfs' .gitattributes 2>/dev/null | awk '{printf "-e .%s$ ", $1}')

# before git commit, check non git-lfs tracked files to limit size;
# --diff-filter=ACMR skips deletions, which have no staged content to measure
files=$(git diff --cached --name-only --diff-filter=ACMR | sort -u)
if [ -n "$filter" ]; then
    # $filter is deliberately unquoted so it splits into separate -e options
    files=$(printf '%s\n' "$files" | grep -v $filter)
fi
while read -r file; do
    if [ "$file" = "" ]; then
        continue
    fi
    # size in bytes of the staged blob, i.e. what will actually be committed;
    # entries without a readable blob (e.g. submodules) are skipped
    file_size=$(git cat-file -s ":$file" 2>/dev/null) || continue
    file_size_kb=$((file_size / 1024))
    if [ "$file_size_kb" -ge "$FILE_SIZE_LIMIT_KB" ]; then
        printf "${COLOR}%s${NOCOLOR} has size %sKB, over commit limit %sKB.\n" "$file" "$file_size_kb" "$FILE_SIZE_LIMIT_KB"
        HAS_ERROR="YES"
        COUNTER=$((COUNTER + 1))
    fi
done <<< "$files"

# exit with error if any non-lfs tracked files are over file size limit
if [ "$HAS_ERROR" != "" ]; then
    echo "$COUNTER files are larger than permitted, please fix them before commit" >&2
    exit 1
fi

exit 0

Now, assuming you have both .gitattributes and git-lfs set up properly, this pre-commit hook will run when you try to git commit and make sure that all staged files not tracked by git-lfs (as specified in your .gitattributes ) satisfy the specified file size limit.
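Since the hook decides which files to skip based on .gitattributes, it can be handy to check what Git itself thinks about a given path. Here is a small sketch using git check-attr; the helper name is_lfs_tracked and the example paths are made up for illustration and are not part of the hook above:

```shell
# is_lfs_tracked (hypothetical helper): succeeds when Git assigns
# filter=lfs to the given path according to .gitattributes.
# The path does not have to exist; only its name is matched.
is_lfs_tracked() {
    # git check-attr prints "<path>: filter: <value>"
    value=$(git check-attr filter -- "$1" | sed 's/.*: filter: //')
    [ "$value" = "lfs" ]
}

# Example usage (run inside the repo; the path is illustrative):
# if is_lfs_tracked "Assets/Textures/big.psd"; then echo "handled by git-lfs"; fi
```

If a file you expected to go through git-lfs is reported as not tracked, the pattern in .gitattributes is probably the thing to fix, rather than the size limit in the hook.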

Any new users of your repo will need to set up core.hooksPath themselves, but beyond that, things should just work .
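To make that one-time step harder to forget, you could keep a tiny setup helper in the repo and ask new users to run it once after cloning. This is only a sketch: the function name enable_repo_hooks is hypothetical, and it assumes the hooks live in .githooks at the repo root as described above.

```shell
# enable_repo_hooks (hypothetical helper): run once per clone,
# from anywhere inside the working tree.
enable_repo_hooks() {
    repo_root=$(git rev-parse --show-toplevel) || return 1
    # Point this clone at the tracked hooks folder (stored in .git/config).
    git -C "$repo_root" config core.hooksPath .githooks
    # Make sure each tracked hook is executable on this machine.
    for hook in "$repo_root"/.githooks/*; do
        [ -f "$hook" ] && chmod +x "$hook"
    done
    echo "core.hooksPath is now: $(git -C "$repo_root" config core.hooksPath)"
}
```

Note that this still relies on each user running it; Git will not pick up core.hooksPath automatically from a cloned repo.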

Hope this helps other Unity developers fighting with growing git repo size!
