
Regex match anything between multiple times from git log

I want to split the git log output into parts so that I can access each commit's hash and message separately.

This is the git log command:

git log --pretty=short --abbrev-commit -n 2 HEAD

Here is an example log:

commit bfb9bac
Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>

    Something awesome happened here

commit a4fad44
Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>

    Ooh, more awesomeness
    So many lines

What I have tried so far:

([a-f0-9]{7})\n(?:Author.+\n\n)([\s\S]+)(?=\ncommit)

Here is a link to RegExr: https://regexr.com/4d523

At the end it should look like this:

const result = commits.match(regex)

result[0][0] // bfb9bac
result[0][1] // Something awesome happened here

result[1][0] // a4fad44
result[1][1] // Ooh, more awesomeness\n    So many lines

It would also be okay to do this in two steps: first splitting the commits, and then splitting hash and message.

You can avoid [\s\S] by matching the first line of the message with .* and then repeating a pattern that matches a newline, asserts that the next line does not start with commit, and matches the rest of that line:

^commit ([a-f0-9]{7})\nAuthor.*\n+[ \t]+(.*(?:\n(?!commit).*)*)

Explanation

  • ^ Start of the string (or start of a line when the m flag is set, as in the demo)
  • commit Match commit followed by a space
  • ([a-f0-9]{7}) Capture in group 1 exactly 7 characters from the character class (the abbreviated hash)
  • \nAuthor.* Match a newline, then Author and 0+ times any char except a newline
  • \n+[ \t]+ Match 1+ newlines followed by 1+ spaces or tabs
  • ( Capturing group 2
    • .* Match 0+ times any char except a newline
    • (?:\n(?!commit).*)* Repeat 0+ times: match a newline, assert that what follows is not commit, then match 0+ times any char except a newline
  • ) Close capturing group

Regex demo

const regex = /^commit ([a-f0-9]{7})\nAuthor.*\n+[ \t]+(.*(?:\n(?!commit).*)*)/gm;
const str = `commit bfb9bac
Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>

    Something awesome happened here

commit a4fad44
Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>

    Ooh, more awesomeness
    So many lines
`;
let m;

while ((m = regex.exec(str)) !== null) {
  // Guard against an infinite loop on zero-length matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  console.log("hash: " + m[1]);
  console.log("message: " + m[2]);
}
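To get the nested `result[i][0]` / `result[i][1]` shape asked for in the question, the same regex can be combined with `String.prototype.matchAll`. This is only a sketch: `parseCommits` and `sampleLog` are hypothetical names chosen for illustration, and it assumes the abbreviated hash is exactly 7 characters long, as in the sample log.

```javascript
// Sample log as printed by: git log --pretty=short --abbrev-commit -n 2 HEAD
const sampleLog = [
  "commit bfb9bac",
  "Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>",
  "",
  "    Something awesome happened here",
  "",
  "commit a4fad44",
  "Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>",
  "",
  "    Ooh, more awesomeness",
  "    So many lines",
  "",
].join("\n");

// Hypothetical helper: returns an array of [hash, message] pairs.
function parseCommits(text) {
  const regex = /^commit ([a-f0-9]{7})\nAuthor.*\n+[ \t]+(.*(?:\n(?!commit).*)*)/gm;
  // matchAll needs the g flag; m[1] is the hash, m[2] the message.
  // trimEnd drops the blank line(s) that separate two commits.
  return Array.from(text.matchAll(regex), (m) => [m[1], m[2].trimEnd()]);
}

const result = parseCommits(sampleLog);
console.log(result[0][0]); // bfb9bac
console.log(result[0][1]); // Something awesome happened here
console.log(result[1][0]); // a4fad44
console.log(JSON.stringify(result[1][1])); // "Ooh, more awesomeness\n    So many lines"
```

Note that `commits.match(regex)` with the g flag returns only the full matches without the groups, which is why `matchAll` (or the `exec` loop above) is used here.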

You can use this regex to match each commit in the log, capturing the SHA-1 in group 1 and the message in group 2:

^commit\s+(\S+)\n^Author:[\w\W]+?^\s+((?:(?!commit)[\w\W])+)

Regex Explanation:

  • ^commit - Matches commit at the beginning of a line
  • \s+(\S+)\n - Matches one or more whitespace characters followed by the SHA-1 value, which is captured in group 1 by (\S+), followed by a newline \n
  • ^Author:[\w\W]+? - Matches Author at the start of a line, followed by a colon, followed by any character one or more times, as few times as possible
  • ^\s+ - Matches one or more whitespace characters from the beginning of a line; this is the point from which the message starts being captured by the next part of the regex
  • ((?:(?!commit)[\w\W])+) - This expression (a "tempered greedy token") captures any character including newlines using [\w\W], but stops as soon as it sees commit anywhere in the text (not only at the start of a line), and places the whole match in group 2
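One consequence of the last bullet is worth checking: because (?!commit) is tested at every character, a message that itself contains the word commit is cut short. The sketch below shows this on a made-up message and, as one possible adjustment (my assumption, not part of the original answer), anchors the lookahead to the start of a line with (?!^commit\s). The names `logWithCommitWord`, `original` and `anchored` are hypothetical.

```javascript
// A log whose message happens to contain the word "commit" (made-up example).
const logWithCommitWord = [
  "commit bfb9bac",
  "Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>",
  "",
  "    Fix the pre-commit hook",
  "",
].join("\n");

// Regex from the answer: the lookahead rejects "commit" at any position.
const original = /^commit\s+(\S+)\n^Author:[\w\W]+?^\s+((?:(?!commit)[\w\W])+)/m;

// Variant (assumption): only stop at "commit" followed by whitespace at the start of a line.
const anchored = /^commit\s+(\S+)\n^Author:[\w\W]+?^\s+((?:(?!^commit\s)[\w\W])+)/m;

const truncatedMessage = original.exec(logWithCommitWord)[2];
const fullMessage = anchored.exec(logWithCommitWord)[2].trimEnd();

console.log(JSON.stringify(truncatedMessage)); // "Fix the pre-"
console.log(JSON.stringify(fullMessage));      // "Fix the pre-commit hook"
```

Since git indents message lines in this format, a line that starts with commit in column 0 should only ever be a commit header, which is what makes the anchored variant reasonable here.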

Regex Demo

Here is a JS code demo,

const str = `commit bfb9bac
Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>

    Something awesome happened here

commit a4fad44
Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>

    Ooh, more awesomeness
    So many lines`;
const reg = /^commit\s+(\S+)\n^Author:[\w\W]+?^\s+((?:(?!commit)[\w\W])+)/mg;
let m;

// exec with the g flag advances through the string one commit at a time
while ((m = reg.exec(str)) !== null) {
  console.log("SHA1: " + m[1] + ", Message: " + m[2]);
}
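The two-step approach mentioned in the question (split the commits first, then split hash and message) could look roughly like the sketch below. It assumes the `--pretty=short` layout shown above, where the header block and the message are separated by a blank line; `splitCommits` and `shortLog` are hypothetical names.

```javascript
const shortLog = [
  "commit bfb9bac",
  "Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>",
  "",
  "    Something awesome happened here",
  "",
  "commit a4fad44",
  "Author: XXXXX XXXXXXXX <xxx.xxxxx@xxxxx.xxx>",
  "",
  "    Ooh, more awesomeness",
  "    So many lines",
  "",
].join("\n");

// Hypothetical helper: returns an array of [hash, message] pairs.
function splitCommits(text) {
  return text
    // Step 1: split in front of every line that starts with "commit ".
    .split(/^(?=commit )/m)
    .filter((chunk) => chunk.trim() !== "")
    // Step 2: separate the header block from the message at the first blank line.
    .map((chunk) => {
      const [header, ...rest] = chunk.split("\n\n");
      const hash = header.split("\n")[0].replace(/^commit /, "");
      // Re-join so that blank lines inside a longer message are preserved.
      const message = rest.join("\n\n").trim();
      return [hash, message];
    });
}

const commits = splitCommits(shortLog);
console.log(commits[0]); // [ 'bfb9bac', 'Something awesome happened here' ]
console.log(commits[1][0]); // a4fad44
console.log(JSON.stringify(commits[1][1])); // "Ooh, more awesomeness\n    So many lines"
```

This version does not restrict the hash to 7 hex characters, so it also tolerates longer abbreviations, at the cost of doing no validation of the header line.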
