Parsing string with two captures in bash

I'm trying to parse a string with a regex. A valid string has the following format:

https://github.com/xyz/abc/a_123/project_14.git

A valid string should contain github.com and either xyz or zyx. If the string is valid, I want to capture abc/a_123 into $A and project_14 into $B.

What I did:

if [[ "$x" == *"github.com"* ]]; then
    if [[ "$x" == *"xyz"* ]]; then
        : # (1)
    elif [[ "$x" == *"zyx"* ]]; then
        : # (2)
    else
        return 1 # Invalid
    fi
    return 0 # Valid
fi
return 1 # Invalid

In both (1) and (2) I want to set $A and $B (the behavior is the same in both cases). Also, I think this solution is not good, because it will enter the if-else for https://github.com/bla/abc/a_123/xyz.git, so I guess the check needs to be changed to "github.com/xyz". Finally, how can I get rid of .git (if it exists)?

Another example:

https://github.com/zyx/asdasdas/lalal/asdas/nu.git
# $A = asdasdas/lalal/asdas
# $B = nu

What is the proper way to achieve this goal?

Here is a way using regex:

url='https://github.com/xyz/abc/a_123/project_14.git'

if [[ $url =~ http[s]?:[/]{2}(github.com)[/]([[:alpha:]]+)(/.*)$ ]] 
then    
    A=${BASH_REMATCH[2]}         # an assignment takes no leading "$"
    B=${BASH_REMATCH[3]%.git}    # strip a trailing ".git" if present
fi

And here is a small proof of concept:

url='https://github.com/xyz/abc/a_123/project_14.git'

if [[ $url =~ http[s]?:[/]{2}(github.com)[/]([[:alpha:]]+)(/.*)$ ]]
then
   echo ${BASH_REMATCH[2]} ${BASH_REMATCH[3]%.git}
fi

Resulting in:

xyz /abc/a_123/project_14
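
Note that this match leaves the organization name in group 2 and the whole remaining path in group 3, which is not quite the abc/a_123 and project_14 split the question asks for. One way to get there, as a minimal sketch, is to capture everything after the organization and split it at the last slash with parameter expansion. The variable names re and rest are illustrative, not part of the answer above:

```shell
#!/usr/bin/env bash

url='https://github.com/xyz/abc/a_123/project_14.git'

# Keeping the pattern in a variable avoids quoting surprises inside [[ =~ ]].
re='^https?://github\.com/([[:alpha:]]+)/(.*)$'

if [[ $url =~ $re ]]; then
    rest=${BASH_REMATCH[2]%.git}   # drop a trailing ".git" if present
    A=${rest%/*}                   # everything before the last slash
    B=${rest##*/}                  # everything after the last slash
    echo "A=$A B=$B"
fi
```

For the sample URL this prints A=abc/a_123 B=project_14. The sketch accepts any alphabetic organization name; restricting it to xyz or zyx would need an alternation in place of [[:alpha:]]+.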

I think this does what you want:

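#!/bin/bash

repo="https://github.com/xyz/abc/a_123/project_14.git"

# Exit with a non-zero status if the URL does not have the expected shape.
[[ ! "$repo" =~ https://github\.com/[a-z]+/[a-z]+/[a-z]_[0-9]+/.*\.git ]] && exit 1

# "#" is used as the sed delimiter so the slashes need no escaping.
# The slashes around group 2 sit outside the parentheses, so A has none.
A=$( echo "$repo" | sed -E "s#(https://github\.com/[a-z]+/)([a-z]+/[a-z]_[0-9]+)/(.*\.git)#\2#" )
B=$( echo "$repo" | sed -E "s#(https://github\.com/[a-z]+/)([a-z]+/[a-z]_[0-9]+)/(.*\.git)#\3#" )

echo "$A"
echo "${B%.git}"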

Let me know if it helps.

Would you please try the following:

strchk() {
    local x=$1
    if [[ $x =~ github.com/(xyz|zyx)/(.+)/(.+) ]]; then
        A="${BASH_REMATCH[2]}"
        B="${BASH_REMATCH[3]%.*}"
        return 0
    else
        return 1
    fi
}

Results:

strchk "https://github.com/xyz/abc/a_123/project_14.git" && echo "A=$A, B=$B"
=> A=abc/a_123, B=project_14
strchk "https://github.com/bla/abc/a_123/xyz.git" && echo "A=$A, B=$B"
=> <empty>
strchk "https://github.com/zyx/asdasdas/lalal/asdas/nu.git" && echo "A=$A, B=$B"
=> A=asdasdas/lalal/asdas, B=nu

Explanations:

  • The pattern github.com/(xyz|zyx)/ matches a string that contains github.com/ followed by xyz/ or zyx/.
  • The next pattern (.+)/ greedily matches the substring after xyz/ or zyx/ up to the rightmost slash; the text captured by the parentheses is stored in ${BASH_REMATCH[2]}.
  • The last pattern (.+) captures the remaining substring into ${BASH_REMATCH[3]}.
  • The parameter expansion ${BASH_REMATCH[3]%.*} removes the extension after the last dot, if there is one.
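
The pattern above is deliberately loose: the dot in github.com matches any character, the match is not anchored, and %.* strips whatever follows the last dot, not only .git. If stricter matching is wanted, a possible variant looks like the sketch below. The function name parse_repo and the variable re are hypothetical, and the anchoring and the https?:// prefix are assumptions about what counts as a valid URL:

```shell
#!/usr/bin/env bash

# Hypothetical helper: sets the globals A and B on success, returns 1 otherwise.
parse_repo() {
    local url=$1
    # Anchored at both ends, with the dot escaped so it only matches a literal ".".
    # The last group excludes slashes, so it can only be the final path component.
    local re='^https?://github\.com/(xyz|zyx)/(.+)/([^/]+)$'
    [[ $url =~ $re ]] || return 1
    A=${BASH_REMATCH[2]}
    B=${BASH_REMATCH[3]%.git}   # strip only a trailing ".git"
}

if parse_repo 'https://github.com/zyx/asdasdas/lalal/asdas/nu.git'; then
    echo "A=$A, B=$B"
fi
```

With %.git, a final component such as project.v2 is left intact, whereas %.* would shorten it to project.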

Hope this helps.
