Parsing string with two captures in bash
I'm trying to parse a string with regex. A valid string is of the following format:
https://github.com/xyz/abc/a_123/project_14.git
The valid string should contain github.com and xyz or zyx. If the string is valid, I want to capture abc/a_123 into $A and project_14 into $B.
What I did:
if [[ "$x" == *"github.com"* ]]; then
    if [[ "$x" == *"xyz"* ]]; then
        # (1)
    elif [[ "$x" == *"zyx"* ]]; then
        # (2)
    else
        return 1 # Invalid
    fi
    return 0 # Valid
fi
return 1 # Invalid
In both (1) and (2) I want to set $A and $B with the values (same behavior in both cases). Also, I think this solution is not good, because it will enter the if-else in the case of https://github.com/bla/abc/a_123/xyz.git, so I guess we need to change it to be "github.com/xyz". Also, how can I get rid of .git (if it exists)?
Another example:
https://github.com/zyx/asdasdas/lalal/asdas/nu.git
# $A = asdasdas/lalal/asdas
# $B = nu
What is the proper way to achieve this goal?
Here is a way using regex:
url='https://github.com/xyz/abc/a_123/project_14.git'
if [[ $url =~ http[s]?:[/]{2}(github.com)[/]([[:alpha:]]+)(/.*)$ ]]
then
    # Assign without a leading $: A=..., not $A=...
    A=${BASH_REMATCH[2]}
    B=${BASH_REMATCH[3]%.git}
fi
And here is a small proof of concept:
url='https://github.com/xyz/abc/a_123/project_14.git'
if [[ $url =~ http[s]?:[/]{2}(github.com)[/]([[:alpha:]]+)(/.*)$ ]]
then
    echo "${BASH_REMATCH[2]} ${BASH_REMATCH[3]%.git}"
fi
Resulting in:
xyz /abc/a_123/project_14
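Note that the output above holds the owner segment (xyz) and the whole remaining path, not the abc/a_123 and project_14 split the question asks for. A minimal sketch of one possible adaptation follows. It assumes the owner must be xyz or zyx and that the path contains at least one slash; the function name parse_repo is hypothetical and not part of the original answer.

```shell
#!/bin/bash
# parse_repo is a hypothetical helper name; on success it sets the globals A and B.
parse_repo() {
    # Keep the regex in a variable so the shell does not reinterpret it.
    local re='^https?://github\.com/(xyz|zyx)/(.+/[^/]+)$'
    [[ $1 =~ $re ]] || return 1
    local path=${BASH_REMATCH[2]%.git}   # drop a trailing .git, if any
    A=${path%/*}     # everything before the last slash
    B=${path##*/}    # everything after the last slash
}

parse_repo 'https://github.com/xyz/abc/a_123/project_14.git' && echo "A=$A B=$B"
```

Here the regex only validates the prefix, and the split at the last slash is done with parameter expansion, so the path may contain any number of intermediate segments.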
I think this does what you want:
#!/bin/bash
repo="https://github.com/xyz/abc/a_123/project_14.git"
[[ ! "$repo" =~ https:\/\/github\.com\/[a-z]+\/[a-z]+\/[a-z]_[0-9]+\/.*\.git ]] && exit
A=$( echo "$repo" | sed -E "s/(https:\/\/github\.com\/[a-z]+)(\/[a-z]+\/[a-z]_[0-9]+\/)(.*\.git)/\2/g" )
B=$( echo "$repo" | sed -E "s/(https:\/\/github\.com\/[a-z]+)(\/[a-z]+\/[a-z]_[0-9]+\/)(.*\.git)/\3/g" )
echo "$A"
echo "${B%%.git}"
Let me know if it helps.
Would you please try the following:
strchk() {
    local x=$1
    if [[ $x =~ github.com/(xyz|zyx)/(.+)/(.+) ]]; then
        A="${BASH_REMATCH[2]}"
        B="${BASH_REMATCH[3]%.*}"
        return 0
    else
        return 1
    fi
}
Results:
strchk "https://github.com/xyz/abc/a_123/project_14.git" && echo "A=$A, B=$B"
=> A=abc/a_123, B=project_14
strchk "https://github.com/bla/abc/a_123/xyz.git" && echo "A=$A, B=$B"
=> <empty>
strchk "https://github.com/zyx/asdasdas/lalal/asdas/nu.git" && echo "A=$A, B=$B"
=> A=asdasdas/lalal/asdas, B=nu
Explanations:
github.com/(xyz|zyx)/ matches a string which contains github.com/ followed by xyz/ or zyx/.
(.+)/ matches the substring after xyz/ or zyx/ up to the rightmost slash, and stores the substring captured within the parens in the bash variable ${BASH_REMATCH[2]}.
(.+) captures the remaining substring into ${BASH_REMATCH[3]}.
${BASH_REMATCH[3]%.*} removes the extension after the dot, if there is one.
Hope this helps.
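As a small illustration of the last point, the sketch below compares the %.* expansion used above with %.git, which strips only a literal .git suffix. The helper names strip_ext and strip_git and the sample names my.lib and my.lib.git are made up for this example.

```shell
#!/bin/bash
# Hypothetical helpers, used only to compare the two expansions.
strip_ext() { printf '%s\n' "${1%.*}"; }    # removes the shortest trailing ".something"
strip_git() { printf '%s\n' "${1%.git}"; }  # removes a trailing ".git" only

for name in project_14.git project_14 my.lib.git my.lib; do
    echo "$name: %.* gives $(strip_ext "$name"), %.git gives $(strip_git "$name")"
done
```

The two forms agree for names like project_14.git, but for a name that contains a dot and has no .git suffix (my.lib here), %.* cuts it down to my while %.git leaves it unchanged.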