Regex: Grab second string between two single quotes
Can I get a little help matching a string in the below text?
The default username and password is 'user' and 'ZWiliWH8E2mV'.
I'm trying to get the string between the second set of single quotes: ZWiliWH8E2mV. This string is randomly generated, so I can only rely on the formatting, not on the value itself.
After some googling, I can match it with grep:
cat file_name | grep -oP "(?<=').*?(?=')"
but it's the 3rd match, and I'm not sure how to get to it from there. I'm open to using other tools if they're better for what I'm trying to do, but I'm not very versed in them.
Since you state in the question that you are trying to get the string between the second set of single quotes, you could match up to and including the first 3 single quotes, start the match after them, and continue until the fourth single quote.
The negated character class [^']+ matches one or more chars other than a single quote.
^(?:[^']+'){3}\K[^']+(?=')
Explanation
^ Start of string (start of each line when used with grep)
(?:[^']+'){3} Repeat 3 times: match 1+ chars other than ', then match '
\K Clear the match buffer (forget what has been matched up to this point)
[^']+ Match 1+ times any char except ' (what you want to match)
(?=') Positive lookahead, assert that what is directly to the right is a '
Regex demo | Bash demo
The updated code might look like:
cat file_name | grep -oP "^(?:[^']+'){3}\K[^']+(?=')"
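To try the pattern without a file, here is a minimal sketch that feeds the sample line from the question through the same grep call. It assumes GNU grep built with PCRE support (the -P option); the variable names line and password are only illustrative. Note that the pattern needs at least one character before each of the first three quotes, so a line that begins with a quote would not match.

```shell
# Sample line taken from the question.
line="The default username and password is 'user' and 'ZWiliWH8E2mV'."

# Skip everything up to and including the 3rd single quote (\K drops it
# from the reported match), then capture up to the 4th single quote.
password=$(printf '%s\n' "$line" | grep -oP "^(?:[^']+'){3}\K[^']+(?=')")

printf '%s\n' "$password"
```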
I'm trying to get the string between the second set of single quotes
Using awk, you can avoid regex:
s="The default username and password is 'user' and 'ZWiliWH8E2mV'."
awk -F "'" '{print $4}' <<< "$s"
ZWiliWH8E2mV
Here we are using ' as the field delimiter, so the 4th field in awk gives us the 2nd value wrapped inside single quotes.
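The same field arithmetic generalizes: when ' is the delimiter, the n-th quoted value sits in field 2n. A small sketch, assuming any POSIX awk; the helper name nth_quoted is hypothetical and not part of the answer above:

```shell
# Hypothetical helper: print the n-th single-quoted value of each input line.
# With ' as the field separator, quoted values land in the even-numbered
# fields, so the n-th one is field 2*n.
nth_quoted() {
    awk -F "'" -v n="$1" '{ print $(2 * n) }'
}

line="The default username and password is 'user' and 'ZWiliWH8E2mV'."
password=$(printf '%s\n' "$line" | nth_quoted 2)
printf '%s\n' "$password"

s="'first' and 'second' and 'third' and 'fourth' and the rest"
third=$(printf '%s\n' "$s" | nth_quoted 3)
printf '%s\n' "$third"
```

This relies on the quotes being balanced; a stray apostrophe earlier in the line would shift the field numbers.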
You may grab the value between the last two single quotation marks using grep:
grep -oP ".*'\\K[^']+(?=')" file_name
See the online demo
The -o option outputs only the matched substrings and -P makes grep use the PCRE regex engine.
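Because this pattern anchors on the last pair of quotes rather than the second, the two only agree when the password is the final quoted value on the line. A quick sketch illustrating that, again assuming GNU grep with -P; the variable names are illustrative:

```shell
# On the line from the question, the last quoted value is the password.
line="The default username and password is 'user' and 'ZWiliWH8E2mV'."
last_on_line=$(printf '%s\n' "$line" | grep -oP ".*'\\K[^']+(?=')")
printf '%s\n' "$last_on_line"

# With more quoted values, the greedy .* still selects the last one.
s="'first' and 'second' and 'third' and 'fourth' and the rest"
last_of_many=$(printf '%s\n' "$s" | grep -oP ".*'\\K[^']+(?=')")
printf '%s\n' "$last_of_many"
```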
PCRE regex details
.* - any 0 or more chars other than line break chars, as many as possible
' - a ' char
\K - match reset operator that discards all text matched so far from the overall match memory buffer
[^']+ - one or more chars other than a ' char
(?=') - a positive lookahead that makes sure there is a ' char immediately to the right of the current location.
If you have multiple single quoted fields:
$ s="'first' and 'second' and 'third' and 'fourth' and the rest"
You can use the following Perl one-liner to get the nth field:
echo "$s" |
perl -lne 'while (/[\x27]([^\x27]*)[\x27]/g) {print $1 if ++$i==3}'
# third
So for your example, the password is the second quoted field:
echo "The default username and password is 'user' and 'ZWiliWH8E2mV'." |
perl -lne 'while (/[\x27]([^\x27]*)[\x27]/g) {print $1 if ++$i==2}'
Prints:
ZWiliWH8E2mV
You can also use gawk with FPAT set to the same regex to print the nth field:
s="'first' and 'second' and 'third' and 'fourth' and the rest"
echo "$s" |
gawk -v n=2 'BEGIN{FPAT="[\x27][^\x27]*[\x27]"}
{ gsub(/[\x27]/,"",$n); print $n}'
# second
Or you can use a pipeline of two GNU sed commands, with n being the line you print in the second sed:
echo "$s" |
gsed -E 's/[^\x27]*\x27([^\x27]*)\x27[^\x27]*/\1\n/g' | gsed -nE '4p'
# fourth
Note: \x27 is the hex escape for the ' character; [\x27] is that escape inside a bracket expression. Hex escapes are supported by most regex implementations but not all. POSIX sed, for example, does not define them, so support there is unreliable.
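Where hex escapes are not available, one way around them is to keep the quote character literal inside a double-quoted shell string. The sketch below follows the same "one quoted field per line, then pick line n" idea as the sed pipeline, but it is a different technique from the commands above: it uses grep -o, which GNU and BSD grep provide but POSIX does not specify. The variable names n and fourth are illustrative.

```shell
s="'first' and 'second' and 'third' and 'fourth' and the rest"
n=4

# 1. grep -o prints each quoted field (quotes included) on its own line.
# 2. sed -n "${n}p" keeps only the n-th of those lines.
# 3. tr -d strips the surrounding single quotes.
fourth=$(printf '%s\n' "$s" | grep -o "'[^']*'" | sed -n "${n}p" | tr -d "'")

printf '%s\n' "$fourth"
```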