Use RegExp in Awk
I have a CSV file:
<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1
<iframe src="https://localhost/get/5db0d477d707121934ff"></iframe>|name_2
<iframe src="https://localhost/get/6c95bd2b32ed45989c61"></iframe>|name_3
<iframe src="https://localhost/get/0a9c4655800e8a7b9ea2"></iframe>|name_4
<iframe src="https://localhost/get/754953b57a32e2841bda"></iframe>|name_5
and I want to use a RegExp and Awk (or Gawk) to turn this CSV file into this:
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
I have a RegExp that works in grep:
$ grep -Po "[A-Za-z]*+\d++\w++" example.txt
44bc40f3bc04f65b7a35
5db0d477d707121934ff
6c95bd2b32ed45989c61
0a9c4655800e8a7b9ea2
754953b57a32e2841bda
but this RegExp does not work in Awk. I think either I am not using the regexp in Awk correctly, or this type of RegExp does not work in Awk:
$ awk -F "|" 'match($1, /[A-Za-z]*+\d++\w++/, a) {print a[0]"|"$2}' example.txt
db0d477d707121934ff|name_2
bd2b32ed45989c61|name_3
bda|name_5
Plain Awk works fine:
$ awk -F "|" '{print $1"|"$2}' example.txt
<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1
<iframe src="https://localhost/get/5db0d477d707121934ff"></iframe>|name_2
<iframe src="https://localhost/get/6c95bd2b32ed45989c61"></iframe>|name_3
<iframe src="https://localhost/get/0a9c4655800e8a7b9ea2"></iframe>|name_4
<iframe src="https://localhost/get/754953b57a32e2841bda"></iframe>|name_5
Try:
$ awk -F'<iframe src="https://localhost/get/|"></iframe>' '{print $2 $3}' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
This works by setting the input field separator to be either <iframe src="https://localhost/get/ or "></iframe>, in which case the output you want is the second field followed by the third field. The first field is empty, because every line begins with a separator.
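To make the field layout visible, here is a small sketch that feeds one line of the question's sample data through the same separator and prints the field count and the first three fields:

```shell
# One sample line, split with the same two-alternative regex separator.
# Because the line starts with the first separator, $1 is empty.
printf '%s\n' '<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1' |
  awk -F'<iframe src="https://localhost/get/|"></iframe>' '
    { printf "NF=%d $1=[%s] $2=[%s] $3=[%s]\n", NF, $1, $2, $3 }'
```

This should report three fields: an empty $1, the ID in $2, and |name_1 in $3. That is why print $2 $3 reproduces the desired line.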
$ awk -F "|" 'match($1, /[[:xdigit:]]{20}/, a) {print a[0]"|"$2}' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
Note that awk supports POSIX regular expressions. That means it recognizes character classes like [[:digit:]] or [[:alnum:]], but not necessarily \d or \w. As a GNU-specific extension, gawk supports \w (but not \d). gawk treats the unknown escape \d as a literal d, which is why your awk command only printed the parts of the IDs starting at a d. For portability, stick to the POSIX classes as shown in man 7 regex.
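Here is a minimal sketch of the portable style. It uses [[:digit:]] in place of \d, together with the two-argument match(), which POSIX awk provides and which sets RSTART and RLENGTH. The input string abc123def is made up for illustration:

```shell
# [[:digit:]]+ is a POSIX bracket expression, so it does not rely on \d.
# Two-argument match() sets RSTART (start) and RLENGTH (length) of the match.
echo 'abc123def' | awk 'match($0, /[[:digit:]]+/) { print substr($0, RSTART, RLENGTH) }'
```

This should print 123.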
Your regex [A-Za-z]*+\d++\w++ can be translated into awk as follows:
$ awk -F "|" 'match($1, /[[:alpha:]]*[[:digit:]]+[[:alnum:]]+/, a) {print a[0]"|"$2}' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
Note that this method requires the 20-character hex string to contain at least one digit.
The difference between the awk and grep invocations in your example is the -P option to grep, which stands for "use Perl regexp" (Perl-compatible regular expressions). If you replace it with -E, grep will behave just like your awk run. Awk does not support Perl extensions.
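If you want to keep using grep without -P, here is a sketch that uses -E with a POSIX class instead of \d and \w. It assumes every ID is exactly 20 hex characters, as in the sample, and relies on -o, as your original command does:

```shell
# -E: POSIX extended regexps; -o: print only the matched part of each line.
# [[:xdigit:]]{20} assumes each ID is exactly 20 hex digits.
printf '%s\n' '<iframe src="https://localhost/get/5db0d477d707121934ff"></iframe>|name_2' |
  grep -Eo '[[:xdigit:]]{20}'
```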
Your regexp would be better fixed; to begin with, I don't think you need those extra + signs. If I can assume that you need all the letters and digits after get/, then I'd rather write:
awk -F "|" 'match($1, /get\/([A-Za-z0-9]+)/, a) {print a[1]"|"$2}' example.txt
Here [A-Za-z0-9]+ matches one or more lowercase or uppercase letters or digits that come after get/, and a[1] prints the group matched inside the parentheses instead of the whole match a[0], which includes get/.
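The three-argument form match(s, r, a) used in the command above is a gawk extension. For other awks, here is a sketch that assumes the ID always directly follows get/. It uses the two-argument match() and skips the four characters of get/ with substr(). The sample line comes from the question's data:

```shell
# Portable variant: two-argument match() sets RSTART/RLENGTH, and substr()
# drops the leading "get/" (4 characters), so no capture group is needed.
printf '%s\n' '<iframe src="https://localhost/get/754953b57a32e2841bda"></iframe>|name_5' |
  awk -F'|' 'match($1, /get\/[A-Za-z0-9]+/) { print substr($1, RSTART + 4, RLENGTH - 4) "|" $2 }'
```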
awk '{ gsub(/<.*get\//, ""); gsub(/".*e>/, "") } 1' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
Here is another solution:
awk -F"[/\">|]" 'BEGIN{ OFS = "|" }{ print $6, $11 }' yourfile
With the -F option at the beginning, the field separator can be any of /, ", > or |. After that, you can just print fields $6 and $11, which contain your desired output, joined by the output field separator (OFS = "|").
Output:
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
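To see why $6 and $11 are the right fields, here is a small sketch that prints every field of one sample line with its number. It uses the same bracket-expression separator, written in single quotes, which is equivalent to the escaped double-quoted form:

```shell
# List each field with its index, using the separator set / " > |
printf '%s\n' '<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1' |
  awk -F'[/">|]' '{ for (i = 1; i <= NF; i++) print i ": [" $i "]" }'
```

On this line the ID should appear as field 6 and name_1 as field 11. Fields are empty wherever two separators are adjacent, for example the // in https://.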