Using RegExp in Awk

Question

I have a CSV file:

<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1
<iframe src="https://localhost/get/5db0d477d707121934ff"></iframe>|name_2
<iframe src="https://localhost/get/6c95bd2b32ed45989c61"></iframe>|name_3
<iframe src="https://localhost/get/0a9c4655800e8a7b9ea2"></iframe>|name_4
<iframe src="https://localhost/get/754953b57a32e2841bda"></iframe>|name_5

and I want to use a RegExp with Awk (or Gawk) to turn it into this:

44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5

I have a RegExp that works in Grep:

$ grep -Po "[A-Za-z]*+\d++\w++" example.txt 
44bc40f3bc04f65b7a35
5db0d477d707121934ff
6c95bd2b32ed45989c61
0a9c4655800e8a7b9ea2
754953b57a32e2841bda

but this RegExp does not work in Awk. I think either I am using the regexp incorrectly in Awk, or this type of RegExp is not supported by Awk.

$ awk -F "|" 'match($1, /[A-Za-z]*+\d++\w++/, a) {print a[0]"|"$2}' example.txt 
db0d477d707121934ff|name_2
bd2b32ed45989c61|name_3
bda|name_5

Plain Awk, without the RegExp, works fine:

$ awk -F "|" '{print $1"|"$2}' example.txt 
<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1
<iframe src="https://localhost/get/5db0d477d707121934ff"></iframe>|name_2
<iframe src="https://localhost/get/6c95bd2b32ed45989c61"></iframe>|name_3
<iframe src="https://localhost/get/0a9c4655800e8a7b9ea2"></iframe>|name_4
<iframe src="https://localhost/get/754953b57a32e2841bda"></iframe>|name_5

Answer 1

Try:

$ awk -F'<iframe src="https://localhost/get/|"></iframe>' '{print $2 $3}' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5

This works by setting the input field separator to either <iframe src="https://localhost/get/ or "></iframe>. With that separator, the output you want is the second field followed by the third field (the third field still begins with |, so no extra separator has to be printed).
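To see how that separator splits a record, here is a minimal sketch that prints every field of the first sample line. The variable names line and fields are only for illustration.

```shell
# One record from the question, fed to awk on stdin.
line='<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1'

# Same -F value as above; print each field with its number.
fields=$(printf '%s\n' "$line" |
  awk -F'<iframe src="https://localhost/get/|"></iframe>' \
      '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
# $1=[]
# $2=[44bc40f3bc04f65b7a35]
# $3=[|name_1]
```

The first field is empty because the record starts with a separator, which is why the useful data sits in $2 and $3.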

Alternative Method Using Match

$ awk -F "|" 'match($1, /[[:xdigit:]]{20}/, a) {print a[0]"|"$2}' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5

Note that awk supports POSIX regular expressions. That means that it recognizes character classes like [[:digit:]] or [[:alnum:]] but not necessarily \d or \w. As a GNU-specific extension, gawk supports \w (but not \d). For portability, stick to the POSIX classes as shown in man 7 regex.
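If portability matters, note also that the three-argument form of match() used above is a gawk extension. The following is a sketch of a variant that should run in any POSIX awk with POSIX character class support: it uses two-argument match(), which sets RSTART and RLENGTH, and anchors the hex run between the last / and the closing quote instead of relying on an interval expression. The variable names are illustrative.

```shell
line='<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1'

portable=$(printf '%s\n' "$line" | awk -F'|' '
  # Match "/" + hex digits + closing quote; sets RSTART and RLENGTH.
  match($1, /\/[[:xdigit:]]+"/) {
    # Drop the leading slash and the trailing quote from the match.
    print substr($1, RSTART + 1, RLENGTH - 2) "|" $2
  }')

printf '%s\n' "$portable"
# 44bc40f3bc04f65b7a35|name_1
```

The surrounding / and " are needed here because a bare [[:xdigit:]]+ would already match the f in iframe.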

Yet another method

Your regex [A-Za-z]*+\d++\w++ can be translated into awk as follows:

$ awk -F "|" 'match($1, /[[:alpha:]]*[[:digit:]]+[[:alnum:]]+/, a) {print a[0]"|"$2}' file
44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5

Note that this method requires that the 20-character hex string contain at least one digit.

Answer 2

The difference between the awk and grep invocations in your example is the -P option in grep, which stands for "use Perl regexp". If you replace it with -E, it will behave just like your awk run. Awk does not support Perl extensions.
Your regexp would be better off fixed; I don't think you need these extra + signs to begin with. If I can assume that you need all the letters and digits after get/, then I'd rather write:
awk -F "|" 'match($1, /get\/([A-Za-z0-9]+)/, a) {print a[1]"|"$2}' example.txt

Here we use [A-Za-z0-9]+ to match one or more lowercase or uppercase letters or digits that come after get/, and a[1] to print the group matched inside the parentheses instead of the whole match a[0], which includes get/.
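The array argument to match() is specific to gawk. As a rough sketch of the same idea for other awks, the whole match and the "group" can both be cut out with substr(), using RSTART and RLENGTH; the variable names whole and group are illustrative.

```shell
line='<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1'

both=$(printf '%s\n' "$line" | awk -F'|' '
  match($1, /get\/[A-Za-z0-9]+/) {
    whole = substr($1, RSTART, RLENGTH)          # like a[0]: includes "get/"
    group = substr($1, RSTART + 4, RLENGTH - 4)  # like a[1]: skips the 4-char "get/"
    print whole
    print group "|" $2
  }')

printf '%s\n' "$both"
# get/44bc40f3bc04f65b7a35
# 44bc40f3bc04f65b7a35|name_1
```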

Answer 3

awk '{gsub(/<.*get\//,""); gsub(/".*e>/,"")}1' file

44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5

Answer 4

Here is another solution:

awk -F"[/\">|]" 'BEGIN{ OFS = "|" }{ print $6, $11 }' yourfile

With the -F option at the beginning, the field separator can be any of /, ", > and |. After that you can just print fields $6 and $11, which contain your desired output, joined by the output field separator.

Output:
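To check where the field numbers 6 and 11 come from, this small sketch numbers every field that the bracket-expression separator produces for one sample line. The variable names are illustrative.

```shell
line='<iframe src="https://localhost/get/44bc40f3bc04f65b7a35"></iframe>|name_1'

# Same separator set as above, written with single quotes for the shell.
numbered=$(printf '%s\n' "$line" |
  awk -F'[/">|]' '{ for (i = 1; i <= NF; i++) printf "%2d: [%s]\n", i, $i }')

printf '%s\n' "$numbered"
#  1: [<iframe src=]
#  2: [https:]
#  3: []
#  4: [localhost]
#  5: [get]
#  6: [44bc40f3bc04f65b7a35]
#  7: []
#  8: [<]
#  9: [iframe]
# 10: []
# 11: [name_1]
```

Because the field numbers depend on the exact shape of the URL and markup, this approach is more fragile than matching the ID itself.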

44bc40f3bc04f65b7a35|name_1
5db0d477d707121934ff|name_2
6c95bd2b32ed45989c61|name_3
0a9c4655800e8a7b9ea2|name_4
754953b57a32e2841bda|name_5
