Using an awk regular expression to extract part of a URL

Question

I am very new to awk. I have multiple files containing lines similar to:

xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin
abc msg=(1448783938.658:149777):   uid=506   comm="abc.py"   exe="/install/bio/toolx/bin
abc msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/bio/tooly/bin

I need two outputs.

a. One file containing just the uid and the exe column (only the path component just after /install/) from across all the files, e.g.

505 python
506 bio
505 bio

I can print the exe with

awk -F '/' '{ print $3}'

but I am unsure how to print the uid along with it.

b. One file containing the uid and the exe column, with just the string following /bio/, e.g.

506 toolx
505 tooly

Any help is appreciated.

Answer 1

You can use the following awk command:

awk -F'[[:space:]="/]+' '{print $5, $10}' file

I'm using a set of delimiters, which makes it simple to access the values of interest. However, it works only if the values themselves contain none of the delimiter characters: whitespace, =, or " (a / simply starts the next path component).
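To see why fields 5 and 10 are the right ones, it can help to print every field with its index. The sketch below is not part of the original answer; it reuses the first sample line from the question and the same delimiter set, and the variable names line and fields are only illustrative.

```shell
# First sample line from the question.
line='xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin'

# Print each field as index:value, splitting on runs of whitespace, =, " and /.
fields=$(printf '%s\n' "$line" |
  awk -F'[[:space:]="/]+' '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }')
printf '%s\n' "$fields"
```

With this input the listing has 11 entries, with 5:505 and 10:python among them, which is where $5 and $10 come from.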

By the way, sed can also be used for this. It works regardless of which characters the path component contains (other than /), since no field delimiter is used:

sed -r 's~.*uid=([^[:space:]]+).*exe="/install/([^/]+).*~\1 \2~' file
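Neither command above produces the second output requested in the question (the string following /bio/). A minimal sketch, not part of the original answer, is to reuse the same delimiter set, keep only lines whose field 10 is bio, and print field 11. It assumes that bio always directly follows /install/; the variable names sample and bio_out are only illustrative.

```shell
# The three sample lines from the question.
sample='xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin
abc msg=(1448783938.658:149777):   uid=506   comm="abc.py"   exe="/install/bio/toolx/bin
abc msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/bio/tooly/bin'

# Keep lines whose first path component is "bio"; print uid and the next component.
bio_out=$(printf '%s\n' "$sample" |
  awk -F'[[:space:]="/]+' '$10 == "bio" { print $5, $11 }')
printf '%s\n' "$bio_out"
```

On the sample lines this prints 506 toolx and 505 tooly. When reading from real files, the same awk program can be given the file names as arguments and its output redirected to a file.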

Answer 2

A similar awk solution:

$ awk -F" +|[=/]" '{print $5,$11}' bio
505 python
506 bio
505 bio
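The exe value sits in field 11 here rather than field 10 because this separator treats each single = or / as a delimiter and does not include the double quote, so the quote before /install becomes a field of its own. A small sketch (not from the original answer; the variable names line and picked are illustrative) that prints fields 9 to 11 of the first sample line:

```shell
# First sample line from the question.
line='xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin'

# Field 9 is the lone double quote between = and /, which shifts the path fields by one.
picked=$(printf '%s\n' "$line" |
  awk -F" +|[=/]" '{ print $9; print $10; print $11 }')
printf '%s\n' "$picked"
```

This prints a lone ", then install, then python.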

Answer 3

I would keep it simple and use the default field delimiter, then use sub or split to clean up each field for printing. Here is the split solution.

awk '{ split($3, uid, "="); split($5, exe, "/"); print uid[2], exe[3] }'

Here's how this was developed:

$ echo 'xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin' | awk '{ print $3, $5 }'
uid=505 exe="/install/python/bin
$ echo 'xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin' | awk '{ split($3, uid, "="); print uid[2], $5 }'
505 exe="/install/python/bin
$ echo 'xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin' | awk '{ split($3, uid, "="); split($5, exe, "/"); print uid[2], exe[3] }'
505 python

I tried a sub-based solution first, but it turned out to be both longer and more cryptic than the split-based one, which seemed more straightforward. (Where a sub-based solution is warranted, sed would perhaps be a better choice of language anyway.)

One thing that should be added is a filter to make sure we only process valid lines, which could be as simple as the following:

awk '$3 ~ /uid=/ && $5 ~ /exe="\/install\// { split($3, uid, "="); split($5, exe, "/"); print uid[2], exe[3] }'
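As a quick check of the filter, the sketch below feeds it two sample lines from the question plus one unrelated line. The unrelated line is made up for illustration, as are the variable names input and filtered.

```shell
# Two lines from the question with an invented noise line in between.
input='xyz msg=(1448783938.658:149777):   uid=505   comm="abc.py"   exe="/install/python/bin
this line is unrelated noise
abc msg=(1448783938.658:149777):   uid=506   comm="abc.py"   exe="/install/bio/toolx/bin'

# Only lines whose third and fifth fields look like uid= and exe="/install/ are processed.
filtered=$(printf '%s\n' "$input" |
  awk '$3 ~ /uid=/ && $5 ~ /exe="\/install\// { split($3, uid, "="); split($5, exe, "/"); print uid[2], exe[3] }')
printf '%s\n' "$filtered"
```

The noise line is skipped, leaving 505 python and 506 bio.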

One other thing: if the uid and exe fields move from column to column in your file, you will have to hunt for them with a for loop, which is long enough to warrant a script file like the following:

#! /usr/bin/awk -f
{
        u=0
        e=0
        for (i=1; i<=NF; i++) {
                if ($i ~ /uid=/)
                        u=i
                else if ($i ~ /exe="\/install\//)
                        e=i
                if (u && e)
                        break
        }
        if (!u || !e)
                next
        split($u, uid,"=")
        split($e, exe, "/")
        print uid[2], exe[3]
}

In this case, the validity check that served as the leading pattern in the immediately preceding example is embedded in the for loop.
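To try the field-hunting logic without creating a script file, the same program can be passed inline. The input line below is hypothetical: it is a sample line from the question with the exe field moved in front of the uid field. The variable names moved and found are illustrative.

```shell
# Hypothetical line with the exe field before the uid field.
moved='abc exe="/install/bio/toolx/bin   comm="abc.py"   uid=506   msg=(1448783938.658:149777):'

found=$(printf '%s\n' "$moved" | awk '
{
    u = 0; e = 0
    for (i = 1; i <= NF; i++) {          # scan every field for the two markers
        if ($i ~ /uid=/) u = i
        else if ($i ~ /exe="\/install\//) e = i
        if (u && e) break                # stop once both have been located
    }
    if (!u || !e) next                   # skip lines missing either field
    split($u, uid, "="); split($e, exe, "/")
    print uid[2], exe[3]
}')
printf '%s\n' "$found"
```

This prints 506 bio even though the columns are reordered. Note that /uid=/ is unanchored, so any field containing uid= matches; writing it as /^uid=/ would be stricter if the data calls for it.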

Answer 1: score 4, accepted, 2015-12-02 19:25:17
Answer 2: score 1, 2015-12-02 19:32:34
Answer 3: score 1, 2015-12-02 20:37:47
