Linux: extracting a pattern from a file

Question

I have a large tab-delimited .txt file with 4 columns:

col1    col2    col3    col4
name1   1       2       ens|name1,ccds|name2,ref|name3,ref|name4
name2   3       10      ref|name5,ref|name6
...     ...     ...     ...

Now I want to extract from this file everything that starts with 'ref|'. This pattern is only present in col4.

So for this example I would like to have as output:

ref|name3
ref|name4
ref|name5
ref|name6

I thought of using 'sed' for this, but I don't know where to start.

Answer 1

I think awk is better suited for this task:

$ awk '{for (i=1;i<=NF;i++){if ($i ~ /^ref\|/){print $i}}}' FS='[\t,]' infile
ref|name3
ref|name4
ref|name5
ref|name6

FS='[\t,]' sets a field separator that matches either a tab or a comma, so the loop iterates over every tab- or comma-separated item and prints each one that starts with ref|.
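Because the question describes the file as tab-delimited and says the pattern only occurs in col4, it may be safer to split records on tabs and then split only the fourth column on commas. The following is a minimal sketch under those assumptions; the temporary file and the variable names sample and result are illustrative and not part of the original answer.

```shell
# Hypothetical sample file reproducing the question's data (tab-delimited).
sample=$(mktemp)
printf 'col1\tcol2\tcol3\tcol4\n' > "$sample"
printf 'name1\t1\t2\tens|name1,ccds|name2,ref|name3,ref|name4\n' >> "$sample"
printf 'name2\t3\t10\tref|name5,ref|name6\n' >> "$sample"

# Split records on tabs, then split only column 4 on commas.
result=$(awk -F'\t' '{
    n = split($4, parts, ",")
    for (i = 1; i <= n; i++)
        if (index(parts[i], "ref|") == 1)   # entry starts with the literal text ref|
            print parts[i]
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Using index() compares against the literal string ref|, so the pipe character does not need to be escaped as it would in a regular expression.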

Answer 2

"Now I want to extract from this file everything that starts with 'ref|'. This pattern is only present in col4."

If you are sure that the pattern is only present in col4, you could use grep:

grep -o 'ref|[^,]*' file

Output:

ref|name3
ref|name4
ref|name5
ref|name6

Answer 3

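One solution is to first use awk to get the fourth column, then use sed to convert the commas to newlines, and then use grep (or awk again) to keep the entries that start with ref: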

awk '{print $4}' < data.txt | sed -e 's/,/\n/g' | grep "^ref"
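The \n in the sed replacement is not understood by every sed implementation, so a variant built on cut and tr may be more portable. This is only a sketch of the same three-step idea (take column 4, put one entry per line, keep the ref entries); the temporary file and the variable names sample and result are illustrative.

```shell
# Hypothetical sample file reproducing the question's data (tab-delimited).
sample=$(mktemp)
printf 'col1\tcol2\tcol3\tcol4\n' > "$sample"
printf 'name1\t1\t2\tens|name1,ccds|name2,ref|name3,ref|name4\n' >> "$sample"
printf 'name2\t3\t10\tref|name5,ref|name6\n' >> "$sample"

# cut -f4      : keep the fourth tab-separated column
# tr ',' '\n'  : put each comma-separated entry on its own line
# grep '^ref|' : keep entries beginning with ref| (| is literal in a basic regex)
result=$(cut -f4 "$sample" | tr ',' '\n' | grep '^ref|')

printf '%s\n' "$result"
rm -f "$sample"
```

Anchoring the pattern as '^ref|' rather than "^ref" avoids picking up entries that merely begin with the letters ref, such as a hypothetical refseq|name.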

Answer 4

This might work for you (GNU sed):

sed 's/\(ref|[^,]*\),/\n\1\n/;/^ref/P;D' file

Surround each required string with newlines, then print only the lines that begin with the required string: P prints the first line of the pattern space when it starts with ref, and D deletes that first line and restarts the cycle on what remains.

Answer 1: score 5, accepted, 2015-04-27 08:13:27
Answer 2: score 4, 2015-04-27 08:16:29
Answer 3: score 2, 2015-04-27 08:13:28
Answer 4: score 0, 2015-04-27 08:37:43
