How to use awk or sed to get all lines between partial matches?

Question

My file looks something like this:

>Cluster 0
0   58aa, >5XX8A... at 91.38%
1   58aa, >3LDMA... at 100.00%
2   58aa, >3BTHI... at 96.55%
3   65aa, >1F7ZI... *
4   58aa, >3LDJA... at 100.00%
>Cluster 1
0   57aa, >1ZJDB... at 94.74%
1   58aa, >1AAPA... at 91.38%
2   56aa, >5NX1D... at 92.86%
>Cluster 2
0   60aa, >4ISLB... at 98.33%
1   62aa, >4ISNB... at 95.16%
>Cluster 3
0   59aa, >3BYBA... *
1   59aa, >5ZJ3A... at 100.00%
2   59aa, >3UIRC... at 100.00%
3   57aa, >3D65I... at 100.00%

How can I use sed or awk to get the IDs after ">" (for example, 5XX8A) from the lines between the ">Cluster" headers? I want to save them separately, one file per cluster. Alternatively, something more parsable would do, such as a single file with each ID right next to its cluster number.

As a first approach, doing something like:

sed -n '/^\>/,/^\>/p' filename

returns the whole file.

Answer 1

awk to the rescue!

$ awk '/^>Cluster /{close(f); f="Cluster."$2; next} {sub(/>/,"",$3); print $3 > f}' file
  
$ head Cluster*
==> Cluster.0 <==
5XX8A...
3LDMA...
3BTHI...
1F7ZI...
3LDJA...

==> Cluster.1 <==
1ZJDB...
1AAPA...
5NX1D...

==> Cluster.2 <==
4ISLB...
4ISNB...

==> Cluster.3 <==
3BYBA...
5ZJ3A...
3UIRC...
3D65I...
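
The question also asks for a single, more parsable file with each ID next to its cluster number. A minimal sketch of that variant, assuming the same input layout as above; the names workdir, clusters.txt and ids.tsv are made up for illustration:

```shell
# Build a small sample in the same layout as the question (hypothetical file names).
workdir=$(mktemp -d)
cat > "$workdir/clusters.txt" <<'EOF'
>Cluster 0
0   58aa, >5XX8A... at 91.38%
1   58aa, >3LDMA... at 100.00%
>Cluster 1
0   57aa, >1ZJDB... at 94.74%
1   59aa, >3BYBA... *
EOF

# On a header line, remember the cluster number and skip to the next line.
# On every other line, strip the leading ">" from the third field and
# print it beside the remembered cluster number, separated by a tab.
awk -v OFS='\t' '
    /^>Cluster / { cluster = $2; next }
    { sub(/^>/, "", $3); print cluster, $3 }
' "$workdir/clusters.txt" > "$workdir/ids.tsv"

cat "$workdir/ids.tsv"
```

Each output line is the cluster number, a tab, and the ID, so the result can be fed to cut, sort or join without further parsing.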

Answer 2

This might work for you (GNU sed):

sed -En '/^>(Cluster) /{s//>\1./;:a;x;s/\n(.*)/ echo "\1"/e;x;h;d};s/.*>//;s/ .*//;H;$!d;ba' file

This gathers each cluster in the hold space and then, using the evaluation flag (e) on the substitute command, echoes the collection into the file named by the first line of the collection.
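
To see the hold-space gathering on its own, here is a simplified sketch that leaves out the e flag and just prints each collected cluster on one line. It is not the one-liner above, only an illustration of the same gathering idea; the names workdir, clusters.txt and by_cluster.txt are made up:

```shell
# Small sample in the same layout as the question (hypothetical file names).
workdir=$(mktemp -d)
cat > "$workdir/clusters.txt" <<'EOF'
>Cluster 0
0   58aa, >5XX8A... at 91.38%
1   58aa, >3LDMA... at 100.00%
>Cluster 1
0   57aa, >1ZJDB... at 94.74%
1   59aa, >3BYBA... *
EOF

sed -n '
# Header line: flush the previous collection, then start a new one.
/^>Cluster /{
    x
    # The hold space is empty before the first header, so print only if non-empty.
    /./{
        s/\n/ /g
        p
    }
    x
    s/^>Cluster /Cluster./
    h
    d
}
# Member line: keep only the ID and append it to the hold space.
s/.*>//
s/ .*//
H
# Last line of input: flush the final collection.
${
    x
    s/\n/ /g
    p
}
' "$workdir/clusters.txt" > "$workdir/by_cluster.txt"

cat "$workdir/by_cluster.txt"
```

Each output line starts with the would-be file name (Cluster.0, Cluster.1) followed by that cluster's IDs. The GNU one-liner hands the same kind of collection to the shell instead of printing it.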

Alternative method, using sed and piping to sh:

sed '/^>Cluster/{s/ /./;h;d};s/..*>//;s/ .*//;G;x;s/>*/>>/;x;s/\n/ /;s/\S*/echo "&"/' file|sh

Alternative method, using sed and csplit:

sed 's/^..*>//;s/ .*//' file | csplit -szf Cluster -b '.%d' --suppress-matched - '/>Cluster/' '{*}'

Manipulate the file into the desired format using sed, then split it into separate files using csplit.

NB: This may not reproduce the file names faithfully, because the sed stage strips the cluster number and csplit numbers the output files itself.
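
To make the file-name caveat concrete, this sketch runs only the sed stage of the csplit pipeline and saves what csplit would receive. The names workdir, clusters.txt and stage1.txt are made up for illustration:

```shell
# Small sample in the same layout as the question (hypothetical file names).
workdir=$(mktemp -d)
cat > "$workdir/clusters.txt" <<'EOF'
>Cluster 0
0   58aa, >5XX8A... at 91.38%
1   58aa, >3LDMA... at 100.00%
>Cluster 1
0   57aa, >1ZJDB... at 94.74%
1   59aa, >3BYBA... *
EOF

# First substitution: drop everything up to the last ">" when at least one
# character precedes it, so header lines (where ">" comes first) are untouched.
# Second substitution: drop everything from the first space onwards, which
# also removes the cluster number from the header lines.
sed 's/^..*>//;s/ .*//' "$workdir/clusters.txt" > "$workdir/stage1.txt"

cat "$workdir/stage1.txt"
```

Every header arrives at csplit as a bare ">Cluster" line with no number. The suffixes in Cluster.0, Cluster.1 and so on therefore come from csplit's own counter, and they should match the original numbers only while the clusters are numbered consecutively from 0.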

2 answers

Answer 1
Score: 3 (accepted), 2021-05-14 21:59:45

Answer 2
Score: 0, 2021-05-17 07:21:04
