Using sed or awk to remove characters from a specific point up to a space

Question

I would like to remove characters from a specific point until just before the first whitespace (without removing the whitespace itself). For instance, my file.txt is as follows:

>DN256845_c2_g1_i1 len=56274 ACGGAGG
>DN256532_c0_g2_i19 len=23973 AATACTC
>DN256979_c8_g3_i32 len=16728 CGAAACT

'X' are numbers such as 1 or 19 or 32, and I would like the output to be:

>DN256845_c2_g1 len=56274 ACGGAGG
>DN256532_c0_g2 len=23973 AATACTC
>DN256979_c8_g3 len=16728 CGAAACT

I tried sed 's/_i.*//', but it removed everything from _i to the end of the line. Other commands I tried were sed 's/_i.*\\./\\ /g' and sed -E 's/_i.*+[^[: :]]//g', which left the input unchanged.

How do I solve this using sed/awk or any other approach? I appreciate the help. Thanks!

EDIT: As suggested by Sundeep, I edited the question for ease of understanding. The data are actually Trinity transcript identifiers. I need to remove the identifier (_i1 and so on) for some analysis.

Answer 1

In awk:

$ awk '{sub(/_[^_ ]+ /," ")}1' file
>DN256845_c2_g1 len=56274 ACGGAGG
>DN256532_c0_g2 len=23973 AATACTC
>DN256979_c8_g3 len=16728 CGAAACT

Same with sed:

$ sed 's/_[^_ ]\+ / /' file

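This replaces the first match of an underscore, followed by one or more characters that are neither an underscore nor a space, followed by a space, with a single space. Because the match must end at a space, only the last underscore-delimited piece of the identifier (_i1, _i19, and so on) can match.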
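The \+ quantifier in the sed command above is a GNU extension to basic regular expressions. As a minimal sketch for sed implementations that lack it, "one or more" can be spelled as the bracket expression followed by the same bracket expression with *. The sample variable below simply holds the three lines from the question:

```shell
# Sample header lines taken from the question.
sample='>DN256845_c2_g1_i1 len=56274 ACGGAGG
>DN256532_c0_g2_i19 len=23973 AATACTC
>DN256979_c8_g3_i32 len=16728 CGAAACT'

# Portable spelling of "one or more": [^_ ] followed by [^_ ]*
# instead of the GNU-only [^_ ]\+ .
result=$(printf '%s\n' "$sample" | sed 's/_[^_ ][^_ ]* / /')

printf '%s\n' "$result"
```

This should print the three lines with the _i suffix removed and the rest of each line intact.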

Edit: I wonder why I didn't post this obvious awk solution, which manipulates the end of $1:

$ awk '{sub(/_[^_]+$/,"",$1)}1' file
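One thing to be aware of with the $1 approach: assigning to a field makes awk rebuild the whole record using the output field separator, which is a single space by default. For the sample data that changes nothing, but a tab-separated header would come out space-separated. The sketch below shows both cases; the second input line is a made-up example, not data from the question:

```shell
# A header line from the question (fields separated by single spaces).
line='>DN256845_c2_g1_i1 len=56274 ACGGAGG'
result=$(printf '%s\n' "$line" | awk '{sub(/_[^_]+$/,"",$1)}1')

# Hypothetical tab-separated header: modifying $1 rebuilds the record,
# so the tab is replaced by the default OFS (one space).
result_tab=$(printf '>A_i1\tlen=5\n' | awk '{sub(/_[^_]+$/,"",$1)}1')

printf '%s\n%s\n' "$result" "$result_tab"
```

If the original separators must be preserved, the sub() call on the whole line shown earlier in this answer avoids the rebuild.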

Answer 2

'X' are numbers such as 1 or 19 or 32

It is a good idea to give a sample as close as possible to the real use case. I've changed the sample data to replace the X after i with numbers. If this doesn't help, please add a better sample to the question.

$ cat ip.txt
>DN256845_c2_gXX_i1 len=56274 ACGGAGG
>DN256532_c0_gXX_i19 len=23973 AATACTC
>DN256979_c8_gXX_i32 len=16728 CGAAACT

$ sed 's/_i[0-9]* / /' ip.txt
>DN256845_c2_gXX len=56274 ACGGAGG
>DN256532_c0_gXX len=23973 AATACTC
>DN256979_c8_gXX len=16728 CGAAACT

_i[0-9]* followed by a space matches _i, then zero or more digits, then a space
the whole match is replaced with a single space, so the separator before len= is kept

For this use case, this could also be shortened to

sed 's/_i[^ ]*//' ip.txt
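The shortened form works because [^ ]* can only consume non-space characters, so the match stops just before the first space. As a small sketch of the difference, here it is side by side with the .* pattern from the question, using one line of the question's data:

```shell
line='>DN256845_c2_g1_i1 len=56274 ACGGAGG'

# .* is greedy and matches through to the end of the line.
greedy=$(printf '%s\n' "$line" | sed 's/_i.*//')

# [^ ]* cannot cross a space, so it stops before " len=".
bounded=$(printf '%s\n' "$line" | sed 's/_i[^ ]*//')

printf '%s\n%s\n' "$greedy" "$bounded"
```

The first result loses the len= and sequence columns; the second keeps them.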
