Remove characters from specific point until before space using sed or awk

I would like to remove characters from a specific point up to, but not including, the first whitespace. For instance, my file.txt is as follows:

>DN256845_c2_g1_i1 len=56274 ACGGAGG
>DN256532_c0_g2_i19 len=23973 AATACTC
>DN256979_c8_g3_i32 len=16728 CGAAACT

'X' are numbers such as 1 or 19 or 32, and I would like the output to be:

>DN256845_c2_g1 len=56274 ACGGAGG
>DN256532_c0_g2 len=23973 AATACTC
>DN256979_c8_g3 len=16728 CGAAACT

I used sed 's/_i.*//', but it removed everything after _i. The other commands I tried, sed 's/_i.*\\./\\ /g' and sed -E 's/_i.*+[^[: :]]//g', left the input unchanged.

How do I solve this using sed/awk or any other approach? I appreciate the help. Thanks!

EDIT: As suggested by Sundeep, I edited the question for ease of understanding. The data are actually Trinity transcript identifiers. I need to remove the trailing part of the identifier (_i1 and so on) for some analysis.

In awk:

$ awk '{sub(/_[^_ ]+ /," ")}1' file
>DN256845_c2_g1 len=56274 ACGGAGG
>DN256532_c0_g2 len=23973 AATACTC
>DN256979_c8_g3 len=16728 CGAAACT

Same with sed:

$ sed 's/_[^_ ]\+ / /' file

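Both commands replace the first match of an underscore, followed by one or more characters that are neither an underscore nor a space, followed by a space, with a single space. Because every earlier underscore in the ID is followed by another underscore before any space, only the last _i… part can match.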
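One portability note: \+ in a basic regular expression is a GNU sed extension. A minimal sketch of the same substitution in extended syntax (-E, accepted by both GNU and BSD sed) follows; the sample variable only recreates the question's three lines so the snippet runs on its own:

```shell
# Recreate the question's sample lines in a variable (no file needed).
sample=$(printf '%s\n' \
  '>DN256845_c2_g1_i1 len=56274 ACGGAGG' \
  '>DN256532_c0_g2_i19 len=23973 AATACTC' \
  '>DN256979_c8_g3_i32 len=16728 CGAAACT')

# -E selects extended regular expressions, so + needs no backslash.
# Pattern: underscore, one or more non-underscore/non-space characters, then a space.
result=$(printf '%s\n' "$sample" | sed -E 's/_[^_ ]+ / /')
printf '%s\n' "$result"
```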

Edit: I wonder why I didn't post this obvious awk solution, which edits the end of $1:

$ awk '{sub(/_[^_]+$/,"",$1)}1' file
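A small sketch of how this $1 variant behaves, with one caveat worth knowing: when a field is modified, awk rebuilds the record using the output field separator, so tabs or runs of spaces between fields come back as single spaces. The tab-separated header below is a made-up line used only to show that effect:

```shell
# The question's sample lines.
sample=$(printf '%s\n' \
  '>DN256845_c2_g1_i1 len=56274 ACGGAGG' \
  '>DN256532_c0_g2_i19 len=23973 AATACTC' \
  '>DN256979_c8_g3_i32 len=16728 CGAAACT')

# sub() strips the last _... part from the first field only;
# the trailing 1 prints each (rebuilt) record.
result=$(printf '%s\n' "$sample" | awk '{ sub(/_[^_]+$/, "", $1) } 1')
printf '%s\n' "$result"

# Hypothetical tab-separated header: the tabs are replaced by single spaces
# because awk reassembles $0 with OFS (a space by default).
tabbed=$(printf '>DN1_c0_g1_i2\tlen=10\tACGT\n' | awk '{ sub(/_[^_]+$/, "", $1) } 1')
printf '%s\n' "$tabbed"
```

If the original separators must be preserved, the sub() on the whole line or the sed form is the safer choice.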

'X' are numbers such as 1 or 19 or 32

It is a good idea to give a sample that is as close as possible to the real use case. I've changed the sample data so that the X after i is a number. If this doesn't help, please add a better sample to the question.

$ cat ip.txt
>DN256845_c2_gXX_i1 len=56274 ACGGAGG
>DN256532_c0_gXX_i19 len=23973 AATACTC
>DN256979_c8_gXX_i32 len=16728 CGAAACT

$ sed 's/_i[0-9]* / /' ip.txt
>DN256845_c2_gXX len=56274 ACGGAGG
>DN256532_c0_gXX len=23973 AATACTC
>DN256979_c8_gXX len=16728 CGAAACT
  • _i[0-9]* followed by a space matches _i, then zero or more digits, then a space
  • the match is replaced with a single space, which keeps the separator before len=
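If the real file is a multi-line FASTA file (an assumption; the sample in the question shows everything on one line), a slightly stricter sketch is to touch only header lines and to require at least one digit after _i. The two-line record below is illustrative:

```shell
# Illustrative FASTA record: a header line followed by a sequence line.
fasta=$(printf '%s\n' '>DN256845_c2_g1_i1 len=56274' 'ACGGAGG')

# /^>/ limits the substitution to lines that start with '>'.
# [0-9][0-9]* means "one or more digits" in portable basic regex syntax.
result=$(printf '%s\n' "$fasta" | sed '/^>/ s/_i[0-9][0-9]* / /')
printf '%s\n' "$result"
```

The sequence line passes through untouched, and a header containing a bare _i followed by a space would no longer match.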

For this use case, this could also be shortened to

sed 's/_i[^ ]*//' ip.txt
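One practical difference between the two forms, sketched below under the assumption that some headers might consist of the ID alone: the shortened command does not need a space after the ID, while the version that matches the trailing space leaves such a line unchanged.

```shell
# A header with nothing after the ID (hypothetical variant of the sample data).
bare='>DN256845_c2_g1_i1'

# Requires a space after the digits, so nothing matches here.
with_space=$(printf '%s\n' "$bare" | sed 's/_i[0-9]* / /')

# Deletes _i plus any following non-space characters; no space needed.
without_space=$(printf '%s\n' "$bare" | sed 's/_i[^ ]*//')

printf '%s\n' "$with_space" "$without_space"
```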


 