Modifying a text column with sed/awk

Question

I have tab-separated input data with three columns, like this:

  a  mrna_185598_SGL 463
  b  mrna_9210_DLT   463
  c  mrna_9210_IND   463
  d  mrna_9210_INS   463
  e  mrna_9210_SGL   463

How can I use sed/awk to turn it into four-column data that looks like this:

a  mrna_185598 SGL   463
b  mrna_9210   DLT   463
c  mrna_9210   IND   463
d  mrna_9210   INS   463
e  mrna_9210   SGL   463

In principle, I want to split the original "mrna" string into two parts.

Answer 1

Using gawk (gensub is a gawk extension):

{
  print $1 "\t" gensub(/_/, "\t", 2, $2) "\t" $3
}
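Since gensub is only available in gawk, here is a rough sketch of the same idea using POSIX awk features only. It assumes the part to split off is whatever follows the last underscore in column 2; the function name split_last_underscore is made up for illustration.

```shell
# Split column 2 at its last underscore using match/substr (POSIX awk).
# Reads tab-separated lines from stdin or from the files given as arguments.
split_last_underscore() {
    awk 'BEGIN { FS = OFS = "\t" }
         # /_[^_]*$/ can only match at the final underscore of the field.
         match($2, /_[^_]*$/) {
             $2 = substr($2, 1, RSTART - 1) OFS substr($2, RSTART + 1)
         }
         { print }' "$@"
}

# Try it on one line from the question.
printf 'a\tmrna_185598_SGL\t463\n' | split_last_underscore
```

Lines whose second column has no underscore are printed unchanged.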

Answer 2

Something like this:

awk 'BEGIN{FS=OFS="\t"}{split($2,a,"_"); $2=a[1]"_"a[2]"\t"a[3] }1'  file

Output:

# ./shell.sh
a       mrna_185598     SGL     463
b       mrna_9210       DLT     463
c       mrna_9210       IND     463
d       mrna_9210       INS     463
e       mrna_9210       SGL     463

Use nawk on Solaris.

And if you have bash:

while IFS=$'\t' read -r a b c
do
    front=${b%_*}
    back=${b##*_}
    printf '%s\t%s\t%s\t%s\n' "$a" "$front" "$back" "$c"
done <"file"

Answer 3

As long as they don't look too different from what you posted:

sed -E 's/mrna_([0-9]+)_/mrna_\1\t/'

Answer 4

You don't need to use sed; use tr instead:

cat FILENAME | tr '_[:upper:]{3}\t' '\t[:lower:]{3}\t' >> FILEOUT

cat FILENAME prints the file, which is then piped ('|') to tr (translate), and the result is appended to FILEOUT. The intent is to replace an underscore that is followed by three uppercase characters and a tab with a tab. Be aware, though, that tr maps individual characters and does not match patterns, so as written the command turns every underscore into a tab and lowercases the uppercase letters.
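Because tr translates single characters rather than patterns, it cannot target only the second underscore. A quick check on one sample line from the question, using a simplified character mapping rather than the exact command above:

```shell
# One sample line with real tab separators.
line=$(printf 'a\tmrna_185598_SGL\t463')

# tr maps every '_' to a tab; it has no notion of "the second occurrence",
# so the line ends up with five fields instead of four.
printf '%s\n' "$line" | tr '_' '\t'
```

The mrna prefix is split off into its own column as well, which is not what the question asks for.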

Answer 5

$ cat test.txt
  a  mrna_185598_SGL 463
  b  mrna_9210_DLT   463
  c  mrna_9210_IND   463
  d  mrna_9210_INS   463
  e  mrna_9210_SGL   463

$ cat test.txt | sed -E 's/(\S+)_(\S+)\s+(\S+)$/\1\t\2\t\3/'
  a  mrna_185598    SGL 463
  b  mrna_9210  DLT 463
  c  mrna_9210  IND 463
  d  mrna_9210  INS 463
  e  mrna_9210  SGL 463

Answer 6

gawk '{$1=$1; $0=gensub(/_/,"\t",2);print}' file

a mrna_185598   SGL 463
b mrna_9210 DLT 463
c mrna_9210 IND 463
d mrna_9210 INS 463
e mrna_9210 SGL 463

Answer 7

This might work for you (GNU sed):

sed 's/_/\t/2' file

This replaces the second occurrence of _ on each line with a tab.
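The \t escape in the replacement is a GNU sed feature. For other sed implementations, a possible sketch is to put a literal tab into a shell variable and keep the numeric flag, which is standard. The names tab and split_second_underscore are made up for illustration, and the sketch assumes the first column never contains an underscore.

```shell
# Hold a literal tab in a variable so the sed script does not rely on \t.
tab=$(printf '\t')

split_second_underscore() {
    # The trailing 2 tells sed to substitute only the second match per line.
    sed "s/_/${tab}/2" "$@"
}

# Try it on one line from the question.
printf 'a\tmrna_185598_SGL\t463\n' | split_second_underscore
```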

7 answers

Answer 1
2 2010-01-28 03:37:21

Answer 2
2 Accepted 2010-01-28 03:38:48

Answer 3
1 2010-01-28 03:40:03

Answer 4
1 2010-01-28 03:49:48

Answer 5
1 2010-01-28 03:50:35

Answer 6
1 2019-07-02 00:28:06

Answer 7
0 2019-07-01 23:20:03
