Modifying text column-wise with sed/awk
I have tab-separated input data with three columns, like this:
a mrna_185598_SGL 463
b mrna_9210_DLT 463
c mrna_9210_IND 463
d mrna_9210_INS 463
e mrna_9210_SGL 463
How can I use sed/awk to turn it into four-column data that looks like this:
a mrna_185598 SGL 463
b mrna_9210 DLT 463
c mrna_9210 IND 463
d mrna_9210 INS 463
e mrna_9210 SGL 463
In principle, I want to split the original "mrna" string into two parts.
gawk:
{
print $1 "\t" gensub(/_/, "\t", 2, $2) "\t" $3
}
Something like this:
awk 'BEGIN{FS=OFS="\t"}{split($2,a,"_"); $2=a[1]"_"a[2]"\t"a[3] }1' file
Output:
# ./shell.sh
a mrna_185598 SGL 463
b mrna_9210 DLT 463
c mrna_9210 IND 463
d mrna_9210 INS 463
e mrna_9210 SGL 463
Use nawk on Solaris.
And if you have bash:
while IFS=$'\t' read -r a b c
do
front=${b%_*}
back=${b##*_}
printf '%s\t%s\t%s\t%s\n' "$a" "$front" "$back" "$c"
done <"file"
As long as the lines don't look too different from what you posted:
sed -E 's/mrna_([0-9]+)_/mrna_\1\t/'
You don't need to use sed; use tr instead:
cat *FILENAME* | tr '_[:upper:]{3}\t' '\t[:lower:]{3}\t' >> *FILEOUT*
cat FILENAME prints the file, which is then piped ('|') to tr (translate), and the result is appended to FILEOUT. Be aware that tr maps single characters, not patterns: it cannot match "an underscore followed by 3 uppercase characters and a tab". As written, with GNU tr, it turns every underscore into a tab and lowercases every uppercase letter, so both underscores in the second column are replaced rather than only the second one.
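To see how tr treats its arguments, here is a minimal sketch (the function name underscores_to_tabs is only illustrative, not from the original answer). tr translates individual characters from the first set to the second, so mapping _ to a tab affects every underscore on the line:

```shell
# Illustrative helper: translate every "_" on stdin into a tab.
# tr works character by character; it has no notion of "the second
# occurrence" or of a multi-character pattern.
underscores_to_tabs() {
    tr '_' '\t'
}

# Example: both underscores in the second column become tabs.
printf 'b\tmrna_9210_DLT\t463\n' | underscores_to_tabs
```

The line comes out with five tab-separated fields (b, mrna, 9210, DLT, 463) instead of the four that were asked for, which is why a tool that can target one specific underscore (sed or awk) fits this task better.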
$ cat test.txt
a mrna_185598_SGL 463
b mrna_9210_DLT 463
c mrna_9210_IND 463
d mrna_9210_INS 463
e mrna_9210_SGL 463
$ cat test.txt | sed -E 's/(\S+)_(\S+)\s+(\S+)$/\1\t\2\t\3/'
a mrna_185598 SGL 463
b mrna_9210 DLT 463
c mrna_9210 IND 463
d mrna_9210 INS 463
e mrna_9210 SGL 463
gawk '{$0=gensub(/_/,"\t",2); print}' file
a mrna_185598 SGL 463
b mrna_9210 DLT 463
c mrna_9210 IND 463
d mrna_9210 INS 463
e mrna_9210 SGL 463
This might work for you (GNU sed):
sed 's/_/\t/2' file
Replace the second occurrence of _ with a tab.
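The \t escape in the replacement is a GNU sed extension. If GNU sed is not available, the same idea can be sketched with a literal tab passed in through a shell variable; the numeric flag on s (replace only the Nth match) is part of POSIX sed. The function name below is hypothetical and only for illustration:

```shell
# Hypothetical helper: replace the second "_" on each line with a tab,
# without relying on sed understanding "\t" in the replacement.
split_at_second_underscore() {
    # printf produces a real tab; command substitution keeps it
    # (only trailing newlines are stripped).
    tab=$(printf '\t')
    # The "2" flag tells sed to substitute only the second match per line.
    sed "s/_/${tab}/2" "$@"
}

# Example: reads stdin when no file is given.
printf 'a\tmrna_185598_SGL\t463\n' | split_at_second_underscore
```

Called as split_at_second_underscore file, it reads from the named file instead of standard input.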