
Modifying text using awk

I am trying to modify text files using awk. There are three columns, and I want to delete part of the text in the first column:

range=chr1      20802865        20802871        
range=chr1      23866528        23866534

to

chr1      20802865        20802871        
chr1      23866528        23866534

How can I do this?

I've tried awk '{ substr("range=chr*", 7) }' and awk '{sub(/[^[:space:]]*\\\\/, "")}1', but it deletes all the contents of the file.
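In the first attempt, substr() is applied to the string constant "range=chr*" rather than to the record, its result is discarded, and the action never prints, so awk outputs nothing. Separately, if the output was redirected onto the input file (awk '...' file > file), the shell truncates the file before awk reads it, which is one common way to end up with an empty file. A minimal sketch of a working sub() version, using hypothetical tab-separated sample data:

```shell
# Hypothetical sample data with the question's layout (tabs assumed)
input=$(printf 'range=chr1\t20802865\t20802871\nrange=chr1\t23866528\t23866534')

# sub() rewrites $0 in place; the trailing "1" is an always-true pattern
# whose default action prints the (possibly modified) record.
out=$(printf '%s\n' "$input" | awk '{ sub(/^range=/, "") } 1')
printf '%s\n' "$out"
```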

Set the field separator to = and print the second field:

# With awk                                                                     
$ awk -F= '{print $2}' file
chr1      20802865        20802871        
chr1      23866528        23866534

# Or with cut
$ cut -d= -f2 file                  
chr1      20802865        20802871        
chr1      23866528        23866534

# Or with GNU grep (-P enables Perl-compatible regexes, needed for the (?<==) lookbehind)
$ grep -Po '(?<==).*' file
chr1      20802865        20802871        
chr1      23866528        23866534

# Writing the result back needs a temp file (&& skips the mv if cut fails)
$ cut -d= -f2 file > tmp && mv tmp file
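The temp-file pattern can be made a little safer with mktemp, which picks a unique name instead of a fixed "tmp", plus a trap that removes the temp file if the script stops early. This is a sketch only; the printf line creates hypothetical sample data and should be skipped for a real file:

```shell
# Sample data standing in for the question's "file" (skip this for real data)
printf 'range=chr1\t20802865\t20802871\nrange=chr1\t23866528\t23866534\n' > file

tmp=$(mktemp) || exit 1       # unique temporary file name
trap 'rm -f "$tmp"' EXIT      # remove the temp file however the script exits
if cut -d= -f2 file > "$tmp"; then
    mv "$tmp" file            # replace the original only if cut succeeded
fi
cat file
```

One trade-off: mktemp creates the file with mode 0600, so after the mv, file carries those permissions rather than its original ones.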

Standard awk, cut and grep all need a temporary file if you want to store the changes back into file, so a better solution is to use sed:

 sed -i 's/range=//' file

This substitutes range= with nothing, and -i makes the change in place, so there is no need to handle a temporary file yourself; sed does that for you.
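The -i option is not POSIX, and GNU and BSD/macOS sed parse it differently: BSD sed expects a backup-suffix argument after -i. One form both accept is a suffix attached directly to -i with no space. A sketch, with hypothetical sample data:

```shell
# Sample data standing in for the question's "file" (skip this for real data)
printf 'range=chr1\t20802865\t20802871\nrange=chr1\t23866528\t23866534\n' > file

# With the suffix attached to -i (no space), both GNU and BSD sed keep the
# original as file.bak and rewrite file in place; the backup is then removed.
sed -i.bak 's/range=//' file && rm -f file.bak
cat file
```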

It looks like you are using tabs instead of spaces as delimiters in your file, so:

awk 'BEGIN{FS="[=\t]"; OFS="\t"} {print $2, $3, $4}' input_file

or

awk 'BEGIN{FS="[=\t]"; OFS="\t"} {$1=""; sub(/^\t/, ""); print}' input_file
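Both commands above rely on a fixed field layout. A variant that edits only the first field with sub() works for any number of columns; this sketch keeps the same tab-delimited assumption and uses hypothetical sample data:

```shell
# Tab-separated sample data (tabs assumed, as in the answer above)
printf 'range=chr1\t20802865\t20802871\nrange=chr1\t23866528\t23866534\n' > input_file

# FS and OFS are both a tab; sub() edits only $1, which makes awk rebuild $0
# with OFS, so every other column passes through unchanged.
out=$(awk 'BEGIN { FS = OFS = "\t" } { sub(/^range=/, "", $1) } 1' input_file)
printf '%s\n' "$out"
```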

If you don't need to use awk, you can use sed, which I find a bit simpler. Hopefully you are familiar with regex operators like ^ (anchors the match to the start of the line) and . (matches any single character).

$ cat awkens
range=chr1      20802865        20802871
range=chr1      23866528        23866534
$ sed 's/^range=//' awkens
chr1      20802865        20802871
chr1      23866528        23866534
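The ^ anchor only changes the result when range= can appear somewhere other than the start of a line. A small sketch with hypothetical input contrasts the anchored and unanchored substitutions:

```shell
# Hypothetical input: the second line contains "range=" but not at the start
input=$(printf 'range=chr1\t20802865\t20802871\nchr2\tnote=range=x')

# ^ anchors the match, so only a range= at the very start of a line is removed
anchored=$(printf '%s\n' "$input" | sed 's/^range=//')
# Without ^, sed removes the first range= wherever it occurs in the line
unanchored=$(printf '%s\n' "$input" | sed 's/range=//')

printf '%s\n' "$anchored" "$unanchored"
```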
