Splitting the first column of a file into multiple columns using AWK

The file looks like this, but with millions of lines (TAB-separated):

1_number_column_ranking_+   100 200 Target "Hello" 

I want to split the first column on the _ so that it becomes:

1 number column ranking + 100 200 Target "Hello"

This is the code I have been trying:

awk -F"\t" '{n=split($1,a,"_");for (i=1;i<=n;i++) print $1"\t"a[i]}' 

But it's not quite what I need.
Any help is appreciated (the other threads on this topic were not helpful for me).

No need to split; a simple replacement will do:

awk 'BEGIN{FS=OFS="\t"}{gsub("_","\t",$1)}1'

E.g.:

$ cat file
1_number_column_ranking_+       100     200     Target "Hello"

$ awk 'BEGIN{FS=OFS="\t"}{gsub("_","\t",$1)}1' file
1       number  column  ranking +       100     200     Target "Hello"

gsub replaces all occurrences. When no third argument is given, it operates on $0; here the third argument is $1, so only the first field is changed.
The trailing 1 is a shortcut for {print}: the pattern is always true, and the default action {print} is implied.
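
To make the role of gsub's third argument concrete, here is a small sketch comparing the two forms. The sample line is made up for illustration (it is not from the question) and deliberately has an underscore in the last field as well as the first.

```shell
# Hypothetical sample line: underscores in both the first and the last field.
line=$(printf '1_number_+\t100\tTarget_"Hello"')

# With $1 as the third argument, only the first field is rewritten.
first_only=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="\t"}{gsub("_","\t",$1)}1')

# With no third argument, gsub works on $0, so every underscore is replaced.
whole_line=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="\t"}{gsub("_","\t")}1')

printf '%s\n' "$first_only"
printf '%s\n' "$whole_line"
```

In the first result the underscore in Target_"Hello" survives; in the second it becomes a tab too. If underscores can occur outside the first column, the $1 form is probably the safer choice.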

Another awk, if the "_" appears only in the first column. Split the input into fields with the regex "[_\t]" and do a dummy assignment like $1=$1 in the main block, so that $0 is reconstructed with OFS="\t".

$ cat steveman.txt
1_number_column_ranking_+       100     200i    Target  "Hello"

$ awk -F"[_\t]" ' BEGIN { OFS="\t"} { $1=$1; print } ' steveman.txt
1       number  column  ranking +       100     200i    Target  "Hello"

$

Thanks @Ed; updated from -F"[_\t]+" to -F"[_\t]", which avoids collapsing empty fields.
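
The difference between the two separators only shows up when a column is empty. The sketch below uses a made-up line with an empty second column (two consecutive tabs) and prints the field count in front of the rebuilt record; the input and the variable names are assumptions for illustration.

```shell
# Hypothetical line with an empty second column (two consecutive tabs).
line=$(printf 'a_b\t\t200')

# "[_\t]+" treats the run of two tabs as a single separator,
# so the empty column disappears and later columns shift left.
with_plus=$(printf '%s\n' "$line" | awk -F'[_\t]+' 'BEGIN{OFS="\t"}{$1=$1; print NF ": " $0}')

# "[_\t]" treats each tab as its own separator, so the empty column is kept.
without_plus=$(printf '%s\n' "$line" | awk -F'[_\t]' 'BEGIN{OFS="\t"}{$1=$1; print NF ": " $0}')

printf '%s\n' "$with_plus"
printf '%s\n' "$without_plus"
```

With the + quantifier awk reports 3 fields; without it, 4 fields, with the empty one preserved between b and 200.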
