How to insert additional field after third field if number of field separators in line is 5 using sed or awk

Is it possible to run a sed command that will check the number of field separators in a line and insert an additional separator if the number of separators in the line is 5, for example?

Source data example:

a,aaa|bbbb|cccc|dddd|eeee|ffff|gggg
aaaa|bb,bb|dddd|eeee|fff,f|gggg
aaa,a|bbbb|cccc|dddd|eeee|ffff|gggg

Target output example:

a,aaa|bbbb|cccc|dddd|eeee|ffff|gggg
aaaa|bb,bb||dddd|eeee|fff,f|gggg
aaa,a|bbbb|cccc|dddd|eeee|ffff|gggg

Note: The goal is to insert an additional field separator (|) immediately before or after the second field separator of the line, creating a blank 3rd field, if only 5 field separators exist in the line.

If this is not possible using sed, would awk be able to accomplish the task?

Any guidance would be appreciated.

Something like this should work:

awk -F '|' -v OFS='|' 'NF<7{$2=$2 FS} 1'

-F '|' sets the input field separator to |.
-v OFS='|' sets the output field separator to |.

When the number of fields NF is lower than 7, a field separator FS is appended to the second field, which leaves an empty third field once the line is rebuilt with OFS. The trailing 1 is an always-true pattern that prints every line, modified or not.
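Note that NF<7 also fires on lines with fewer than five separators. If the edit should apply only to lines with exactly five separators (six fields), a stricter variant might look like the sketch below. The function name insert_blank_third is made up for illustration, and the sample input extends the question's data with one short line.

```shell
# Sketch: only touch lines with exactly 6 fields (5 separators).
# insert_blank_third is a hypothetical helper name; it reads stdin
# or any files passed as arguments.
insert_blank_third() {
  awk -F '|' -v OFS='|' 'NF == 6 { $2 = $2 FS } 1' "$@"
}

printf '%s\n' \
  'a,aaa|bbbb|cccc|dddd|eeee|ffff|gggg' \
  'aaaa|bb,bb|dddd|eeee|fff,f|gggg' \
  'aaa|bbb|ccc' |
  insert_blank_third
```

Only the second sample line gains an empty third field; the 7-field and 3-field lines pass through unchanged.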

This might work for you (GNU sed):

sed 's/|/&/6;t;s/|/&&/2' file

If the line already has enough field separators (in this case 6), bail out: the first substitution replaces the 6th | with itself, and t then branches to the end of the script.
Otherwise, double the field separator at the required position (in this case the 2nd).

If you only want to add the separator when there are exactly five, use:

sed 's/|/&/6;t;s/|/&/5;T;s/|/&&/2' file
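The T command in that one-liner is a GNU extension, so it may not run on other sed implementations. As a rough sketch, the same "exactly five" logic can be written with only t, b, and a label. The function name add_blank_third_exact and the label fix are illustrative names, not part of the original answer.

```shell
# Sketch: same logic as the GNU one-liner, without the GNU-only T command.
#   1. 6th separator exists      -> branch to end, line unchanged
#   2. 5th separator exists      -> jump to the "fix" label
#   3. fewer than 5 separators   -> branch to end, line unchanged
#   4. at "fix": double the 2nd separator
# add_blank_third_exact and "fix" are hypothetical names.
add_blank_third_exact() {
  sed -e 's/|/&/6' -e 't' \
      -e 's/|/&/5' -e 't fix' \
      -e 'b' \
      -e ':fix' -e 's/|/&&/2' "$@"
}

printf '%s\n' 'a|b|c|d|e|f|g' 'a|b|c|d|e|f' 'a|b|c' | add_blank_third_exact
```

With these three sample lines, only the middle one (five separators) should change, becoming a|b||c|d|e|f.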

It is most certainly possible with sed:

sed '/^[^|]*\(|[^|]*\)\{5\}$/s/|/||/2'

The 5 is the number of separators that will trigger the replacement, and the 2 at the end of the command selects which occurrence of the separator is replaced. The address only matches lines that contain exactly 5 separators, so every other line is printed unchanged.

This is already a bit more readable and a lot more maintainable than my original attempt:

sed 's/^\([^|]*|[^|]*\)\(\(|[^|]*\)\{4\}\)$/\1|\2/'

Still, the awk solution is the best in terms of readability.
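Because the address-based sed command in this answer has just two numbers to tune, it can be wrapped in a small shell function. This is only a sketch: insert_blank_field and its two positional parameters are hypothetical, and the separator is hard-coded to |.

```shell
# Sketch: parameterize the address-based sed command.
#   $1 = number of separators that triggers the edit
#   $2 = which separator occurrence gets doubled
# insert_blank_field is a hypothetical helper name.
insert_blank_field() {
  seps=$1
  pos=$2
  shift 2
  # Inside double quotes \( \) \{ \} stay literal for sed; \$ becomes $.
  sed "/^[^|]*\(|[^|]*\)\{${seps}\}\$/s/|/||/${pos}" "$@"
}

printf '%s\n' 'aaaa|bb,bb|dddd|eeee|fff,f|gggg' | insert_blank_field 5 2
```

Called with 5 and 2, it should behave like the original command and turn the sample line into aaaa|bb,bb||dddd|eeee|fff,f|gggg.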
