
Using sed in a shell script to take a substring at a fixed position and replace the field with it

I'm processing data in a text file, and I can't find a way with sed to select a substring at a fixed position and replace it.

This is what I have:

X|001200000000000000000098765432|1234567890|TQ

This is what I need:

'X','00000098765432','1234567890','TQ'

The following sed code gets me partway there, but it does not replace the second field with the substring I need (00000098765432):

echo " X|001200000000000000000098765432|1234567890|TQ" | sed "s/ *//g;s/|/','/g;s/^/'/;s/$/'/"

Could you help me?

Rather than sed, I would use awk for this.

echo "X|001200000000000000000098765432|1234567890|TQ" | awk 'BEGIN {FS="|";OFS=","} {print $1,substr($2,17,14),$3,$4}'

This gives the output:

X,00000098765432,1234567890,TQ

Here is how it works:

FS = field separator (in the input).

OFS = output field separator (the way you want the output to be delimited).

BEGIN -> think of it as the place where configuration is set. It runs only once, before any input is read. Here it says that the input is pipe-delimited and the output should be comma-delimited.

substr($2,17,14) -> take $2 (i.e. the second field; awk begins counting from 1) and apply substring to it. 17 is the starting character position and 14 is the number of characters to take from that position onwards.

In my opinion, this is much more readable and maintainable than the sed version you have.
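To make the position arithmetic concrete, here is a minimal sketch that applies the same substr call to the second field from the question on its own. The helper name extract_tail is made up for illustration and is not part of the answer.

```shell
# extract_tail is a hypothetical helper name used only in this sketch.
# substr(s, 17, 14) keeps 14 characters starting at character 17 (1-based).
extract_tail() {
  printf '%s\n' "$1" | awk '{ print substr($0, 17, 14) }'
}

# Second field copied from the question.
extract_tail '001200000000000000000098765432'
```

Because awk counts characters from 1, a start of 17 skips exactly the first 16 characters.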

If you want to put the quotes in, I'd still use awk.

$: awk -F'|' 'BEGIN{q="\047"} {print q $1 q","q substr($2,17,14) q","q $3 q","q $4 q}' <<< "X|001200000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'

If you only want to use sed, note that you say above that you want to remove 16 characters, but you are actually removing only 14.

$: sed -E "s/^(.)[|].{14}([^|]+)[|]([^|]+)[|]([^|]+)/'\1','\2','\3','\4'/" <<< "X|0012000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
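The second field in the command above is two digits shorter than the one in the question. For the question's own input, the same pattern would need to skip 16 characters instead of 14. A sketch under that assumption follows; the function name quote_fields is hypothetical, and the first field is matched with [^|]* rather than a single character so that longer first fields would also work.

```shell
# quote_fields is a hypothetical name for this sketch.
# It reads pipe-delimited records on stdin, drops the first 16 characters
# of the second field, and wraps all four fields in single quotes.
quote_fields() {
  sed -E "s/^([^|]*)[|].{16}([^|]+)[|]([^|]+)[|]([^|]+)/'\1','\2','\3','\4'/"
}

printf '%s\n' 'X|001200000000000000000098765432|1234567890|TQ' | quote_fields
```

Lines that do not have four fields, or whose second field has 16 characters or fewer, do not match and pass through unchanged.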

Using sed

$ sed "s/|\(0[0-9]\{15\}\)\?/','/g;s/^\|$/'/g" input_file
'X','00000098765432','1234567890','TQ'
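The \? and \| operators inside a basic regular expression are GNU extensions, so the command above may not behave the same in other sed implementations. A sketch of the same idea in extended syntax with -E, which both GNU and BSD sed accept, could look like this; the function name quote_fields_ere is made up for the example, and the two anchors are handled as separate substitutions as in the question's own attempt.

```shell
# quote_fields_ere is a hypothetical name for this sketch.
# Step 1: replace each pipe, plus an optional run of "0" and 15 digits, with ','
# Step 2: add the opening quote; Step 3: add the closing quote.
quote_fields_ere() {
  sed -E "s/[|](0[0-9]{15})?/','/g;s/^/'/;s/$/'/"
}

printf '%s\n' 'X|001200000000000000000098765432|1234567890|TQ' | quote_fields_ere
```

The optional group only matches after the first pipe here, because the other fields do not start with a 0 followed by 15 more digits.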
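
Using awk with a fixed start position and length:

awk -v del1="\047" \
    -v del2="," \
    -v start="2" \
    -v len="16" \
    '{
         # Drop the len characters that follow position start (the unwanted prefix of field 2).
         sub(substr($0, start + 1, len), "")
         # Turn each pipe into quote-comma-quote.
         gsub(/[|]/, del1 del2 del1)
         print del1 $0 del1
    }' input_file

'X','00000098765432','1234567890','TQ'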

Using any POSIX awk:

$ echo 'X|001200000000000000000098765432|1234567890|TQ' |
awk -F'|' -v OFS="','" -v q="'" '{sub(/.{16}/,"",$2); print q $0 q}'
'X','00000098765432','1234567890','TQ'
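This works because assigning to a field makes awk rebuild $0 with OFS between the fields. The sketch below illustrates that behaviour with two hypothetical helper functions; it uses substr rather than an interval expression, since some awk builds do not support the {n} syntax.

```shell
# Both function names are hypothetical and exist only for this illustration.

# No field is assigned, so $0 is printed as read and the pipes remain.
untouched() {
  awk -F'|' -v OFS="','" '{ print }'
}

# Assigning to $2 makes awk rebuild $0, joining the fields with OFS.
rebuilt() {
  awk -F'|' -v OFS="','" -v q="'" '{ $2 = substr($2, 17); print q $0 q }'
}

printf '%s\n' 'X|001200000000000000000098765432|1234567890|TQ' | untouched
printf '%s\n' 'X|001200000000000000000098765432|1234567890|TQ' | rebuilt
```

The first call prints the record with its original pipes; the second prints the quoted, comma-separated form.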

Not as elegant as I hoped for, but it gets the job done:

'X','00000098765432','1234567890','TQ'

awk 'BEGIN {
    # Build "[0-9]" repeated 14 times followed by "$"; the expression leaves __ equal to 2.
    _ = sprintf("%*s", (__ = +2)^++__+--__*++__, __--)
    gsub(".", "[0-9]", _)
    sub("$", "$", _)
    FS = "[|]"
    OFS = "\47,\47"
}
# Pattern-only rule: the record is printed when every factor is non-zero.
(NF *= NF == __*__) * sub(_, "|&", $__) * \
    sub("^.*[|]", "", $__) * sub(".+", "\47&\47")' input_file

Tested and confirmed working on gawk 5.1.1, mawk 1.3.4, mawk 1.9.9.6, and macOS nawk.
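For readers who find the program above hard to follow, here is a plainer sketch of what it appears to do: accept only records with four fields, keep the last 14 characters of the second field, and wrap every field in single quotes. It is a rewrite under those assumptions, not the answer's own code; the name keep_last14 is made up, and the sketch checks only the field length instead of requiring 14 trailing digits.

```shell
# keep_last14 is a hypothetical name for this sketch.
keep_last14() {
  awk -F'|' -v OFS="','" -v q="'" '
    NF == 4 && length($2) >= 14 {
      $2 = substr($2, length($2) - 13)   # last 14 characters of field 2
      print q $0 q
    }'
}

printf '%s\n' 'X|001200000000000000000098765432|1234567890|TQ' | keep_last14
```

Records with a different number of fields, or a second field shorter than 14 characters, are skipped without output.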

The 4Chan Teller
