Linux bash: How do I replace a string on a line based on a pattern on another/different line?
I have a file that contains the following data:
GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
For clarity, I have inserted a blank line above each ST*850 line. Here is what I want to do:
Find each line that begins with REF*ZZ*SO.
When one is found, replace the preceding ST*850 line with ST*850C.
So the resultant file would look like this:
GS*PO*112233*445566*20211006*155007*2010408*X*004010~
ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~
ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~
ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
Here is what I have tried:
sed -i -n '/^REF\*ZZ\*SO/!{x;s/ST\*850\*/ST\*850C\*/;x};x;1!p;${x;p}' file
This replaces all three ST*850 lines with ST*850C, not just the first and the third. What am I doing wrong?
How about a perl solution, although perl is not included in the tags?
perl -0777 -aF'(?=ST\*850)' -ne '
print map {/REF\*ZZ\*SO/ && s/ST\*850/$&C/; $_} @F;
' file
Output:
GS*PO*112233*445566*20211006*155007*2010408*X*004010~
ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~
ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~
ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
The -0777 option tells perl to slurp the whole file at once.
The -a option enables autosplit mode; the split fragments are stored in the array @F.
The -F option specifies the pattern used to split the input.
(?=ST\*850) is a positive lookahead that matches at the position where the string ST*850 begins, so the input is split just before each ST*850 without consuming it.
The -n and -e options behave much like their sed counterparts: -n suppresses automatic printing and -e supplies the script.
The map {..} @F function transforms every element of @F according to the statements within the curly brackets.
/REF\*ZZ\*SO/ && s/ST\*850/$&C/ reads as: "if the element of @F matches the pattern /REF\*ZZ\*SO/, then perform the substitution s/ST\*850/$&C/ on that element."
$_ is perl's default variable, similar to the pattern space of sed, and is the value that map returns for each element.
This might work for you (GNU sed):
sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}' file
Begin gathering up lines if a line contains ST*850.
On matching a line that contains REF*ZZ*SO, use greedy matching to append C to the latest ST*850 string.
NB: The regexp .* ensures that the match backtracks from the end of the collection rather than the start of the collection.
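To see the greedy match at work, here is a small sketch on a made-up three-line input (the sample lines and the variable name out are illustrative only, and GNU sed is assumed). The second ST*850 line is the one that receives the C, because .* consumes as much of the collected text as it can before ST*850 has to match:

```shell
# Illustrative input: two ST*850 headers followed by one REF*ZZ*SO line.
out=$(printf '%s\n' 'ST*850*0001~' 'ST*850*0002~' 'REF*ZZ*SO1~' |
  sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}')

# Only the header closest to the REF*ZZ*SO line is changed.
printf '%s\n' "$out"
```

With a non-greedy or anchored match the first header would be changed instead, which is why the leading .* matters here.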
Assuming ST is essentially a record separator, you can use a simple Awk script to collect the lines in the current record, and print a modified version if the conditions are right.
awk 'BEGIN { RS = "\nST" }
/REF\*ZZ\*SO/ { sub(/^\*850/, "*850C") }
{ printf "%s%s", (NR > 1 ? "\nST" : ""), $0 }' filename
The BEGIN clause sets the record separator (RS) to the string ST preceded by a newline. (Attempting to include the asterisk got complicated, so I avoided that.) A multi-character RS is an extension, supported by GNU Awk and mawk among others. The final block prints every record and puts the separator back in front of each record after the first; setting ORS to the same string and relying on the common 1 shorthand ("print everything which reaches here") would also emit a stray ST after the last record.
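As a rough illustration of how the input gets split, the sketch below numbers each record on a made-up four-line input and shows embedded newlines as " | ". The sample data and the variable name out are assumptions for the demo, and a multi-character RS requires an Awk that supports it (GNU Awk or mawk, for instance). Note that the ST itself is consumed by the separator, which is why the script above matches on ^\*850 rather than on ST\*850:

```shell
# Made-up input; RS = "\nST" relies on multi-character RS support.
out=$(printf '%s\n' 'GS*X~' 'ST*850*0001~' 'REF*ZZ*SO1~' 'ST*850*0002~' |
  awk 'BEGIN { RS = "\nST" } { gsub(/\n/, " | "); print NR ": " $0 }')

# Each output line is one record as Awk sees it.
printf '%s\n' "$out"
```

The first record is the GS line, and every later record starts right after an ST.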
sed is rather unwieldy for anything beyond simple line-based substitutions; I think you will find that switching to a higher-level language is going to improve maintainability.
Preprocess with sed to insert blank lines, then treat each block as one awk record, for example:
sed 's/^ST\*850/\n&/' file | awk '/REF\*ZZ\*SO/ { sub(/ST\*850/, "&C") } 1' RS=
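Setting RS to the empty string puts Awk in paragraph mode, where blank lines separate records. The following sketch runs the same pipeline on a made-up five-line input fed through standard input (the sample data and the variable name out are illustrative; the \n in the sed replacement assumes GNU sed):

```shell
# Made-up input: one block with REF*ZZ*SO, one block without.
out=$(printf '%s\n' 'GS*X~' 'ST*850*0001~' 'REF*ZZ*SO1~' 'ST*850*0002~' 'REF*ZZ*PO2~' |
  sed 's/^ST\*850/\n&/' |
  awk '/REF\*ZZ\*SO/ { sub(/ST\*850/, "&C") } 1' RS=)

# Records are printed with the default ORS, so the blank lines
# added by sed do not appear in the result.
printf '%s\n' "$out"
```

Only the block that contains a REF*ZZ*SO line gets its header changed to ST*850C.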
The reason why your solution substitutes all occurrences is that you do not append the lines; you only swap back and forth between the pattern and hold spaces. What you need is a kind of buffering until one or the other of the special lines is encountered. This is typically done by appending the pattern space to the hold space until a condition is fulfilled.
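As a minimal illustration of that buffering idea (not the full solution), this sketch appends every line to the hold space and, only at the last line, swaps the buffer back into the pattern space and prints it, joined by commas so the effect is visible. The input a, b, c and the variable name out are made up, and GNU sed is assumed:

```shell
# 1h   : copy the first line into the hold space
# 1!H  : append every later line to the hold space
# ${…} : at the last line, swap the buffer in, join the lines, and print
out=$(printf '%s\n' a b c | sed -n '1h;1!H;${x;s/\n/,/g;p}')

printf '%s\n' "$out"   # a,b,c
```

Replacing the "last line" condition with a test for a special line gives the kind of conditional flush needed here.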
With sed (tested with GNU sed):
sed -n '/^ST\*850\*/{x;1!p;b};
/^REF\*ZZ\*SO/{1!{H;x};s/ST\*850\*/ST*850C*/;h;$p;b};
1{h;b};H;${x;p}' file
If the current line is an ST*850* line, swap the pattern and hold spaces. Then, if it is not the first line, print. The hold space now contains the ST*850* line. The preceding lines that were stored in the hold space, if any, have been printed.
If the current line is a REF*ZZ*SO line, append it to the hold space and swap the pattern and hold spaces (unless it is the first line), then do the substitution. Copy the result back to the hold space so that buffering continues, and print it if this is the last line.
Otherwise, store the line in the hold space if it is the first line, or append it to the hold space if it is not; on the last line, print the hold space.
Pure Bash: much more verbose but hopefully does not require any additional explanation.
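#!/bin/bash
# Reads the data on standard input and writes the result to standard output.

# Start a new chunk: remember the two halves of the ST*850 header line
# and reset the buffered lines and the REF*ZZ*SO flag.
init_chunk()
{
    prefix=$1
    suffix=$2
    chunk=()
    refzzso=
}

# Print the pending header (with C added if REF*ZZ*SO was seen) and its lines.
print_chunk()
{
    if [[ -n $prefix ]]; then
        if [[ $refzzso == true ]]; then
            printf '%sC%s\n' "$prefix" "$suffix"
        else
            printf '%s%s\n' "$prefix" "$suffix"
        fi
        # Guard against printing an empty line for a header-only chunk.
        if (( ${#chunk[@]} > 0 )); then
            printf '%s\n' "${chunk[@]}"
        fi
    fi
}

init_chunk

while IFS= read -r line; do
    # Check for header.
    if [[ $line =~ ^(ST\*850)(.*) ]]; then
        # Print previous chunk.
        print_chunk
        # Begin new chunk.
        init_chunk "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        continue
    fi
    # Check if in a chunk.
    if [[ -n $prefix ]]; then
        # Check for modifier.
        if [[ $line =~ ^REF\*ZZ\*SO ]]; then
            refzzso=true
        fi
        chunk+=("$line")
    else
        # Lines before the first header are passed through unchanged.
        printf '%s\n' "$line"
    fi
done

# Print last chunk.
print_chunk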