Linux bash: How do I replace a string on a line based on a pattern on another/different line?

I have a file that contains the following data:

GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~

For clarity, I have inserted a blank line above each ST*850 line. Here is what I want to do:

  1. Search for the pattern REF*ZZ*SO
  2. If found, replace ST*850 with ST*850C on the preceding ST*850 line

So the resultant file would look like this:

GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~

Here is what I have tried:

sed -i -n '/^REF\*ZZ\*SO/!{x;s/ST\*850\*/ST\*850C\*/;x};x;1!p;${x;p}' file

This replaces all three ST*850 lines with ST*850C, not just the 1st and the 3rd. What am I doing wrong?

How about a perl solution, although perl is not included in the tags?

perl -0777 -aF'(?=ST\*850)' -ne '
    print map {/REF\*ZZ\*SO/ && s/ST\*850/$&C/; $_} @F;
' file

Output:

GS*PO*112233*445566*20211006*155007*2010408*X*004010~

ST*850C*0001~
BEG*00*DS*A-112233**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*SO168219~
REF*DC*ABC~

ST*850*0002~
BEG*00*DS*A-44556**20211005~
REF*K6*Drop Ship Order~
REF*ZZ*PO54361~

ST*850C*0003~
BEG*00*DS*A-12345**20211005~
REF*K6*Drop Ship Order~
REF*DC*XYZ~
REF*ZZ*SO897654~
  • The -0777 option tells perl to slurp the whole file at once.
  • The -a option enables auto-split mode; the split fragments are stored in the array @F.
  • The -F option specifies the pattern on which to split the input.
  • The regex (?=ST\*850) is a positive lookahead, which matches at the position where the string ST*850 begins without consuming it.
  • The -n and -e options behave much like their sed counterparts: -n loops over the input without printing automatically, and -e supplies the script on the command line.
  • The map {..} @F function transforms every element of @F according to the statements within the curly brackets.
  • The statement /REF\*ZZ\*SO/ && s/ST\*850/$&C/ reads as: "if the element of @F matches the pattern /REF\*ZZ\*SO/, then perform the substitution s/ST\*850/$&C/ on that element."
  • The final $_ is perl's default variable, similar to the pattern space of sed; it holds the current element and is the value returned by the map block.

This might work for you (GNU sed):

sed '/ST\*850/{:a;/REF\*ZZ\*SO/!{N;ba};s/.*ST\*850/&C/}' file

Begin gathering up lines if a line contains ST*850.

On matching a line that contains REF*ZZ*SO, use greed to append C to the latest ST*850 string.

N.B. The leading .* is greedy, so the match is found by backtracking from the end of the collected lines rather than from the start; it therefore lands on the most recent ST*850, even when an earlier block without REF*ZZ*SO has been gathered along the way.
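The effect of the greedy .* can be checked on two joined header lines. This is only a sketch: mark_last and mark_first are hypothetical helper names, and N is used here just to get both lines into one pattern space.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.
mark_last() {
  # N joins two lines; the greedy .* pushes the match to the LAST ST*850.
  sed 'N;s/.*ST\*850/&C/'
}
mark_first() {
  # Without .* the substitution hits the FIRST ST*850 instead.
  sed 'N;s/ST\*850/&C/'
}

printf 'ST*850*0002~\nST*850*0003~\n' | mark_last
printf 'ST*850*0002~\nST*850*0003~\n' | mark_first
```

The first call marks only the 0003 header, the second only the 0002 header, which is the difference the note above relies on.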

Assuming ST is essentially a record separator, you can use a simple Awk script to collect the lines of the current record and print a modified version of it if the conditions are right.

awk 'BEGIN { ORS = RS = "\nST" }
    /REF\*ZZ\*SO/ { sub(/^\*850/, "*850C") }1' filename

The BEGIN clause sets both the record separator (RS) and the output record separator (ORS) to the string ST preceded by a newline. (Attempting to include the asterisk got complicated, so I avoided that.) The final 1 is the common Awk shorthand for "print everything which reaches here". Note that a multi-character RS is not specified by POSIX, so this needs an Awk such as GNU Awk that supports it, and that ORS is also written after the last record, which leaves a stray ST at the end of the output.

sed is rather unwieldy for anything beyond simple line-based substitutions; I think you will find that switching to a higher-level language improves maintainability.

Preprocess with sed to insert a blank line before each ST*850 line, then treat each block as a single awk record, for example:

sed 's/^ST\*850/\n&/' file | awk '/REF\*ZZ\*SO/ { sub(/ST\*850/, "&C") } 1' RS=
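Setting RS to the empty string puts awk in paragraph mode, where each blank-line-separated block is one record. A small sketch of that behaviour, with a made-up helper name (mark_blocks) and shortened sample data:

```shell
#!/bin/bash
# mark_blocks is a hypothetical helper; the sample data is shortened.
mark_blocks() {
  # RS = "" : records are blocks separated by blank lines.
  # A record containing SO gets a C appended to its leading ST.
  # The trailing 1 prints every record, terminated by the default ORS (newline).
  awk 'BEGIN { RS = "" } /SO/ { sub(/^ST/, "&C") } 1'
}

printf 'ST1\nSO\n\nST2\nPO\n' | mark_blocks
```

Since records are printed with a single newline after each, the blank lines inserted by the preprocessing step disappear again in the output.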

The reason your solution substitutes all occurrences is that you never append lines; you only swap back and forth between the pattern and hold spaces. What you need is a kind of buffering that lasts until one or the other of the special lines is encountered. This is typically done by appending the pattern space to the hold space until a condition is fulfilled.

With sed (tested with GNU sed):

sed -n '/^ST\*850\*/{x;1!p;b};
        1h;1!H;
        /^REF\*ZZ\*SO/{x;s/ST\*850\*/ST*850C*/;x};
        ${x;p}' file
  • If it is an ST*850* line, swap the pattern and hold spaces. The hold space now contains the ST*850* line and the pattern space contains the lines buffered so far. Unless it is the first line (when nothing has been buffered yet), print them. Start a new cycle.
  • Else, buffer the line: if it is the first line, replace the hold space by the pattern space; otherwise append the pattern space to the hold space.
  • If it is a REF*ZZ*SO line, swap the pattern and hold spaces, do the substitution on the buffered block, and swap back. The line itself is already in the buffer, so it is printed exactly once, together with its block.
  • If it is the last line, swap the pattern and hold spaces and print the remaining buffered block.
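The buffering idea (append to the hold space, flush when a condition is met) can be seen on its own in a much smaller script. This is a sketch only; join_lines is a hypothetical name and the flush condition here is simply "last line".

```shell
#!/bin/bash
# join_lines is a hypothetical name used only for this illustration.
join_lines() {
  # 1h   : first line, copy the pattern space to the hold space
  # 1!H  : every other line, append it to the hold space
  # $    : on the last line, fetch the buffer, turn newlines into commas, print
  sed -n '1h;1!H;${x;s/\n/,/g;p;}'
}

printf 'a\nb\nc\n' | join_lines   # prints a,b,c
```

Using h for the first line rather than H avoids the leading newline that H would otherwise add to an empty hold space.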

Pure Bash: much more verbose, but hopefully does not require any additional explanation. The script reads the file on standard input.

#! /bin/bash

init_chunk()
{
  prefix=$1
  suffix=$2
  chunk=()
  refzzso=
}

print_chunk()
{
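  # A chunk exists only once a header has been seen.
  if [[ -n $prefix ]]; then
    if [[ $refzzso == true ]]; then
      printf '%sC%s\n' "$prefix" "$suffix"
    else
      printf '%s%s\n' "$prefix" "$suffix"
    fi
    # Numeric test; skip a header-only chunk to avoid printing an empty line.
    if (( ${#chunk[@]} > 0 )); then
      printf '%s\n' "${chunk[@]}"
    fi
  fi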
}

init_chunk
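# IFS= keeps leading/trailing whitespace; the || test handles a final line without a newline.
while IFS= read -r line || [[ -n $line ]]; do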
  # Check for header.
  if [[ $line =~ ^(ST\*850)(.*) ]]; then
    # Print previous chunk.
    print_chunk
    # Begin new chunk.
    init_chunk "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    continue
  fi
  # Check if in a chunk.
  if [[ $prefix ]]; then
    # Check for modifier.
    if [[ $line =~ ^REF\*ZZ\*SO ]]; then
      refzzso=true
    fi
    chunk+=("$line")
  else
    printf '%s\n' "$line"
  fi
done
# Print last chunk.
print_chunk
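The header handling in the script rests on Bash's regex match and the BASH_REMATCH array. A minimal sketch of just that step, using a hypothetical function name (mark_header) that always inserts the C, unlike the full script:

```shell
#!/bin/bash
# mark_header is a hypothetical helper showing only the capture-group step.
mark_header() {
  local line=$1
  if [[ $line =~ ^(ST\*850)(.*) ]]; then
    # BASH_REMATCH[1] is "ST*850", BASH_REMATCH[2] is the rest of the line.
    printf '%sC%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$line"
  fi
}

mark_header 'ST*850*0001~'   # prints ST*850C*0001~
mark_header 'BEG*00*DS~'     # prints BEG*00*DS~
```

Splitting the header into prefix and suffix up front is what lets the full script decide about the C later, once the whole chunk has been read.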
