
Multi-line search and replace

I have an unstructured file and I would like to search and replace a pattern of strings.

  • Must replace the strings that exist between the SELECT and FROM strings; the ones outside this pattern should stay as is.

The file format is like this:

col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT 
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah

I want to replace:

  • col4, with upper(col4) as col4,
  • sch.col4 with upper(sch.col4)
  • sch.tab.col4 with upper(sch.tab.col4)
  • col4 (if col4 is at the end of the select query) with upper(col4) as col4

The file is on a Linux server. I tried using sed and awk to narrow down the lines containing col4, but could not move forward from there.

I was able to identify one pattern using the following:

awk '/SELECT/,/FROM/' test_file.txt | awk '/col4/{print $0, NR}' | awk -F AS '{print $1}' 

  • Find the text between SELECT and FROM
  • Identify the lines that have col4
  • Print the first field
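To see what each stage of that pipeline actually produces, here is a rough Python re-creation of it. It is only a sketch; `SAMPLE`, `select_ranges` and `pipeline` are illustrative names, not part of the question. Two details stand out: `NR` in the second awk counts the lines of the first awk's output, not the lines of the original file, and `-F AS` splits on uppercase `AS`, which never appears in the sample (it uses lowercase `as`), so the third stage passes every line through whole.

```python
# Sample input from the question
SAMPLE = """col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah
"""


def select_ranges(lines):
    """Mimic awk's /SELECT/,/FROM/ range: yield each line from a line
    containing SELECT through the next line containing FROM, inclusive."""
    in_range = False
    for line in lines:
        if not in_range and "SELECT" in line:
            in_range = True
        if in_range:
            yield line
            if "FROM" in line:
                in_range = False


def pipeline(text):
    # Stage 1: awk '/SELECT/,/FROM/'
    stage1 = list(select_ranges(text.splitlines()))
    # Stage 2: awk '/col4/{print $0, NR}'; NR numbers the lines of
    # stage 1's output, not the lines of the original file
    stage2 = [f"{line} {nr}" for nr, line in enumerate(stage1, start=1)
              if "col4" in line]
    # Stage 3: awk -F AS '{print $1}'; the separator is case-sensitive and
    # uppercase "AS" never occurs in these lines, so each comes through whole
    return [line.split("AS")[0] for line in stage2]


for result in pipeline(SAMPLE):
    print(result)
```

The first result is `sch.col4 as col4, 5`, even though that line is line 7 of the file, which may explain why the line numbers from this approach did not help.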

and using sed:

sed -n -e '/SELECT/,/FROM/p' -e 's/\(\([a-zA-Z]\{1,\}\.\)\{0,\}\)col4/upper(\0)/g' test_file.txt

Actual:

col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah

Expected result:

col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4) as col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4) as col4,
upper(col4) as col4,
upper(col4) as col4 FROM
blah blah blah

Any help is much appreciated!

I think this does at least 95% of it. Please tell me if there's an error:

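import re

# Read the whole file as one string, keeping the newlines
with open('ej.txt', 'r') as file:
    string = file.read()

# Regex replacements applied only inside SELECT ... FROM blocks, in order:
# qualified names first, then a bare col4 with or without an "as col4" alias
replacements = {
    r'(?m)^((?:\w+\.)+col4)\b': r'upper(\1)',              # sch.col4, sch.tab.col4
    r'(?m)^col4\b(?: as col4\b)?': 'upper(col4) as col4',  # col4, / col4 as col4, / col4 FROM
}


def fix_block(match):
    block = match.group(0)
    for pattern, repl in replacements.items():
        block = re.sub(pattern, repl, block)
    return block


# re.DOTALL lets .*? span newlines; the non-greedy match stops at the first FROM
string = re.sub(r'SELECT.*?FROM', fix_block, string, flags=re.DOTALL)
print(string)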

Here is a short awk script that performs your request (gensub() is a GNU awk extension, so this needs gawk):

awk '/SELECT/,/FROM/ {$0=gensub(/^[^[:space:]]*col4/,"upper(\\0)",-1);}1' input.txt

The output is:

abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4),
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4),
upper(col4) as col4,
upper(col4) FROM
blah blah blah

Explanation:

  • /SELECT/,/FROM/ : inclusive range selecting each line from /SELECT/ to /FROM/
  • $0=gensub(...) : update the current line with the substitutions from gensub()
  • /^[^[:space:]]*col4/ : search for col4 preceded by a run of non-space characters at the beginning of the line
  • "upper(\\0)",-1 : replace the found match with upper(found match), only for the first match
  • 1 : print the current line
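For comparison, a rough Python equivalent of the gawk one-liner. It is a sketch that reads the whole input into memory; `SAMPLE`, `PREFIXED_COL4` and `upper_col4` are illustrative names. `[^[:space:]]` becomes `\S`, `\g<0>` plays the role of `\0`, and `count=1` mirrors replacing only the first match. As in the output shown above, a bare `col4,` becomes `upper(col4),` without an `as col4` alias.

```python
import re

# Sample input from the question
SAMPLE = """col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah
"""

# Same pattern as the awk answer: non-space characters ending in col4,
# anchored at the start of the line
PREFIXED_COL4 = re.compile(r"^\S*col4")


def upper_col4(text):
    out = []
    in_range = False
    for line in text.splitlines():
        # The range test uses the unmodified line, as /SELECT/,/FROM/ does
        if not in_range and "SELECT" in line:
            in_range = True
        if in_range:
            ends_range = "FROM" in line
            # \g<0> is the whole match, like \0 in gensub()
            line = PREFIXED_COL4.sub(r"upper(\g<0>)", line, count=1)
            if ends_range:
                in_range = False
        out.append(line)  # like awk's trailing 1: print every line
    return "\n".join(out)


print(upper_col4(SAMPLE))
```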

Your description of the transformations you need is incomplete (e.g. you say you want to change col4, to upper(col4) as col4, but line 7 of the expected output doesn't reflect that), so I set that aside and just wrote this, which will produce the output you want from the input you provided (using GNU awk for the 3rd arg to match()). Hopefully this is what you actually want:

$ cat tst.awk
/SELECT/ { inBlock=1 }
inBlock {
    if ( match($0,/^((sch\.(tab\.)?)?col4\>)( as .*)/,a) ) {
        $0 = "upper(" a[1] ")" a[4]
    }
    else if ( match($0,/^(col4\>)(.*)/,a) ) {
        $0 = "upper(" a[1] ") as " a[1] a[2]
    }
}
/FROM/   { inBlock=0 }
{ print }

$ awk -f tst.awk file
col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4) as col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4) as col4,
upper(col4) as col4,
upper(col4) as col4 FROM
blah blah blah
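The same logic ported to Python, as a sketch for anyone without GNU awk (`SAMPLE`, `ALIASED`, `BARE` and `transform` are illustrative names). A compiled regex with groups stands in for gawk's three-argument `match()`, and `\b` stands in for gawk's end-of-word `\>`; the group numbers are the same as in tst.awk.

```python
import re

# Sample input from the question
SAMPLE = """col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah
"""

# Group 1: col4, optionally qualified as sch.col4 or sch.tab.col4;
# group 4: the existing " as <alias>" and the rest of the line
ALIASED = re.compile(r"^((sch\.(tab\.)?)?col4\b)( as .*)")
# Group 1: a bare col4 at the start of the line; group 2: the rest
BARE = re.compile(r"^(col4\b)(.*)")


def transform(text):
    out = []
    in_block = False
    for line in text.splitlines():
        if "SELECT" in line:
            in_block = True
        if in_block:
            m = ALIASED.match(line)
            if m:
                # Keep the existing alias
                line = f"upper({m.group(1)}){m.group(4)}"
            else:
                m = BARE.match(line)
                if m:
                    # No alias yet: add "as col4"
                    line = f"upper({m.group(1)}) as {m.group(1)}{m.group(2)}"
        if "FROM" in line:
            in_block = False
        out.append(line)
    return "\n".join(out)


print(transform(SAMPLE))
```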

With sed:

sed '/SELECT/,/FROM/ {s/^col4\>\( as col4\>\)\{0,1\}/upper(col4) as col4/;s/^\(\([A-Za-z]*\.\)\{1,\}col4\)\>/upper(\1)/;}' file

Explanations:

  • s/^col4\>\( as col4\>\)\{0,1\}/upper(col4) as col4/ : a bare col4 at the start of a line, with or without an existing as col4 alias, becomes upper(col4) as col4; the old alias is part of the match, so it is not duplicated
  • s/^\(\([A-Za-z]*\.\)\{1,\}col4\)\>/upper(\1)/ : search for one or more combinations of letters and a dot followed by col4 (sch.col4, sch.tab.col4) and wrap it in upper(), keeping the existing alias
  • \> (a GNU sed extension) matches the end of a word, so col4_1 is not mistaken for col4
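Whichever substitutions are used, the sample input contains both the alias `col4` and the alias `col4_1`, so a pattern for `col4` needs a word boundary after it, or it also matches the start of `col4_1`. A small Python check of that point; `strip_alias` is a hypothetical helper, and Python's `\b` plays the role of GNU sed's `\>` here.

```python
import re


def strip_alias(line, word_boundary=True):
    """Hypothetical helper: drop a redundant ' as col4' alias from a line.

    Without a trailing word boundary, the pattern also matches the start
    of the alias 'col4_1'.
    """
    pattern = r" as col4\b" if word_boundary else r" as col4"
    return re.sub(pattern, "", line, count=1)


for line in ["sch.col4 as col4,", "sch.tab.col4 as col4_1,"]:
    print(repr(line), "loose:", repr(strip_alias(line, word_boundary=False)),
          "strict:", repr(strip_alias(line)))
```

For `sch.tab.col4 as col4_1,` the loose pattern yields `sch.tab.col4_1,`, while the word-boundary version leaves the line alone.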
