Multi-line search and replace
I have an unstructured file and I would like to search for and replace a pattern of strings.
The file format is like this:
col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah
I want to replace:
- "col4," with "upper(col4) as col4,"
- "sch.col4" with "upper(sch.col4)"
- "sch.tab.col4" with "upper(sch.tab.col4)"
- "col4" (if col4 is at the end of the select query) with "upper(col4) as col4"
The file is on a Linux server. I tried using sed and awk to narrow down the lines containing col4, but I could not move forward from there.
I was able to identify one pattern using the following:
awk '/SELECT/,/FROM/' test_file.txt | awk '/col4/{print $0, NR}' | awk -F AS '{print $1}'
Find the text between SELECT and FROM
Identify the lines that have col4
Print the first field
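For comparison, the three steps above (range, filter, first field) can be sketched in Python. This is only an illustration: select_block_lines and col4_first_fields are hypothetical helper names, the "first field" is assumed to be the text before a lowercase " as " (as in the sample file), and the numbers returned are line numbers in the original file rather than positions in the filtered stream, which is what NR counts in the second awk of the pipeline.

```python
def select_block_lines(lines):
    """Yield (line_number, line) from each line containing SELECT through the
    next line containing FROM, inclusive, like awk's /SELECT/,/FROM/ range."""
    in_block = False
    for number, line in enumerate(lines, start=1):
        if not in_block and "SELECT" in line:
            in_block = True
        if in_block:
            yield number, line
            if "FROM" in line:
                in_block = False


def col4_first_fields(lines):
    """Return (file line number, text before ' as ') for every line inside a
    SELECT ... FROM block that mentions col4."""
    return [
        (number, line.split(" as ", 1)[0])
        for number, line in select_block_lines(lines)
        if "col4" in line
    ]
```

Lines outside a SELECT ... FROM block (such as the "make col4 upper" header) are never looked at, which is what the range pattern contributes.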
And using sed:
sed -n -e '/SELECT/,/FROM/p' -e 's/\(\([a-zA-Z]\{1,\}\.\)\{0,\}\)col4/upper(\0)/g' test_file.txt
Actual:
col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
sch.col4 as col4,
sch.tab.col4 as col4_1,
col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
col4,
col4 as col4,
col4 FROM
blah blah blah
Expected result:
col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4) as col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4) as col4,
upper(col4) as col4,
upper(col4) as col4 FROM
blah blah blah
Any help is much appreciated!
I think this does at least 95% of it. Please tell me if there's an error:
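import re

# Read the whole file and join its lines, so that a SELECT ... FROM block
# spanning several lines can be found with a single regex
with open('ej.txt', 'r') as file:
    string = file.read().replace('\n', ' ')

# Non-greedy: each match runs from a SELECT to the nearest following FROM
matches = re.findall(r'SELECT.*?FROM', string)

# Plain substring replacements, applied in this order to every block
replacements = {"col4,": "upper(col4) as col4,",
                "sch.col4": "upper(sch.col4)",
                "sch.tab.col4": "upper(sch.tab.col4)",
                "col4 as col4,": "upper(col4) as col4,"}

new_matches = []
for match in matches:
    for k, v in replacements.items():
        match = match.replace(k, v)
    new_matches.append(match)

# Put each rewritten block back in place of the original one
for old, new in zip(matches, new_matches):
    string = string.replace(old, new)

print(string)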
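A plain str.replace of "col4," also hits the alias in lines such as "sch.col4 as col4,", and joining the lines loses the file's layout. A minimal sketch of the same block-based idea that avoids both, assuming only the sch. and sch.tab. qualifiers from the question: re.DOTALL keeps the newlines inside each SELECT ... FROM match, and re.MULTILINE lets each rule anchor at the start of a line. BLOCK, ALIASED, BARE, fix_block and rewrite are illustrative names, not part of the answer.

```python
import re

# One SELECT ... FROM block; re.DOTALL lets "." cross newlines, so the file
# keeps its original line breaks instead of being joined into one line.
BLOCK = re.compile(r"SELECT.*?FROM", re.DOTALL)

# re.MULTILINE makes ^ match at the start of every line inside the block.
# Rule 1: col4, optionally qualified as sch.col4 or sch.tab.col4, that is
# already followed by an alias: wrap it in upper() and keep the alias.
ALIASED = re.compile(r"^((?:sch\.(?:tab\.)?)?col4)\b(?= as )", re.MULTILINE)
# Rule 2: an unqualified col4 with no alias: wrap it and add "as col4".
BARE = re.compile(r"^col4\b(?! as )", re.MULTILINE)


def fix_block(match):
    """Apply both rules to one matched SELECT ... FROM block."""
    block = ALIASED.sub(r"upper(\1)", match.group(0))
    return BARE.sub("upper(col4) as col4", block)


def rewrite(text):
    """Rewrite every SELECT ... FROM block in text, leaving the rest as-is."""
    return BLOCK.sub(fix_block, text)
```

Usage would be something like print(rewrite(open('ej.txt').read()), end=''). Rule 1 runs first, so lines it has already wrapped start with "upper(" and are skipped by rule 2.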
Here is a short awk script that does what you asked (gensub() is a GNU awk extension, so this needs gawk):
awk '/SELECT/,/FROM/ {$0=gensub(/^[^[:space:]]*col4/,"upper(\\0)",-1);}1' input.txt
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4),
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4),
upper(col4) as col4,
upper(col4) FROM
blah blah blah
/SELECT/,/FROM/ : inclusive range selecting each line from /SELECT/ to /FROM/
$0=gensub(...) : update the current line with the substitutions from gensub()
/^[^[:space:]]*col4/ : search for a non-space prefix to col4 at the beginning of the line
"upper(\\0)",-1 : replace the found match with upper(found match), first match only
1 : print the current line
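A rough Python equivalent of this one-liner, as a sketch for readers more at home in Python: \S* plays the role of [^[:space:]]*, \g<0> the role of gawk's \0 (the whole match), and count=1 keeps to the first match (the pattern is anchored with ^, so there can be at most one match per line anyway). PREFIXED_COL4, wrap_first_col4 and apply_to_select_blocks are made-up names for this illustration.

```python
import re

# Python counterpart of /^[^[:space:]]*col4/: a run of non-space characters
# at the start of the line that ends in col4 (e.g. sch.tab.col4).
PREFIXED_COL4 = re.compile(r"^\S*col4")


def wrap_first_col4(line):
    """Wrap the matched prefix+col4 in upper(...), first match only."""
    return PREFIXED_COL4.sub(r"upper(\g<0>)", line, count=1)


def apply_to_select_blocks(lines):
    """Apply wrap_first_col4 to lines from SELECT through FROM (inclusive),
    like the awk range /SELECT/,/FROM/; other lines pass through unchanged."""
    in_block = False
    out = []
    for line in lines:
        if not in_block and "SELECT" in line:
            in_block = True
        if in_block:
            out.append(wrap_first_col4(line))
            if "FROM" in line:
                in_block = False
        else:
            out.append(line)
    return out
```

Like the awk version, this wraps a bare "col4," as "upper(col4)," without adding an alias.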
Your description of the transformations you need is incomplete (e.g. you say you want to change col4, to upper(col4) as col4, but line 7 of the expected output doesn't reflect that), so I set that aside and just wrote this, which will produce the output you want from the input you provided (using GNU awk for the 3rd arg to match()). Hopefully this is what you actually want:
$ cat tst.awk
/SELECT/ { inBlock=1 }
inBlock {
if ( match($0,/^((sch\.(tab\.)?)?col4\>)( as .*)/,a) ) {
$0 = "upper(" a[1] ")" a[4]
}
else if ( match($0,/^(col4\>)(.*)/,a) ) {
$0 = "upper(" a[1] ") as " a[1] a[2]
}
}
/FROM/ { inBlock=0 }
{ print }
$ awk -f tst.awk file
col4 is required to be upper so
make col4 upper
abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4) as col4,
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4) as col4,
upper(col4) as col4,
upper(col4) as col4 FROM
blah blah blah
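The same state machine translated to Python, as a sketch for comparison (rewrite_lines, ALIASED and BARE are illustrative names). gawk's \> (end of word) is written \b, which behaves the same right after col4, and the nested groups are made non-capturing with (?:...), so the alias that tst.awk reads from a[4] is group 2 here.

```python
import re

# Group 1: col4, optionally qualified as sch.col4 or sch.tab.col4;
# group 2: an existing " as ..." alias and the rest of the line.
ALIASED = re.compile(r"^((?:sch\.(?:tab\.)?)?col4\b)( as .*)")
# Group 1: a bare col4; group 2: whatever follows ("," or " FROM", ...).
BARE = re.compile(r"^(col4\b)(.*)")


def rewrite_lines(lines):
    """Yield each line, rewriting col4 lines between SELECT and FROM."""
    in_block = False
    for line in lines:
        if "SELECT" in line:
            in_block = True
        if in_block:
            m = ALIASED.match(line)
            if m:
                # Keep the existing alias: upper(sch.col4) as col4,
                line = f"upper({m.group(1)}){m.group(2)}"
            else:
                m = BARE.match(line)
                if m:
                    # No alias yet: add one named after the column
                    line = f"upper({m.group(1)}) as {m.group(1)}{m.group(2)}"
        # Checked after the rewrite, as in tst.awk, so a "col4 FROM" line
        # is still rewritten before the block closes
        if "FROM" in line:
            in_block = False
        yield line
```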
With sed:
sed '/SELECT/,/FROM/ {s/ as col4 *//;s/\([A-Za-z]*\.\)\{0,\}col4/upper(&) as col4/;}' file
Explanations:
s/ as col4 *// : an existing " as col4" (with the space in front of it) is removed so that it is not duplicated by the second substitution
\([A-Za-z]*\.\)\{0,\}col4 : search for zero or more combinations of letters followed by a dot, followed by col4
upper(&) as col4/; : replace with the new text (the matching string is inserted using &)
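To see what the two substitutions do line by line, here is a rough Python sketch of the same script. It assumes GNU sed range semantics (when the end address is a regex, it is only tested on lines after the one that matched SELECT) and strips " as col4" together with its leading space: with that space included, "sch.tab.col4 as col4_1," first becomes "sch.tab.col4_1," and then "upper(sch.tab.col4) as col4_1,". DROP_ALIAS, QUALIFIED_COL4, sed_like and sed_script are hypothetical names.

```python
import re

# Step 1: drop an existing " as col4" (plus trailing spaces) so the alias is
# not duplicated when step 2 adds it back.
DROP_ALIAS = re.compile(r" as col4 *")
# Step 2: zero or more "letters." qualifiers followed by col4.
QUALIFIED_COL4 = re.compile(r"(?:[A-Za-z]*\.)*col4")


def sed_like(line):
    """Both s/// commands; without the g flag sed replaces the first match only."""
    line = DROP_ALIAS.sub("", line, count=1)
    return QUALIFIED_COL4.sub(r"upper(\g<0>) as col4", line, count=1)


def sed_script(lines):
    """Apply sed_like inside /SELECT/,/FROM/ ranges; print other lines as-is."""
    out = []
    in_range = False
    for line in lines:
        if in_range:
            out.append(sed_like(line))
            if "FROM" in line:  # end address is tested on later lines only
                in_range = False
        elif "SELECT" in line:
            in_range = True
            out.append(sed_like(line))
        else:
            out.append(line)
    return out
```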