I'm trying to replace blocks of lines following this pattern. For example, this input:
01 hello
02 stack
02 overflow
04 hi
02 friends = overflow
03 this
03 is
03 my = is
03 life
02 lol
02 im
02 joking = im
03 filler
should generate the following output (each hello block is one element of an array):
01 hello
02 stack
02 overflow
04 hi
02 lol
02 im
01 hello
02 stack
02 overflow
04 hi
02 lol
02 joking = im
03 filler
01 hello
02 stack
02 friends = overflow
03 this
03 is
03 life
02 lol
02 im
01 hello
02 stack
02 friends = overflow
03 this
03 is
03 life
02 lol
02 joking = im
03 filler
01 hello
02 stack
02 friends = overflow
03 this
03 my = is
03 life
02 lol
02 im
01 hello
02 stack
02 friends = overflow
03 this
03 my = is
03 life
02 lol
02 joking = im
03 filler
I tried it this way:
#!/bin/bash
awk '{
    if ($0~/=/){
        level=$1
        oc=1
    }else if (oc && $1<=level){
        oc=0
    }
    if (!oc){
        print
    }
}' input.txt
But it only returns the first output that I need, and I don't know how to skip the 03 life line, which is inside friends.
How could I generate these outputs?
I wouldn't mind a Python or Perl solution if it is more comfortable for you.
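To see why the awk attempt yields only one, slightly wrong, block, here is a line-by-line Python transliteration of it. It is a sketch that assumes awk's default value of 0 for the unset variables level and oc. On the sample input it shows that the nested 03 my = is line resets level to 3 while the friends block is still being suppressed, so the following 03 life line (which belongs to friends) switches printing back on. A single pass that only toggles printing on and off can also emit at most one block; producing all six requires enumerating the combinations of definitions.

```python
# Sample input from the question (indentation omitted).
SAMPLE = """01 hello
02 stack
02 overflow
04 hi
02 friends = overflow
03 this
03 is
03 my = is
03 life
02 lol
02 im
02 joking = im
03 filler""".splitlines()


def awk_attempt(lines):
    """Mimic the question's awk script; return the lines it would print."""
    level = 0  # awk: level (unset awk variables behave like 0)
    oc = 0     # awk: oc, 1 while lines are being suppressed
    printed = []
    for line in lines:
        number = int(line.split()[0])
        if '=' in line:
            # Every '=' line resets level, even one nested inside a suppressed block.
            level = number
            oc = 1
        elif oc and number <= level:
            oc = 0
        if not oc:
            printed.append(line)
    return printed


for line in awk_attempt(SAMPLE):
    print(line)
```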
Here is a Python script that reads the COBOL input file and prints out all the possible combinations of defined and redefined variables:
#!/usr/bin/python
"""Read cobol file and print all possible redefines."""
import sys
from itertools import product

def readfile(fname):
    """Read cobol file & return a master list of lines and namecount of redefined lines."""
    master = []
    namecount = {}
    with open(fname) as f:
        for line in f:
            line = line.rstrip(' .\t\n')
            if not line:
                continue
            words = line.split()
            n = int(words[0])
            if '=' in words or 'REDEFINES' in words:
                name = words[3]
            else:
                name = words[1]
            master.append((n, name, line))
            namecount[name] = namecount.get(name, 0) + 1
    # py2.7: namecount = {key: val for key, val in namecount.items() if val > 1}
    namecount = dict((key, val) for key, val in namecount.items() if val > 1)
    return master, namecount

def compute(master, skip=None):
    """Return new cobol file given master and skip parameters."""
    if skip is None:
        skip = {}
    seen = {}
    skip_to = None
    output = ''
    for n, name, line in master:
        if skip_to and n > skip_to:
            continue
        seen[name] = seen.get(name, 0) + 1
        if seen[name] != skip.get(name, 1):
            skip_to = n
            continue
        skip_to = None
        output += line + '\n'
    return output

def find_all(master, namecount):
    """Return list of all possible output files given master and namecount."""
    keys = namecount.keys()
    values = [namecount[k] for k in keys]
    out = []
    for combo in product(*[range(1, v + 1) for v in values]):
        skip = dict(zip(keys, combo))
        new = compute(master, skip=skip)
        if new not in out:
            out.append(new)
    return out

def main(argv):
    """Process command line arguments and print results."""
    fname = argv[-1]
    master, namecount = readfile(fname)
    out = find_all(master, namecount)
    print('\n'.join(out))

if __name__ == '__main__':
    main(sys.argv)
If the above script is saved in a file called cobol.py, then it can be run as:
python cobol.py name_of_input_file
The various possible combinations of defines and redefines will be displayed on stdout.
This script runs under either python2 (2.6+) or python3.
The code uses three functions:
- readfile reads the input file and returns two variables that summarize the structure of what is in it.
- compute takes two parameters and, from them, computes an output block.
- find_all determines all the possible output blocks, uses compute to create them, and then returns them as a list.
Let's look at each function in more detail:
readfile
readfile takes the input file name as an argument and returns a list, master, and a dictionary, namecount. For every non-empty line in the input file, the list master has a tuple containing (1) the level number, (2) the name that is defined or redefined, and (3) the original line itself. For the sample input file, readfile returns this value for master:
[(1, 'hello', '01 hello'),
(2, 'stack', ' 02 stack'),
(2, 'overflow', ' 02 overflow'),
(4, 'hi', ' 04 hi'),
(2, 'overflow', ' 02 friends = overflow'),
(3, 'this', ' 03 this'),
(3, 'is', ' 03 is'),
(3, 'is', ' 03 my = is'),
(3, 'life', ' 03 life'),
(2, 'lol', ' 02 lol'),
(2, 'im', ' 02 im'),
(2, 'im', ' 02 joking = im'),
(3, 'filler', ' 03 filler')]
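The per-line parsing that readfile does can be tried without creating an input file. The sketch below uses a hypothetical helper, parse_lines, that applies the same rules as readfile but takes a list of strings instead of a file name, and returns only the master list.

```python
def parse_lines(lines):
    """Hypothetical helper: build readfile's master list from a list of strings."""
    master = []
    for line in lines:
        # Same clean-up as readfile: drop trailing spaces, periods, tabs, newlines.
        line = line.rstrip(' .\t\n')
        if not line:
            continue  # blank lines are ignored
        words = line.split()
        level = int(words[0])
        if '=' in words or 'REDEFINES' in words:
            name = words[3]  # a redefinition is recorded under the name it redefines
        else:
            name = words[1]
        master.append((level, name, line))
    return master


sample = ['01 hello\n', ' 02 stack\n', ' 02 friends = overflow\n', '\n', ' 03 this\n']
for entry in parse_lines(sample):
    print(entry)
```

Because the redefinition is filed under overflow, both overflow lines share one name, which is what later lets namecount detect that overflow has two definitions.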
readfile also returns the dictionary namecount, which has an entry for every name that gets redefined and has a count of how many definitions/redefinitions there are for that name. For the sample input file, namecount has the value:
{'im': 2, 'is': 2, 'overflow': 2}
This indicates that im, is, and overflow each have two possible values.
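readfile counts names with a plain dictionary and then filters out those seen only once. On Python 2.7+ the same namecount can be built with collections.Counter. This is an alternative, not what the script does, and it would give up the script's Python 2.6 compatibility, since Counter was added in 2.7. The names list below is taken from the second element of each master tuple for the sample input.

```python
from collections import Counter

# Second element of each master tuple for the sample input, in file order.
names = ['hello', 'stack', 'overflow', 'hi', 'overflow', 'this', 'is', 'is',
         'life', 'lol', 'im', 'im', 'filler']

counts = Counter(names)
# Keep only names with more than one definition, as readfile does.
namecount = {name: count for name, count in counts.items() if count > 1}
print(namecount)
```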
readfile was, of course, designed to work with the input file format in the current version of the question. To the extent possible, it was also designed to work with the formats from previous versions of this question. For example, variable redefinitions are accepted whether they are signaled with an equal sign (current version) or with the word REDEFINES (as in previous versions). This is intended to make the script as flexible as possible.
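The rule that makes both formats work is small enough to isolate. The hypothetical function defined_name below repeats readfile's test: if the line contains a standalone = or REDEFINES word, the fourth word is the name being redefined; otherwise the second word is the name being defined. The COBOL-style REDEFINES line in the example is made up for illustration.

```python
def defined_name(line):
    """Hypothetical helper: return the name a line defines or redefines (readfile's rule)."""
    words = line.rstrip(' .\t\n').split()
    if '=' in words or 'REDEFINES' in words:
        return words[3]  # "02 friends = overflow" -> "overflow"
    return words[1]      # "02 stack" -> "stack"


print(defined_name('02 friends = overflow'))
print(defined_name('05 WS-DATE-N REDEFINES WS-DATE PIC 9(8).'))
print(defined_name('02 stack'))
```

Because the test looks for = as a separate word, a line written as 02 friends=overflow would not be recognized as a redefinition.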
compute
The function compute is what generates each output block. It uses two parameters. The first is master, which comes directly from readfile. The second is skip, which is derived from the namecount dictionary that was returned by readfile: it maps each redefined name to the number of the definition to keep (1 for the first, 2 for the second, and so on), and any name missing from skip defaults to its first definition. For example, the namecount dictionary says that there are two possible definitions for im. This shows how compute can be used to generate the output block for each:
In [14]: print(compute(master, skip={'im':1, 'is':1, 'overflow':1}))
01 hello
02 stack
02 overflow
04 hi
02 lol
02 im
In [15]: print(compute(master, skip={'im':2, 'is':1, 'overflow':1}))
01 hello
02 stack
02 overflow
04 hi
02 lol
02 joking = im
03 filler
Observe that the first call to compute above generated the block that uses the first definition of im and the second call generated the block that uses the second definition.
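The key to compute is the pair seen/skip_to: seen counts how many times each name has appeared so far, and when an occurrence is not the one selected in skip (default: the first), skip_to remembers its level so that the following lines with a higher level number (its children) are dropped too. The sketch below repeats compute's logic so it runs on its own, with a reduced master containing only the lol/im part of the sample.

```python
def compute(master, skip=None):
    """Same logic as compute in the script, repeated so this block is self-contained."""
    if skip is None:
        skip = {}
    seen = {}
    skip_to = None  # level of the definition currently being dropped
    output = ''
    for n, name, line in master:
        # Lines with a higher level than a dropped definition are its children: drop them.
        if skip_to and n > skip_to:
            continue
        seen[name] = seen.get(name, 0) + 1
        # Keep only the occurrence of this name selected in skip (default: the first).
        if seen[name] != skip.get(name, 1):
            skip_to = n
            continue
        skip_to = None
        output += line + '\n'
    return output


master = [(1, 'hello', '01 hello'),
          (2, 'lol', ' 02 lol'),
          (2, 'im', ' 02 im'),
          (2, 'im', ' 02 joking = im'),
          (3, 'filler', ' 03 filler')]

print(compute(master, skip={'im': 1}))
print(compute(master, skip={'im': 2}))
```

With skip={'im': 2}, the first im line sets skip_to to 2; the joking = im line has the same level, so it is not treated as a child, matches the second occurrence, and resets skip_to, which lets its child 03 filler through.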
find_all
With the above two functions available, the last step is simply to generate all the different combinations of definitions and print them out. That is what the function find_all does. Using master and namecount as returned by readfile, it systematically runs through all the available combinations of definitions and calls compute to create a block for each one. It gathers up all the unique blocks that can be created this way and returns them.
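The de-duplication step matters. For the sample input, namecount has three names with two definitions each, so product generates 2 * 2 * 2 = 8 skip dictionaries, but only 6 distinct blocks come out: when the first definition of overflow is kept, the whole friends block (including both definitions of is) is dropped, so the choice for is makes no difference. The sketch below checks this count with a condensed copy of compute and the master list shown earlier for readfile.

```python
from itertools import product

# master and namecount for the sample input, as returned by readfile.
master = [(1, 'hello', '01 hello'), (2, 'stack', ' 02 stack'),
          (2, 'overflow', ' 02 overflow'), (4, 'hi', ' 04 hi'),
          (2, 'overflow', ' 02 friends = overflow'), (3, 'this', ' 03 this'),
          (3, 'is', ' 03 is'), (3, 'is', ' 03 my = is'), (3, 'life', ' 03 life'),
          (2, 'lol', ' 02 lol'), (2, 'im', ' 02 im'), (2, 'im', ' 02 joking = im'),
          (3, 'filler', ' 03 filler')]
namecount = {'im': 2, 'is': 2, 'overflow': 2}


def compute(master, skip):
    """Condensed copy of compute from the script."""
    seen, skip_to, output = {}, None, ''
    for n, name, line in master:
        if skip_to and n > skip_to:
            continue
        seen[name] = seen.get(name, 0) + 1
        if seen[name] != skip.get(name, 1):
            skip_to = n
            continue
        skip_to = None
        output += line + '\n'
    return output


keys = list(namecount)
combos = list(product(*[range(1, namecount[k] + 1) for k in keys]))
blocks = [compute(master, dict(zip(keys, combo))) for combo in combos]
unique = []
for block in blocks:
    if block not in unique:  # same de-duplication test as find_all
        unique.append(block)

print(len(combos), len(unique))
```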
The output returned by find_all is a list of strings. Each string is the block that corresponds to one combination of defines/redefines. Using the sample input from the question, this shows what find_all returns:
In [16]: find_all(master, namecount)
Out[16]:
['01 hello\n 02 stack\n 02 overflow\n 04 hi\n 02 lol\n 02 im\n',
'01 hello\n 02 stack\n 02 friends = overflow\n 03 this\n 03 is\n 03 life\n 02 lol\n 02 im\n',
'01 hello\n 02 stack\n 02 overflow\n 04 hi\n 02 lol\n 02 joking = im\n 03 filler\n',
'01 hello\n 02 stack\n 02 friends = overflow\n 03 this\n 03 is\n 03 life\n 02 lol\n 02 joking = im\n 03 filler\n',
'01 hello\n 02 stack\n 02 friends = overflow\n 03 this\n 03 my = is\n 03 life\n 02 lol\n 02 im\n',
'01 hello\n 02 stack\n 02 friends = overflow\n 03 this\n 03 my = is\n 03 life\n 02 lol\n 02 joking = im\n 03 filler\n']
As an example, let's take the fourth string returned by find_all and, for better formatting, print it:
In [18]: print(find_all(master, namecount)[3])
01 hello
02 stack
02 friends = overflow
03 this
03 is
03 life
02 lol
02 joking = im
03 filler
In the complete script, the output from find_all is combined and printed to stdout as follows:
out = find_all(master, namecount)
print('\n'.join(out))
In this way, the output displays all possible blocks.
awk 'f==0 && !/REDEFINES/{s=s"\n"$0;next} /REDEFINES/{f=1;print s t>("output" ++c ".txt");t=""} {t=t"\n"$0} END{print s t>("output" ++c ".txt")}' input
This program has the following variables:
- f is a flag which is zero before the first REDEFINE and one thereafter.
- s contains all the text up to the first REDEFINE.
- t contains the text of the current REDEFINE section.
- c is a counter which is used to determine the name of the output file.
The code works as follows:
f==0 && !/REDEFINES/{s=s"\n"$0;next}
Before the first REDEFINE is encountered, the text is saved in the variable s, and we skip the rest of the commands and jump to the next line.
/REDEFINES/{f=1;print s t>("output" ++c ".txt");t=""}
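Every time that we encounter a REDEFINE line, we set the flag f to one and print the prolog section s, followed by the text collected so far in t (the REDEFINE section that has just ended), to a file named outputn.txt, where n is replaced by the value of the counter c. At the first REDEFINE, t is still empty, so output1.txt contains only the prolog.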
Because we are at the start of a new REDEFINE section, the variable t is set to empty.
{t=t"\n"$0}
Append the current line of this REDEFINE section to the variable t.
END{print s t>("output" ++c ".txt")}
The output file for the last REDEFINE section is printed.
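For readers more comfortable with Python, the same logic can be sketched as a function that returns the output texts in a list instead of writing output1.txt, output2.txt, and so on. This is a transliteration under the assumption that REDEFINES may appear anywhere in the line, as the awk regex allows; split_on_redefines is a hypothetical name and the COBOL-style sample lines are made up. Joining the lines with '\n' means the resulting strings have no leading blank line.

```python
def split_on_redefines(lines):
    """Hypothetical transliteration of the first awk one-liner; returns strings, not files."""
    prolog = []             # lines before the first REDEFINES (awk: s)
    section = []            # lines of the current REDEFINES section (awk: t)
    seen_redefines = False  # awk: f
    outputs = []            # outputs[i] corresponds to output<i+1>.txt
    for line in lines:
        if not seen_redefines and 'REDEFINES' not in line:
            prolog.append(line)
            continue
        if 'REDEFINES' in line:
            seen_redefines = True
            # Emit prolog + the section that has just ended (empty the first time).
            outputs.append('\n'.join(prolog + section))
            section = []
        section.append(line)
    outputs.append('\n'.join(prolog + section))  # awk: END block
    return outputs


EXAMPLE = ['01 A.', '05 B PIC X.', '05 C PIC X.',
           '05 D REDEFINES C PIC 9.', '05 E REDEFINES C PIC 9.', '05 F PIC X.']

for i, text in enumerate(split_on_redefines(EXAMPLE), start=1):
    print('--- output%d.txt ---' % i)
    print(text)
```

As in the awk one-liner, the first output holds only the prolog, and a line that follows the REDEFINES sections, such as 05 F here, ends up only in the last output.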
Each of the output files produced by the code above has a leading blank line. The code below removes that via the awk substr function:
awk '/REDEFINES/{f=1;print substr(s,2) t>("output" ++c ".txt");t=""} f==0 {s=s"\n"$0;next} {t=t"\n"$0} END{print substr(s,2) t>("output" ++c ".txt")}' input
For variety, this version has slightly different logic but, otherwise, achieves the same result.
awk 'f==1 && pre==$1 && !/REDEFINES/{tail=tail "\n" $0} /REDEFINES/{pre=$1;f=1;t[++c]="\n"$0} f==0 {head=head"\n"$0;next} pre!=$1{t[c]=t[c]"\n"$0} END{for (i=0;i<=c;i++) {print head t[i] tail>("output" (i+1) ".txt")}}' file
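This last variant splits the input into three parts: head (everything before the first REDEFINES), one entry of the array t per REDEFINES section, and tail (later non-REDEFINES lines at the same level as the REDEFINES lines). It then writes head, t[i], and tail for every i from 0 to c; since t[0] is never assigned, output1.txt contains head and tail with no redefinition at all. A Python sketch of the same rules follows, with a hypothetical function name and made-up COBOL-style input, returning strings instead of writing files. Levels are compared as integers, assuming awk compares the numeric-looking fields pre and $1 numerically.

```python
def head_sections_tail(lines):
    """Hypothetical transliteration of the third awk variant; returns strings, not files."""
    head, tail = [], []
    sections = [[]]  # sections[0] stays empty, like awk's never-assigned t[0]
    pre = None       # level of the latest REDEFINES line (awk: pre); None means f == 0
    for line in lines:
        level = int(line.split()[0])
        if 'REDEFINES' in line:
            pre = level
            sections.append([line])    # start a new section (awk: t[++c])
        elif pre is None:
            head.append(line)          # still before the first REDEFINES
        elif level == pre:
            tail.append(line)          # same level as the REDEFINES lines
        else:
            sections[-1].append(line)  # any other level: added to the current section
    return ['\n'.join(head + section + tail) for section in sections]


EXAMPLE = ['01 A.', '05 B PIC X.', '05 C PIC X.',
           '05 D REDEFINES C PIC 9.', '10 D1 PIC 9.',
           '05 E REDEFINES C PIC 9.', '05 F PIC X.']

for i, text in enumerate(head_sections_tail(EXAMPLE), start=1):
    print('--- output%d.txt ---' % i)
    print(text)
```

Unlike the first one-liner, the trailing 05 F line is kept in every output because it is collected in tail rather than appended to the last section.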