I'm postprocessing a very large file which contains many frames. Occasionally there is an empty frame. I would like to remove these. For example,
file.txt
TIMESTEP
101
NUMBER OF ATOMS
3
ATOMS x y z
O 1 2 3
H 2 1 3
C 1 1 2
TIMESTEP
102
NUMBER OF ATOMS
3
ATOMS x y z
TIMESTEP
103
NUMBER OF ATOMS
3
ATOMS x y z
O -1 2 3
H 1 2 3
C 0 1 1
...
I would like to obtain
file.txt
TIMESTEP
101
NUMBER OF ATOMS
3
ATOMS x y z
O 1 2 3
H 2 1 3
C 1 1 2
TIMESTEP
103
NUMBER OF ATOMS
3
ATOMS x y z
O -1 2 3
H 1 2 3
C 0 1 1
...
I've tried
sed '/3.*/{:a;N;N;N;N;/.*NUMBER OF ATOMS$/d;ba}' file.txt
but that would also remove valid frames, which is not what I want. Any pointers or advice are highly appreciated!
This might work for you (GNU sed):
sed -n '/TIMESTEP/!{H;$!d};x;s/\n/&/5p' file
Gather up frames (records) in the hold space and only print them if they are 6 or more lines long.
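Before running this on the full file, it may be worth trying it on a tiny sample and comparing frame counts. The following is only a sketch: it assumes GNU sed, and sample_frames.txt and sample_filtered.txt are made-up names for throwaway test files.

```shell
# Throwaway sample (hypothetical file name): frame 102 has a header but no atoms.
cat > sample_frames.txt <<'EOF'
TIMESTEP
101
NUMBER OF ATOMS
3
ATOMS x y z
O 1 2 3
H 2 1 3
C 1 1 2
TIMESTEP
102
NUMBER OF ATOMS
3
ATOMS x y z
TIMESTEP
103
NUMBER OF ATOMS
3
ATOMS x y z
O -1 2 3
H 1 2 3
C 0 1 1
EOF

# Same command as above (GNU sed assumed), writing to a new file.
sed -n '/TIMESTEP/!{H;$!d};x;s/\n/&/5p' sample_frames.txt > sample_filtered.txt

# Count frame headers before and after; the difference is the number of frames dropped.
frames_before=$(grep -c '^TIMESTEP$' sample_frames.txt)
frames_after=$(grep -c '^TIMESTEP$' sample_filtered.txt)
echo "frames: $frames_before -> $frames_after"
```

For this sample the counts should go from three frames to two. On real data, the same two grep -c calls give a quick check that only the expected number of frames disappeared.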
This GNU awk may do:
awk -v RS=TIMESTEP 'NF>15 {printf "%s", RS $0}' file
TIMESTEP
101
NUMBER OF ATOMS
3
ATOMS x y z
O 1 2 3
H 2 1 3
C 1 1 2
TIMESTEP
103
NUMBER OF ATOMS
3
ATOMS x y z
O -1 2 3
H 1 2 3
C 0 1 1
...
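Setting the record separator (RS) to TIMESTEP makes awk work in block mode, with each block starting at a TIMESTEP line. Then count the number of fields (you may need to adjust the threshold). If there are more than 15, print the block. A threshold of 9 should also be enough, since the header of an empty frame contributes exactly 9 fields.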
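If a multi-character RS is not available (POSIX leaves its behaviour unspecified, so it depends on the awk in use), a similar effect can be had by buffering one frame at a time and counting lines instead of fields. This is a rough sketch, assuming every frame starts with a line that is exactly TIMESTEP and has a 5-line header as in the question. The names drop_empty_frames and emit are made up here.

```shell
# drop_empty_frames is a hypothetical helper name, not part of any tool.
drop_empty_frames() {
    awk '
        # Print the buffered frame only if it has lines beyond the 5-line header.
        function emit() { if (n > 5) printf "%s", buf; buf = ""; n = 0 }
        /^TIMESTEP$/ { emit() }         # a new frame starts: decide on the previous one
        { buf = buf $0 "\n"; n++ }      # append the current line to the frame buffer
        END { emit() }                  # handle the last frame
    ' "$@"
}

# Usage on a small inline sample (frame 102 is empty):
printf '%s\n' TIMESTEP 101 'NUMBER OF ATOMS' 3 'ATOMS x y z' 'O 1 2 3' 'H 2 1 3' 'C 1 1 2' \
              TIMESTEP 102 'NUMBER OF ATOMS' 3 'ATOMS x y z' \
              TIMESTEP 103 'NUMBER OF ATOMS' 3 'ATOMS x y z' 'O -1 2 3' 'H 1 2 3' 'C 0 1 1' |
    drop_empty_frames
```

Only one frame is held in memory at a time, which matters for a very large file. The helper also accepts file names, e.g. drop_empty_frames file.txt > cleaned.txt (cleaned.txt being an arbitrary output name).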
With GNU sed that would be just:
sed -z 's/TIMESTEP\n[0-9]*\nNUMBER OF ATOMS\n[0-9]*\nATOMS x y z\nTIMESTEP/TIMESTEP/g' file.txt
Without the -z sed option, the following seems to work:
sed -n '
# buffer 6 lines (not 5, so one more than a frame header) in the pattern space
N;N;N;N;N
: again
# if pattern space matches empty frame
/^TIMESTEP\n[0-9]*\nNUMBER OF ATOMS\n[0-9]*\nATOMS x y z\nTIMESTEP$/{
# print just the next TIMESTEP
s/.*/TIMESTEP/
p
# start from the top
d
}
# if this is the last line
${
# if last line is an empty frame
/^[^\n]*\nTIMESTEP\n[0-9]*\nNUMBER OF ATOMS\n[0-9]*\nATOMS x y z$/{
# print only the first (extra) line of the buffer
P
# and end it
d
}
# otherwise print everything still in the buffer
p
d
}
# just print and delete one line
P
s/^[^\n]*\n//
# read next line
N
b again
'
With GNU awk:
awk '{a[i++]=$0}END{ for(i=0;i<NR;)if(a[i]=="TIMESTEP" && a[i+5]=="TIMESTEP") {i=i+5;} else {print a[i]; i=i+1;} }' file