I have a file ff.txt that looks as follows:
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2
ABNA.txt and AC24.txt are files in a folder named foo1. Based on the numbers in ff.txt, I want to extract lines from the corresponding files in foo1 and write new files with the same names to another folder, foo2. For example, if the third or fourth column of a line in ABNA.txt contains any of the numbers 356, 24, 36 or 112, that line should be extracted and saved to foo2 as ABNA.txt.
The ABNA.txt file in the folder foo1 looks as follows:
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
The AC24.txt file in the folder foo1 looks as follows:
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
Expected output:
ABNA.txt in the folder foo2:
dfg qza 356 245
hjb hkg 455 24
dfg qza 36 55
AC24.txt in the folder foo2:
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
Your help would be appreciated!
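To try the answers below on the sample data, the input can be recreated with a short script such as this one. It is only a sketch: the scratch directory name `demo` is a hypothetical choice, and foo2 is created empty up front because some of the answers assume it already exists.

```shell
#!/bin/bash
# Recreate the sample input from the question in a scratch directory.
# The directory name "demo" is a hypothetical choice.
set -e
mkdir -p demo/foo1 demo/foo2   # foo2 starts empty; the answers write into it
cd demo

# ff.txt: "*NAME" lines name a file in foo1, the lines after it are numbers
printf '%s\n' '*ABNA.txt' 356 24 36 112 '*AC24.txt' 457 458 321 2 > ff.txt

cat > foo1/ABNA.txt <<'EOF'
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
EOF

cat > foo1/AC24.txt <<'EOF'
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
EOF
```

Run any of the solutions from inside `demo` and compare foo2 with the expected output above.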
UPDATED
This is a pure bash solution (grep was removed):
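#!/bin/bash
file=
s=()
# Build a regex like \b(356|24|36|112)\b from the collected numbers and copy
# every matching line of foo1/$file to foo2/$file.
# Note: \b is a GNU regex extension; it may not work on non-glibc systems.
grp() {
  r="${s[*]}"
  r="\b(${r// /|})\b"
  while IFS= read -r w; do
    [[ $w =~ $r ]] && printf '%s\n' "$w"
  done <"foo1/$file" >"foo2/$file"
}
# Parse ff.txt: "*NAME" lines start a new file, other lines are numbers
while IFS= read -r a; do
  if [[ $a =~ ^\* ]]; then
    [ -n "$file" ] && grp
    file=${a#\*}
    s=()
  else
    s+=("$a")
  fi
done < ff.txt
# Process the last file
[ -n "$file" ] && grp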
# See input and output files
for i in foo1/*; do echo "%% in $i"; cat "$i"; done
for i in foo2/*; do echo "%% out $i"; cat "$i"; done
Output
%% in foo1/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
%% in foo1/AC24.txt
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
%% out foo2/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
dfg qza 36 55
%% out foo2/AC24.txt
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
The while loop parses the ff.txt file. If a line starts with *, the file shell variable is set to the file name. Otherwise the line is a number and is added to the s array. When a new file name is found and a previous file name is already set, the script calls the grp function, which does the real work. After the loop, grp is called once more for the last file.
The grp function builds a regex of the form \b(num1|num2...)\b. The \b anchors make only complete numbers match, so \b24\b will not match 245. Its while loop reads the file from foo1, matches each line against the regex, and writes the matching lines to a file with the same name in directory foo2. Note that the regex is tested against the whole line, so a listed number in any column (not only the third or fourth) selects the line. The script does not check whether the foo2 directory exists.
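The regex construction in grp can be checked in isolation. This sketch pulls it into a hypothetical helper, `build_regex`, and assumes bash on a system whose regex library understands `\b` (for example glibc on Linux). The third sample line shows that a listed number in the second column also matches.

```shell
#!/bin/bash
# Hypothetical helper that builds the same regex as grp from its arguments.
build_regex() {
  local r="$*"                       # join the arguments with single spaces
  printf '%s' "\b(${r// /|})\b"      # turn spaces into | and add \b anchors
}

re=$(build_regex 356 24 36 112)
echo "$re"                           # \b(356|24|36|112)\b

for line in 'hjb hkg 455 24' 'dfg qza 12 245' 'dfg 24 12 123'; do
  if [[ $line =~ $re ]]; then
    echo "match:    $line"
  else
    echo "no match: $line"           # 245 is not a whole-number match for 24
  fi
done
```

The first line matches, the second does not (245 is rejected by the `\b` anchors), and the third matches even though 24 sits in the second column.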
This might work for you (GNU sed and Bash):
folder1=foo1
folder2=foo2
sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' ff.txt | sh
This turns the ff.txt file into a script, which is piped into the sh command. The user must first set the bash variables $folder1 and $folder2 to the directories containing the source files and the output files, respectively.
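Dropping the trailing `| sh` shows the generated script, which makes it easy to check before running it. A sketch assuming GNU sed; the scratch copy of ff.txt written to a temporary directory is only for illustration.

```shell
#!/bin/bash
# Dry run of the sed generator: print the generated script instead of
# piping it to sh. Assumes GNU sed (-r, \s, \S, \b).
folder1=foo1
folder2=foo2
tmp=$(mktemp -d)
printf '%s\n' '*ABNA.txt' 356 24 36 112 '*AC24.txt' 457 458 321 2 > "$tmp/ff.txt"

script=$(sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' "$tmp/ff.txt")
rm -rf "$tmp"

printf '%s\n' "$script"
# Expected output for the sample ff.txt:
# <foo1/ABNA.txt sed -nr '/^(\S+\s+){2,3}\b(356|24|36|112)\b/w foo2/ABNA.txt'
# <foo1/AC24.txt sed -nr '/^(\S+\s+){2,3}\b(457|458|321|2)\b/w foo2/AC24.txt'
```

Each generated line reads one source file and uses sed's w command to write the matching lines to the file of the same name in $folder2. The ^(\S+\s+){2,3} prefix skips the first two or three whitespace-separated fields, so a listed number only matches when it is the third or fourth field.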
You can try something like this:
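awk '
BEGIN {
    readpath  = "/path/to/foo1"
    writepath = "/path/to/foo2"
}
# A "*NAME" line starts a new file: index its lines by columns 3 and 4
$0 ~ /^\*/ {
    file = substr($1, 2)
    delete ary                      # forget the index of the previous file
    while ((getline var < (readpath "/" file)) > 0) {
        split(var, a, " ")
        ary[a[3]] = var             # a later line with the same value overwrites
        ary[a[4]] = var
    }
    close(readpath "/" file)
    next
}
# Any other line is a number: print the matching line, if any
($1 in ary) {
    print ary[$1] > (writepath "/" file)
}' foo.txt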
[jaypal:~/temp/test] ls
foo.txt foo1 foo2
[jaypal:~/temp/test] cat foo.txt
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2
[jaypal:~/temp/test] ls foo1/
ABNA.txt AC24.txt
[jaypal:~/temp/test] head foo1/*
==> foo1/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
==> foo1/AC24.txt <==
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
[jaypal:~/temp/test] ls foo2/
[jaypal:~/temp/test]
[jaypal:~/temp/test] awk '
BEGIN {
readpath=sprintf("%s", "./foo1")
writepath=sprintf("%s", "./foo2")
}
$0~/\*/ {
file = substr($1,2)
while ((getline var < (readpath"/"file)) > 0) {
split (var, a, " ")
ary[a[3]]=var
ary[a[4]]=var
}
}
($1 in ary) {
print ary[$1] > (writepath"/"file)
}' foo.txt
[jaypal:~/temp/test] ls foo2/
ABNA.txt AC24.txt
[jaypal:~/temp/test] head foo2/*
==> foo2/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
dfg qza 36 55
==> foo2/AC24.txt <==
hjb hkg 457 167
sar sar 234 321
ghf qza 2 165
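#!/bin/bash
mkdir -p foo2
awk '
# Copy lines of ./foo1/<filename> whose 3rd or 4th field is a key of
# values to ./foo2/<filename>. Extra parameters act as local variables.
function process_file(filename, values, filein, fileout, line, f) {
    if (filename == "") return
    filein = "./foo1/" filename
    fileout = "./foo2/" filename
    while ((getline line < filein) > 0) {
        split(line, f)
        if (f[3] in values || f[4] in values) {
            print line > fileout
        }
    }
    close(filein)
}
# "*NAME" line: process the previous file, then start collecting numbers
/^\*/ {
    process_file(filename, values)
    filename = substr($0, 2)
    delete values
    next
}
# Any other line is a number to look for
{ values[$1] }
# Process the last file
END { process_file(filename, values) }
' ff.txt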
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.