Extracting lines from text files in a folder based on the numbers in another file

I have a file ff.txt that looks as follows

*ABNA.txt
 356
 24
 36
 112
*AC24.txt
 457
 458
 321
 2

ABNA.txt and AC24.txt are files in a folder named foo1. Based on the numbers in ff.txt, I want to extract lines from the corresponding files in foo1 and create new files with the same names in another folder, foo2. For example, if the third or fourth column of a line in ABNA.txt contains one of the numbers 356, 24, 36 or 112, that line should be extracted and saved to foo2 as ABNA.txt.

ABNA.txt file in the folder foo1 looks as follows

dfg qza 356 245
hjb hkg 455 24
ghf qza 12  123
dfg qza 36  55

AC24.txt file in the folder foo1 looks as follows

hjb hkg 457 167
ghf qza  2  165
sar sar 234 321
dfg qza 345 345

Output:

ABNA.txt file in the folder foo2

dfg qza 356 245
hjb hkg 455 24
dfg qza 36  55

AC24.txt file in the folder foo2

hjb hkg 457 167
ghf qza  2  165
sar sar 234 321

Your help would be appreciated!

UPDATED

This is a pure bash solution (grep was removed):

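#!/bin/bash

file=
s=()

# Build \b(n1|n2|...)\b from the collected numbers and copy every
# matching line of foo1/$file to foo2/$file.
grp() {
  local r w
  r="${s[*]}"
  r="\b(${r// /|})\b"              # \b is a GNU regex extension
  while IFS= read -r w; do         # IFS= and -r keep the line unchanged
    [[ $w =~ $r ]] && printf '%s\n' "$w"
  done < "foo1/$file" > "foo2/$file"
}

while read -r a; do
  if [[ $a =~ ^\* ]]; then
    [ -n "$file" ] && grp          # finish the previous file first
    file=${a#\*}
    s=()
  elif [[ -n $a ]]; then
    s+=("$a")
  fi
done < ff.txt
[ -n "$file" ] && grp              # the last file has no following * line

# Show input and output files
for i in foo1/*; do echo "%% in $i"; cat "$i"; done
for i in foo2/*; do echo "%% out $i"; cat "$i"; done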

Output

%% in foo1/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
ghf qza 12  123
dfg qza 36  55
%% in foo1/AC24.txt
hjb hkg 457 167
ghf qza  2  165
sar sar 234 321
dfg qza 345 345
%% out foo2/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
dfg qza 36  55
%% out foo2/AC24.txt
hjb hkg 457 167
ghf qza  2  165
sar sar 234 321

The while loop parses ff.txt. If a line starts with *, the shell variable file is set to the name that follows. Otherwise the line is a number and is appended to the array s. When a new file name is found and a previous file name is already set, the script calls the function grp, which does the real work. grp is called once more after the loop to handle the last file.
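The parsing step can be tried on its own. The following is a minimal sketch, not the answer's script: it is written for plain sh, keeps the numbers in a string instead of an array, and replaces grp with a hypothetical flush function that only records what it was given. The sample data and the temporary directory are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: parse an ff.txt-style list into "file:numbers;" groups.
# The sample data and the temporary directory are illustrative only.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' '*ABNA.txt' ' 356' ' 24' '*AC24.txt' ' 457' ' 2' > ff.txt

file=
nums=
summary=

# Hypothetical stand-in for grp: record the group instead of filtering a file.
flush() {
  [ -n "$file" ] && summary="${summary}${file}:${nums};"
}

while read -r a; do              # read strips the leading blanks
  case $a in
    \**) flush                   # new file name: finish the previous group
         file=${a#\*}
         nums= ;;
    '')  ;;                      # ignore blank lines
    *)   nums="${nums:+$nums }$a" ;;
  esac
done < ff.txt
flush                            # the last group has no following "*" line

echo "$summary"
```

With the sample list this prints ABNA.txt:356 24;AC24.txt:457 2; which shows that each group is closed either by the next * line or by the end of the file.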

The function grp builds a regex of the form \b(num1|num2...)\b. The \b anchors restrict matches to complete numbers, so \b24\b does not match 245. The regex is tested against the whole line, so a listed number matches in any column, not only the third or fourth. The while loop in grp reads the file from foo1, tests each line against the regex, and writes the matching lines to a file of the same name in foo2. It does not check whether the foo2 directory exists.
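The effect of \b can be checked in isolation. This sketch uses grep -E instead of the bash =~ operator and assumes GNU grep, where \b is available as an extension. The test lines are made up for illustration.

```shell
#!/bin/sh
# Sketch, assuming GNU grep (\b is a GNU extension to regular expressions).
pattern='\b(356|24|36|112)\b'

# "24" is a complete number here, so the line matches.
hit=$(printf '%s\n' 'hjb hkg 455 24' | grep -E "$pattern")

# "245" and "1124" only contain listed numbers as substrings: no match.
miss=$(printf '%s\n' 'abc def 245 1124' | grep -E "$pattern")

# The pattern is not tied to a column: "24" in column 1 matches as well.
anywhere=$(printf '%s\n' '24 qza 1 2' | grep -E "$pattern")

echo "hit:      $hit"
echo "miss:     $miss"
echo "anywhere: $anywhere"
```

The third case is worth keeping in mind when the first two columns of the data can contain numbers, because such lines would be copied too.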

This might work for you (GNU sed and Bash):

folder1=foo1
folder2=foo2
sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' ff.txt | sh

This turns the ff.txt file into a script, which is piped into the sh command. The user must first set the shell variables folder1 and folder2 to the directories containing the source files and the output files respectively.
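Before piping anything into sh, it can help to look at the script that sed generates. The sketch below runs the same sed program without the trailing | sh and stores the result in a variable. It assumes GNU sed and writes a sample ff.txt into a temporary directory. The variable name generated is chosen here for illustration.

```shell
#!/bin/sh
# Sketch, assuming GNU sed; the sample ff.txt lives in a temporary directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' '*ABNA.txt' ' 356' ' 24' ' 36' ' 112' \
              '*AC24.txt' ' 457' ' 458' ' 321' ' 2' > ff.txt

folder1=foo1
folder2=foo2

# Same sed program as in the answer, but the output is captured, not executed.
generated=$(sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' ff.txt)

printf '%s\n' "$generated"
```

For the sample list this should print one sed command per file name, each reading from foo1 and using the w command to write matching lines to foo2. The {2,3} repetition skips two or three columns, so the numbers are only tested against the third or fourth column.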

You can try something like this:

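awk '
BEGIN {
    readpath  = "/path/to/foo1"
    writepath = "/path/to/foo2"
}
/^\*/ {
    # A line starting with * names the next file to process.
    file = substr($1, 2)
    delete ary                      # forget lines from the previous file
    while ((getline var < (readpath "/" file)) > 0) {
        split(var, a, " ")
        ary[a[3]] = var             # index the line by column 3
        ary[a[4]] = var             # and by column 4
    }
    close(readpath "/" file)
    next
}
($1 in ary) {
    print ary[$1] > (writepath "/" file)
}' foo.txt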

Explanation:

  • Set the read path and write path in the BEGIN block.
  • For each line of foo.txt that holds a file name (it starts with *):
  • Use substr to store the file name, without the leading *, in the variable file.
  • Read that file line by line into the variable var.
  • Split var and use columns 3 and 4 as indices into the array ary.
  • For every other line of foo.txt, if its first column is an index of ary, write the stored line to the output file. Lines are therefore written in the order of the numbers in foo.txt, not in the order of the source file.
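The lookup described in the list can be tried on a single file. This is a sketch under a few assumptions that are not part of the answer: the directories are passed with -v instead of being set in BEGIN, the array is emptied for each file name, and the sample data is created in a temporary directory.

```shell
#!/bin/sh
# Sketch: index the lines of one file by columns 3 and 4, then look up numbers.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir foo1 foo2
printf '%s\n' '*AC24.txt' ' 457' ' 458' ' 321' ' 2' > foo.txt
printf '%s\n' 'hjb hkg 457 167' 'ghf qza  2  165' \
              'sar sar 234 321' 'dfg qza 345 345' > foo1/AC24.txt

awk -v readpath=foo1 -v writepath=foo2 '
/^\*/ {
    file = substr($1, 2)
    delete ary                         # start with an empty index per file
    while ((getline var < (readpath "/" file)) > 0) {
        split(var, a, " ")
        ary[a[3]] = var                # index each line by column 3 ...
        ary[a[4]] = var                # ... and by column 4
    }
    close(readpath "/" file)
    next
}
($1 in ary) { print ary[$1] > (writepath "/" file) }
' foo.txt

cat foo2/AC24.txt
```

The output lists the line for 457 first, then the one for 321, then the one for 2, which is the order of the numbers in foo.txt. Because each index holds a single line, two source lines that share a number would overwrite each other. A version that needs every match would have to store a list per index or test each source line against the set of numbers.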

Test:

[jaypal:~/temp/test] ls
foo.txt foo1    foo2

[jaypal:~/temp/test] cat foo.txt
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2

[jaypal:~/temp/test] ls foo1/
ABNA.txt AC24.txt

[jaypal:~/temp/test] head foo1/*
==> foo1/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
ghf qza 12  123
dfg qza 36  55

==> foo1/AC24.txt <==
hjb hkg 457 167
ghf qza  2  165
sar sar 234 321
dfg qza 345 345

[jaypal:~/temp/test] ls foo2/
[jaypal:~/temp/test] 

[jaypal:~/temp/test] awk '
BEGIN {
    readpath=sprintf("%s", "./foo1")
    writepath=sprintf("%s", "./foo2")
    }
$0~/\*/ {
    file = substr($1,2)
    while ((getline var < (readpath"/"file)) > 0) {
        split (var, a, " ")
        ary[a[3]]=var
        ary[a[4]]=var
        }
    }
($1 in ary) {
    print ary[$1] > (writepath"/"file)
    }' foo.txt

[jaypal:~/temp/test] ls foo2/
ABNA.txt AC24.txt

[jaypal:~/temp/test] head foo2/*
==> foo2/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
dfg qza 36  55

==> foo2/AC24.txt <==
hjb hkg 457 167
sar sar 234 321
ghf qza  2  165

Another approach, using awk from a bash script:

#!/bin/bash
mkdir -p foo2
awk '
    function process_file(filename, values,     filein, fileout, line, f) {
        if (filename == "") return
        filein = "./foo1/" filename
        fileout = "./foo2/" filename
        while ((getline line < filein) > 0) {
            split(line, f)
            if (f[3] in values || f[4] in values) {
                print line > fileout
            } 
        }
    }

    /^\*/ {
        process_file(filename, values)
        filename = substr($0, 2)
        delete values
        next
    }
    { values[$1] }
    END { process_file(filename, values) }
' ff.txt
