Extracting lines from text files in a folder based on the numbers in another file

I have a file ff.txt that looks as follows:

*ABNA.txt
 356
 24
 36
 112
*AC24.txt
 457
 458
 321
 2

ABNA.txt and AC24.txt are files in a folder named foo1. Based on the numbers in ff.txt, I want to extract lines from the corresponding files in foo1 and write them to new files with the same names in another folder, foo2. For example, if the third or fourth column of a line in ABNA.txt contains 356, 24, 36, or 112, that line should be extracted and saved to foo2/ABNA.txt.

The ABNA.txt file in the folder foo1 looks as follows:

dfg qza 356 245
hjb hkg 455 24
ghf qza 12  123
dfg qza 36  55

The AC24.txt file in the folder foo1 looks as follows:

hjb hkg 457 167
ghf qza  2  165
sar sar 234 321
dfg qza 345 345

Output:

ABNA.txt file in the folder foo2:

dfg qza 356 245
hjb hkg 455 24
dfg qza 36  55

AC24.txt file in the folder foo2:

hjb hkg 457 167
ghf qza  2  165
sar sar 234 321

Your help would be appreciated!

UPDATED

This is a pure bash solution (grep was removed):

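#!/bin/bash

file=
s=()

# Write the lines of foo1/$file that contain any number in s to foo2/$file.
# Note: the regex matches a number anywhere in the line, not only in
# columns 3 and 4, and \b is a GNU regex extension.
grp() {
  local r="${s[*]}"        # join the numbers with spaces
  r="\b(${r// /|})\b"      # e.g. \b(356|24|36|112)\b
  while IFS= read -r w; do
    [[ $w =~ $r ]] && printf '%s\n' "$w"
  done < "foo1/$file" > "foo2/$file"
}

while read -r a; do
  if [[ $a =~ ^\* ]]; then      # a "*name" line starts a new block
    [ -n "$file" ] && grp       # process the previous block first
    file=${a#\*}
    s=()
  elif [[ -n $a ]]; then
    s+=("$a")
  fi
done < ff.txt
[ -n "$file" ] && grp           # process the last block

# Show input and output files
for i in foo1/*; do echo "%% in $i"; cat "$i"; done
for i in foo2/*; do echo "%% out $i"; cat "$i"; done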

Output

%% in foo1/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
ghf qza 12  123
dfg qza 36  55
%% in foo1/AC24.txt
hjb hkg 457 167
ghf qza  2  165
sar sar 234 321
dfg qza 345 345
%% out foo2/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
dfg qza 36  55
%% out foo2/AC24.txt
hjb hkg 457 167
ghf qza  2  165
sar sar 234 321

The while loop parses the ff.txt file. If a line starts with *, the shell variable file is set to the filename that follows. Otherwise the line is a number and is appended to the s array. When a new filename is found and a previous filename is already set, the script calls the grp function, which does the real work.

The grp function builds a regex of the form \b(num1|num2...)\b. The \b anchors make it match only complete numbers, so \b24\b will not match 245. The while loop reads the file from foo1, matches each line against the regex, and writes the matching lines to a file with the same name in the directory foo2. It does not check whether the foo2 directory exists.
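The regex described above is tested against the whole line, so a wanted number in the first or second column would also match. If the match should be limited to the third and fourth columns, as the question asks, something like the following sketch might do. The name filter_cols is hypothetical, and the sketch assumes the columns are separated by spaces or tabs and that at least one number is passed.

```bash
# filter_cols NUM...: read lines on stdin and print those whose 3rd or 4th
# whitespace-separated column equals one of the given numbers.
# filter_cols is a hypothetical helper name, not part of the answer above.
filter_cols() {
  local want=" $* " line    # numbers padded with spaces, e.g. " 356 24 36 112 "
  while IFS= read -r line; do
    set -f                  # no filename expansion while splitting
    set -- $line            # $3 and $4 now hold columns 3 and 4
    set +f
    case $want in
      *" ${3:-} "*|*" ${4:-} "*) printf '%s\n' "$line" ;;
    esac
  done
}
```

It could be called as filter_cols 356 24 36 112 < foo1/ABNA.txt > foo2/ABNA.txt. Because each column is compared as a whole word, 24 does not match 245, and the original spacing of each printed line is preserved.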

This might work for you (GNU sed and Bash):

folder1=foo1
folder2=foo2
sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' ff.txt | sh

This turns the ff.txt file into a script, which is piped into the sh command. The user must first set the bash variables $folder1 and $folder2 to the directories containing the source files and the output files, respectively.
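The same generate-then-run idea can be sketched without sed, which may make the generated commands easier to read before they are executed. The sketch below is not the sed command above: it emits one awk command per file block instead. The names gen_script and emit_cmd are hypothetical, and it assumes file and folder names contain no single quotes.

```bash
# emit_cmd FILE NUMS SRC DST: print one awk command that filters SRC/FILE
# into DST/FILE. Hypothetical helper; prints nothing when FILE is empty.
emit_cmd() {
  [ -n "$1" ] || return 0
  printf "awk -v nums='%s' 'BEGIN { split(nums, n, \" \"); for (i in n) want[n[i]] } (\$3 in want) || (\$4 in want)' '%s/%s' > '%s/%s'\n" \
    "$2" "$3" "$1" "$4" "$1"
}

# gen_script LIST SRC DST: turn the "*file" blocks of LIST into a script.
gen_script() {
  local list="$1" src="$2" dst="$3" file="" nums="" a
  while read -r a; do
    case $a in
      \**) emit_cmd "$file" "$nums" "$src" "$dst"   # flush the previous block
           file=${a#\*}
           nums="" ;;
      "")  ;;                                       # skip blank lines
      *)   nums="${nums:+$nums }$a" ;;              # collect the numbers
    esac
  done < "$list"
  emit_cmd "$file" "$nums" "$src" "$dst"            # flush the last block
}
```

Running gen_script ff.txt foo1 foo2 on its own prints the generated commands for inspection, and appending | sh executes them. The destination folder must already exist.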

You can try something like this:

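awk '
BEGIN {
    readpath = "/path/to/foo1"
    writepath = "/path/to/foo2"
}
/^\*/ {                                # a "*name" line starts a new block
    if (file != "") close(writepath "/" file)
    file = substr($1, 2)
    delete ary                         # drop entries from the previous file
    while ((getline var < (readpath "/" file)) > 0) {
        split(var, a, " ")
        ary[a[3]] = var                # index each line by its 3rd column
        ary[a[4]] = var                # ... and by its 4th column
    }
    close(readpath "/" file)
    next
}
# Only the last line seen for a given number is kept in ary.
($1 in ary) {
    print ary[$1] > (writepath "/" file)
}' foo.txt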

Explanation:

  • Set the read path and write path in the BEGIN block.
  • For lines in foo.txt that contain a filename:
  • Use substr to capture the filename in a variable called file.
  • Read that file line by line into a variable called var.
  • Split var and use columns 3 and 4 as indices into the array ary.
  • For the other lines of foo.txt, if the first column is present in the array as an index, write the stored line to the output file.

Test:

[jaypal:~/temp/test] ls
foo.txt foo1    foo2

[jaypal:~/temp/test] cat foo.txt
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2

[jaypal:~/temp/test] ls foo1/
ABNA.txt AC24.txt

[jaypal:~/temp/test] head foo1/*
==> foo1/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
ghf qza 12  123
dfg qza 36  55

==> foo1/AC24.txt <==
hjb hkg 457 167
ghf qza  2  165
sar sar 234 321
dfg qza 345 345

[jaypal:~/temp/test] ls foo2/
[jaypal:~/temp/test] 

[jaypal:~/temp/test] awk '
BEGIN {
    readpath=sprintf("%s", "./foo1")
    writepath=sprintf("%s", "./foo2")
    }
$0~/\*/ {
    file = substr($1,2)
    while ((getline var < (readpath"/"file)) > 0) {
        split (var, a, " ")
        ary[a[3]]=var
        ary[a[4]]=var
        }
    }
($1 in ary) {
    print ary[$1] > (writepath"/"file)
    }' foo.txt

[jaypal:~/temp/test] ls foo2/
ABNA.txt AC24.txt

[jaypal:~/temp/test] head foo2/*
==> foo2/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
dfg qza 36  55

==> foo2/AC24.txt <==
hjb hkg 457 167
sar sar 234 321
ghf qza  2  165

#!/bin/bash
mkdir -p foo2
awk '
    function process_file(filename, values,     filein, fileout, line, f) {
        if (filename == "") return
        filein = "./foo1/" filename
        fileout = "./foo2/" filename
        while ((getline line < filein) > 0) {
            split(line, f)
            if (f[3] in values || f[4] in values) {
                print line > fileout
            } 
        }
    }

    /^\*/ {
        process_file(filename, values)
        filename = substr($0, 2)
        delete values
        next
    }
    { values[$1] }
    END { process_file(filename, values) }
' ff.txt
