Extracting lines from text files in a folder based on the numbers in another file
I have a file ff.txt that looks as follows:
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2
ABNA.txt and AC24.txt are files in a folder named foo1. Based on the numbers in ff.txt, I want to extract lines from the corresponding files in foo1 and write them to new files with the same names in another folder, foo2. For example, if the third or fourth column of a line in ABNA.txt contains one of the numbers 356, 24, 36 or 112, that line should be extracted and saved to foo2 as ABNA.txt.
The ABNA.txt file in the folder foo1 looks as follows:
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
The AC24.txt file in the folder foo1 looks as follows:
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
Output:
ABNA.txt file in the folder foo2:
dfg qza 356 245
hjb hkg 455 24
dfg qza 36 55
AC24.txt file in the folder foo2:
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
Your help would be appreciated!
UPDATED
This is a pure bash solution (grep was removed):
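#!/bin/bash
file=
s=()
# Filter foo1/$file through a regex built from the numbers collected in s.
grp() {
  local r="${s[*]}"
  r="\b(${r// /|})\b"        # e.g. \b(356|24|36|112)\b
  while read -r w; do
    [[ $w =~ $r ]] && echo "$w"
  done <"foo1/$file" >"foo2/$file"
}
while read -r a; do
  if [[ $a =~ ^\* ]]; then
    [ -n "$file" ] && grp    # flush the numbers of the previous file
    file=${a#\*}
    s=()
  else
    s+=($a)
  fi
done < ff.txt
[ -n "$file" ] && grp        # process the last file
# Show input and output files
for i in foo1/*; do echo "%% in $i"; cat "$i"; done
for i in foo2/*; do echo "%% out $i"; cat "$i"; done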
Output
%% in foo1/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
%% in foo1/AC24.txt
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
%% out foo2/ABNA.txt
dfg qza 356 245
hjb hkg 455 24
dfg qza 36 55
%% out foo2/AC24.txt
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
The while loop parses the ff.txt file. If a line starts with *, the file shell variable is set to the name that follows. Otherwise the line is a number and is appended to the s array. When a new filename is found and a previous filename is already set, the grp function is called to do the real work; it is called once more after the loop for the last file.
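The parsing step described above can be looked at on its own. The following is only a sketch: the function name parse_groups, the variable summary and the scratch directory are illustrative and not part of the answer. It prints each filename together with the numbers collected for it, which is a quick way to check the control file before filtering anything.

```shell
#!/bin/bash
# Hypothetical demo: recreate ff.txt in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/ff.txt" <<'EOF'
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2
EOF

# Print "filename: n1 n2 ..." for each group found in the control file.
parse_groups() {
  local file="" line
  local -a nums=()
  while read -r line; do
    if [[ $line == \** ]]; then
      # A new header: report the previous group, if there was one.
      if [ -n "$file" ]; then echo "$file: ${nums[*]}"; fi
      file=${line#\*}
      nums=()
    elif [ -n "$line" ]; then
      nums+=("$line")
    fi
  done < "$1"
  # The last group has no following header, so report it here.
  if [ -n "$file" ]; then echo "$file: ${nums[*]}"; fi
}

summary=$(parse_groups "$workdir/ff.txt")
echo "$summary"
rm -rf "$workdir"
```

The final check after the loop plays the same role as the last grp call in the script: without it the numbers of the last file would never be used.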
The grp function builds a regex of the form \b(num1|num2...)\b. The \b anchors make it match only complete numbers, so \b24\b does not match 245. The while loop inside grp reads the file from foo1, matches each line against the regex, and writes the matching lines to a file with the same name in the foo2 directory. It does not check whether the foo2 directory exists.
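Because the regex is applied to the whole line, a listed number that happens to appear in the first or second column would also be matched. A possible variant, sketched here under the assumption that fields are separated by whitespace, tests only the third and fourth fields and creates the output directory first. The names is_wanted, filter_cols and the extra sample line "24 xyz 1 2" are made up for the illustration.

```shell
#!/bin/bash
# Build the same kind of alternation the answer uses, from a sample list.
nums=(356 24 36 112)
alt=$(IFS='|'; echo "${nums[*]}")   # 356|24|36|112

# Whole-number test on a single field: anchoring with ^...$ replaces \b.
is_wanted() {
  local re="^($alt)\$"
  [[ $1 =~ $re ]]
}

# Keep a line only if its third or fourth field is one of the numbers.
filter_cols() {
  local c1 c2 c3 c4 rest
  while read -r c1 c2 c3 c4 rest; do
    if is_wanted "$c3" || is_wanted "$c4"; then
      echo "$c1 $c2 $c3 $c4${rest:+ $rest}"
    fi
  done
}

workdir=$(mktemp -d)
mkdir -p "$workdir/foo1" "$workdir/foo2"   # make sure the output directory exists
printf '%s\n' 'dfg qza 356 245' 'hjb hkg 455 24' 'ghf qza 12 123' \
  'dfg qza 36 55' '24 xyz 1 2' > "$workdir/foo1/ABNA.txt"

filter_cols < "$workdir/foo1/ABNA.txt" > "$workdir/foo2/ABNA.txt"
result=$(cat "$workdir/foo2/ABNA.txt")
echo "$result"
rm -rf "$workdir"
```

The added line "24 xyz 1 2" is dropped here because 24 sits in the first column, whereas a whole-line match on \b24\b would keep it.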
This might work for you (GNU sed and Bash):
folder1=foo1
folder2=foo2
sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' ff.txt | sh
This turns the ff.txt file into a script which is piped into the sh command. The user must first set the shell variables $folder1 and $folder2 to the directories containing the source files and the output files, respectively.
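Since the generated text is executed by sh, it can be worth looking at it before running it. The sketch below assumes GNU sed and uses a scratch directory with the sample data from the question; the variable names generated, abna and ac24 are illustrative. It captures the output of the same sed command instead of piping it straight into sh, prints it for review, and only then runs it.

```shell
#!/bin/bash
# Scratch copy of the layout from the question (paths are illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/foo1" "$workdir/foo2"
printf '%s\n' '*ABNA.txt' 356 24 36 112 '*AC24.txt' 457 458 321 2 > "$workdir/ff.txt"
printf '%s\n' 'dfg qza 356 245' 'hjb hkg 455 24' 'ghf qza 12 123' 'dfg qza 36 55' \
  > "$workdir/foo1/ABNA.txt"
printf '%s\n' 'hjb hkg 457 167' 'ghf qza 2 165' 'sar sar 234 321' 'dfg qza 345 345' \
  > "$workdir/foo1/AC24.txt"

folder1=foo1
folder2=foo2
# Same sed command as in the answer, but captured instead of piped into sh.
generated=$(cd "$workdir" && sed -r '/^\*/!{s/\s*//g;H;$!d};1{h;d};x;s/\n/ /;s/\n/|/g;s#\*(.*) (.*)#<'"$folder1"'/\1 sed -nr '\''/^(\\S+\\s+){2,3}\\b(\2)\\b/w '"$folder2"'/\1'\''#' ff.txt)

# Review step: one generated sed command per file listed in ff.txt.
echo "$generated"

# Run the generated commands only after looking at them.
(cd "$workdir" && printf '%s\n' "$generated" | sh)
abna=$(cat "$workdir/foo2/ABNA.txt")
ac24=$(cat "$workdir/foo2/AC24.txt")
rm -rf "$workdir"
```

Each generated line reads one file from $folder1 and uses the w command of sed to write the lines whose third or fourth field matches into $folder2.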
You can try something like this:
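awk '
BEGIN {
    readpath = "/path/to/foo1"
    writepath = "/path/to/foo2"
}
# Header line such as *ABNA.txt: index that file by its 3rd and 4th columns.
/^\*/ {
    file = substr($1, 2)
    delete ary    # forget the lines indexed for the previous file
    while ((getline var < (readpath "/" file)) > 0) {
        split(var, a, " ")
        ary[a[3]] = var
        ary[a[4]] = var
    }
    close(readpath "/" file)
    next
}
# Number line: print the line indexed under that number, if any.
($1 in ary) {
    print ary[$1] > (writepath "/" file)
}' foo.txt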
[jaypal:~/temp/test] ls
foo.txt foo1 foo2
[jaypal:~/temp/test] cat foo.txt
*ABNA.txt
356
24
36
112
*AC24.txt
457
458
321
2
[jaypal:~/temp/test] ls foo1/
ABNA.txt AC24.txt
[jaypal:~/temp/test] head foo1/*
==> foo1/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
ghf qza 12 123
dfg qza 36 55
==> foo1/AC24.txt <==
hjb hkg 457 167
ghf qza 2 165
sar sar 234 321
dfg qza 345 345
[jaypal:~/temp/test] ls foo2/
[jaypal:~/temp/test]
[jaypal:~/temp/test] awk '
BEGIN {
readpath=sprintf("%s", "./foo1")
writepath=sprintf("%s", "./foo2")
}
$0~/\*/ {
file = substr($1,2)
while ((getline var < (readpath"/"file)) > 0) {
split (var, a, " ")
ary[a[3]]=var
ary[a[4]]=var
}
}
($1 in ary) {
print ary[$1] > (writepath"/"file)
}' foo.txt
[jaypal:~/temp/test] ls foo2/
ABNA.txt AC24.txt
[jaypal:~/temp/test] head foo2/*
==> foo2/ABNA.txt <==
dfg qza 356 245
hjb hkg 455 24
dfg qza 36 55
==> foo2/AC24.txt <==
hjb hkg 457 167
sar sar 234 321
ghf qza 2 165
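#!/bin/bash
mkdir -p foo2
awk '
    # Copy the lines of foo1/<filename> whose 3rd or 4th field is a key of
    # values into foo2/<filename>. The extra parameters serve as locals.
    function process_file(filename, values,    filein, fileout, line, f) {
        if (filename == "") return
        filein = "./foo1/" filename
        fileout = "./foo2/" filename
        while ((getline line < filein) > 0) {
            split(line, f)
            if (f[3] in values || f[4] in values) {
                print line > fileout
            }
        }
        close(filein)
    }
    # A header line: process the previous group, then start a new one.
    /^\*/ {
        process_file(filename, values)
        filename = substr($0, 2)
        delete values
        next
    }
    # Any other line is a number; record it as a key.
    { values[$1] }
    END { process_file(filename, values) }
' ff.txt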