
Linux bash scripting for 2 folders at the same time

This is my bash command:

while read line
do
./ngram -order 1 -lm path1/$line -ppl path2/$line -debug 4 > path3/$line
done < input_list_of_files

I have two folders, one at path1 and the other at path2. They contain files with the same base names but different extensions: path1 holds ".txt" files (file1.txt) and path2 holds ".title" files (file1.title).

That is, path1 contains folder1 with files file1.txt, file2.txt, file3.txt and so on. Similarly, path2 contains folder2 with files file1.title, file2.title, file3.title and so on.

The file input_list_of_files contains:

file1.txt
file2.txt
file3.txt

and so on...

I want to pass file1.txt to the "-lm" option and file1.title to the "-ppl" option. This works fine when I run it on a single file at a time.

That is, whenever file1.txt is given after "-lm", file1.title should be given after "-ppl" in the same invocation.

I want to run this as a batch over all the files in the folder, pairing each file name with its counterpart that has the same base name but a different extension. How do I do it? Please help!

The example I have used:

./ngram -order 1 -lm Path1/Army_recruitment.txt -ppl Path2/Army_recruitment.title -debug 4 > Path3/Army_recruitment.txt

Output file looks like:

 military troop deployment number need
p( military | <s> )     = [1gram] 0.00426373 [ -2.37021 ]
p( troop | military ...)    = [1gram] 0.00476793 [ -2.32167 ]
p( deployment | troop ...)  = [1gram] 0.00045413 [ -3.34282 ]
p( number | deployment ...)     = [1gram] 0.0015224 [ -2.81747 ]
p( need | number ...)   = [1gram] 0.000778574 [ -3.1087 ]
p( </s> | need ...)     = [OOV] 0 [ -inf ]
1 sentences, 5 words, 0 OOVs
1 zeroprobs, logprob= -13.9609 ppl= 619.689 ppl1= 3091.84 
5 words, rank1= 0 rank5= 0 rank10= 0
6 words+sents, rank1wSent= 0 rank5wSent= 0 rank10wSent= 0 qloss=    0.998037 absloss= 0.998036

file Army_recruitment_title.txt: 1 sentences, 5 words, 0 OOVs
1 zeroprobs, logprob= -13.9609 ppl= 619.689 ppl1= 3091.84
5 words, rank1= 0 rank5= 0 rank10= 0
6 words+sents, rank1wSent= 0 rank5wSent= 0 rank10wSent= 0 qloss=   0.998037 absloss= 0.998036 

This output is produced by the ./ngram executable, which comes from a package.

# As suggested by @CharlesDuffy: use read -r to ensure that text is taken literally
while read -r line ; do
    name="${line%.txt}"     # Strip off .txt extension
    ./ngram -order 1 -lm "path1/$name.txt" -ppl "path2/$name.title" -debug 4 > "path3/$name"
done < input_list_of_files
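
To see what the ${line%.txt} expansion in the loop above does on its own, here is a small sketch that can be run without ngram. The function name title_for is hypothetical and only used for illustration; it is not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): map a .txt name to its .title partner.
title_for() {
    local name="${1%.txt}"      # remove a trailing .txt, if present
    printf '%s.title\n' "$name"
}

title_for "file1.txt"               # prints file1.title
title_for "Army_recruitment.txt"    # prints Army_recruitment.title
```

The % operator removes the shortest matching suffix, so a name that does not end in .txt is left unchanged before .title is appended.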

You can use the basename command to strip a suffix as well as the directory part of a path. So:

while IFS= read -r line
do
    # basename strips the directory part and the .txt suffix; append .title
    path2file="$(basename "$line" .txt).title"
    ./ngram -order 1 -lm "path1/$line" -ppl "path2/$path2file" -debug 4 > "path3/$line"
done < input_list_of_files

(That assumes you still want .txt at the end of the output file)
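
If some entries in the list might lack a matching .title file, it may help to check for both files before calling ngram. The following is only a sketch under that assumption: pair_files is a hypothetical helper name, and it prints each pair rather than running ngram, so it can be tried safely first.

```shell
#!/usr/bin/env bash
# Sketch only: pair_files is a hypothetical helper, not part of any package.
# Arguments: list file, directory with .txt files, directory with .title files.
pair_files() {
    local list="$1" lm_dir="$2" ppl_dir="$3" line base
    while IFS= read -r line; do
        [ -n "$line" ] || continue                  # skip blank lines
        base=$(basename "$line" .txt)               # file1.txt -> file1
        if [ -f "$lm_dir/$line" ] && [ -f "$ppl_dir/$base.title" ]; then
            # Both files exist: print the pair, one per line
            printf '%s %s\n' "$lm_dir/$line" "$ppl_dir/$base.title"
        else
            printf 'skipping %s: missing partner file\n' "$line" >&2
        fi
    done < "$list"
}
```

Calling pair_files input_list_of_files path1 path2 lists the pairs that would be processed; once the list looks right, the printf line can be swapped for the ./ngram command from the question.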
