I am doing a directory cleanup to check for files that are not being used in our testing environment. I have a list of all the file names which are sorted alphabetically in a text file and another file I want to compare against.
Here is how the first file is set up:
test1.pl
test2.pl
test3.pl
It is a simple text file with one script name per line, listing all the scripts in the directory that I want to clean up based on the other file below.
The file I want to compare against is a tab-separated file that lists the scripts each server runs as tests, so there are obviously many duplicates. I want to strip the test script names out of this file, write them to another file, and run them through sort and uniq so that I can diff that file against the one above to see which test scripts are not being used.
The file is set up like this:
server: : test1.pl test2.pl test3.pl test4.sh test5.sh
Some lines list fewer scripts and some list more. My first impulse was to write a Perl script that splits each line and pushes the values into a list if they are not already there, but that seems wholly inefficient. I am not too experienced in awk, but I figured there is more than one way to do it. Any other ideas for comparing these files?
This uses awk to rearrange the file names in the second file to one per line, then diffs the output against the first file.
diff file1 <(awk '{ for (i=3; i<=NF; i++) print $i }' file2 | sort -u)
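To see what that one-liner produces, here is a minimal sketch that builds two small sample files (the names and contents are hypothetical, modeled on the question's layout with tabs between fields) and runs the same awk + diff command. It assumes bash, since <( ) process substitution is not POSIX sh.

```shell
#!/usr/bin/env bash
# Minimal sketch of the awk + diff approach; file contents are made up.
set -euo pipefail

cd "$(mktemp -d)"   # work in a scratch directory

# file1: all scripts in the directory, one per line, already sorted.
printf '%s\n' test1.pl test2.pl test3.pl test4.pl test5.pl > file1

# file2: "serverN: :" followed by the scripts that server runs (tab-separated).
printf 'server1: :\ttest1.pl\ttest3.pl\n'  > file2
printf 'server2: :\ttest2.pl\ttest3.pl\n' >> file2

# Fields 1 and 2 are "serverN:" and ":", so print from field 3 onward,
# one name per line, then sort and de-duplicate. diff exits 1 when the
# inputs differ, hence "|| true" under set -e.
result=$(diff file1 <(awk '{ for (i=3; i<=NF; i++) print $i }' file2 | sort -u) || true)
printf '%s\n' "$result"
```

Lines prefixed with < exist only in file1, i.e. scripts that no server references (test4.pl and test5.pl in this sample).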
A Perl solution that makes a %needed hash of the files being used by the servers and then checks against the file containing all the file names.
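#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;   # exposes the __TESTFILES__ and __SERVTEST__ sections below as filehandles

my %needed;
while (<SERVTEST>) {
    chomp;
    s/^.*:\s*//;               # drop the "server1::" label (everything up to the last colon)
    my @files = split ' ';     # split on any run of spaces or tabs
    @needed{ @files } = (1) x @files;
}

while (<TESTFILES>) {
    chomp;
    if (not $needed{$_}) {
        print "Not needed: $_\n";
    }
}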
__TESTFILES__
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
__SERVTEST__
server1:: test1.pl test3.pl
server2:: test2.pl test3.pl
__END__
This prints:
C:\Old_Data\perlp>perl t7.pl
Not needed: test4.pl
Not needed: test5.pl
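For comparison, the same "lookup set" idea as the %needed hash can be sketched from the shell with grep instead of Perl. This is a swapped-in technique, not part of the answer above; the sample data is hypothetical and mirrors the __TESTFILES__ and __SERVTEST__ sections.

```shell
#!/usr/bin/env bash
# Sketch: grep -F -x -v -f as a set difference, analogous to the %needed hash.
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical sample data.
printf '%s\n' test1.pl test2.pl test3.pl test4.pl test5.pl > testfiles
printf 'server1:: test1.pl test3.pl\nserver2:: test2.pl test3.pl\n' > servtest

# Build the "needed" set: every field that is not a "label:" or ":" token.
awk '{ for (i = 1; i <= NF; i++) if ($i !~ /:$/) print $i }' servtest > needed

# -F fixed strings (no regex), -x whole-line match, -v invert, -f patterns from a file.
# grep exits 1 when nothing is printed, so tolerate that under set -e.
unused=$(grep -Fxv -f needed testfiles || true)

# Assumes script names contain no whitespace.
for f in $unused; do
    echo "Not needed: $f"
done
```

Like the Perl version, this prints "Not needed: test4.pl" and "Not needed: test5.pl" for the sample data.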
Here is a quick and dirty script to do the job. If it works for you, replace the backtick cat calls with open and proper error checking.
use strict;
use warnings;

# Quick and dirty: read both files through backticks (no error checking).
my @server_lines    = `cat server_file`;     chomp(@server_lines);
my @test_file_lines = `cat test_file_lines`; chomp(@test_file_lines);

# Collect every script name that appears on any server line.
my @used;
foreach my $server_line (@server_lines) {
    (my $files = $server_line) =~ s/^.*:\s*//;   # drop the "server: :" label
    push @used, split ' ', $files;
}

# Report each test script that no server uses (exact string match, not a regex).
foreach my $test_file (@test_file_lines) {
    my @found = grep { $_ eq $test_file } @used;
    if (scalar(@found) == 0) {
        print "$test_file is not used by any server\n";
    }
}
If I understand your need correctly, you have a file with a list of tests (testfiles.txt):
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
And a file listing each server with the files it tests (serverlist.txt):
server1: : test1.pl test3.pl
server2: : test2.pl test3.pl
(Here I have assumed that all the spaces are tabs.)
If you convert the second file into a list of tested files, you can then compare it to your original file using diff.
cut -d: -f3 serverlist.txt | sed -e 's/^\t//g' | tr '\t' '\n' | sort -u > tested_files.txt
The cut removes the server name and the colons, the sed removes the leading tab left behind, tr then converts the remaining tabs into newlines, and finally sort -u sorts the names and removes duplicates. The result is written to tested_files.txt.
Then all you do is diff testfiles.txt tested_files.txt.
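Here is a sketch of that pipeline on hypothetical tab-separated data. Two substitutions are assumptions of this sketch rather than part of the answer: the empty line left by the leading tab is deleted with sed '/^$/d' (avoiding GNU sed's \t escape), and comm -23 is used instead of diff so that only the unused names are printed. comm requires both inputs to be sorted.

```shell
#!/usr/bin/env bash
# Sketch of the cut/tr/sort pipeline on made-up tab-separated data.
set -euo pipefail
cd "$(mktemp -d)"

printf '%s\n' test1.pl test2.pl test3.pl test4.pl test5.pl > testfiles.txt
printf 'server1:\t:\ttest1.pl\ttest3.pl\n'  > serverlist.txt
printf 'server2:\t:\ttest2.pl\ttest3.pl\n' >> serverlist.txt

# Field 3 (split on ':') is "<tab>test1.pl<tab>test3.pl". Turning tabs into
# newlines leaves an empty first line, which sed deletes; sort -u then sorts
# and de-duplicates.
cut -d: -f3 serverlist.txt | tr '\t' '\n' | sed '/^$/d' | sort -u > tested_files.txt

# comm -23 prints lines that appear only in the first (sorted) file,
# i.e. scripts that no server tests.
unused=$(comm -23 testfiles.txt tested_files.txt)
printf '%s\n' "$unused"
```

For this sample, the output is test4.pl and test5.pl.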
It's hard to tell since you didn't post the expected output, but is this what you're looking for?
$ cat file1
test1.pl
test2.pl
test3.pl
$
$ cat file2
server: : test1.pl test2.pl test3.pl test4.sh test5.sh
$
$ gawk -v RS='[[:space:]]+' 'NR==FNR{f[$0]++;next} !/:$/ && !f[$0]' file1 file2
test4.sh
test5.sh
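The gawk command above prints names from file2 that are missing from file1. If you want the opposite direction, the one the question describes (scripts in file1 that no server uses), here is a sketch in plain POSIX awk with the default record separator. The sample data is hypothetical, and tokens ending in a colon are assumed to be server labels.

```shell
#!/usr/bin/env bash
# Sketch: list names in file1 that never appear in file2. Sample data is made up.
set -euo pipefail
cd "$(mktemp -d)"

printf '%s\n' test1.pl test2.pl test3.pl test4.sh test5.sh test6.pl > file1
printf 'server1: : test1.pl test2.pl test4.sh\nserver2: : test2.pl test5.sh\n' > file2

# First pass (file2): remember every field that is not a "label:" or ":" token.
# Second pass (file1): print each line that was never seen in file2.
unused=$(awk 'NR == FNR { for (i = 1; i <= NF; i++) if ($i !~ /:$/) used[$i] = 1; next }
              !($0 in used)' file2 file1)
printf '%s\n' "$unused"
```

Because the names are stored in an array, file2 does not need to be sorted, and server lines of any length are handled.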